Lab 03 : Design and Implement a File Access Client and Server using the Socket API

Due: Feb 9th (02/09/2019), 11:59 pm.

  • Use the socket API to implement a simple File Access Protocol that allows a client to read a remote file
  • Handle multiple types of messages by parsing text
  • Handle errors gracefully

In this lab you will write a simple file access server and client using the Socket API in Linux. The communication between client and server will be done using the Transmission Control Protocol (TCP). The server will support four operations as specified in later sections.

  1. To connect to a machine in the Xinu lab: ssh [Put Purdue Account Name Here]@[Put Target Computer Here]
    1. For example: ssh arastega@xinu01.cs.purdue.edu
    2. Note: Target computer should be one of the xinu machines (xinu01-xinu21)
  2. You will be prompted for your password
  3. Once the password has been verified, you will gain access to the remote system
  4. You should already have directory cs422 under your home directory (provided you completed lab1 correctly)
  5. Download lab03.tar.gz with wget under ~/cs422/. Note that option '-O' is the letter O and must be capitalized.
    cd ~/cs422/
    wget http://courses.cs.purdue.edu/_media/cs42200:spring19:lab03.tar.gz -O lab03.tar.gz
  6. Untar the downloaded file.
    tar xvf lab03.tar.gz
  7. Navigate to the apps directory under lab03.
     cd ~/cs422/lab03/apps 
  8. You will find two empty files under this directory: server.c and client.c. Your code for the server and client will go in these files. Use echoserver.c and echoclient.c as hints on how to write the code for this lab. There is also a global.c file that you can use for helper functions if you need them.
  9. To compile the files:
    cd ~/cs422/lab03/compile_linux/
    make server client 
  10. If compiled without errors, this will generate two executables: server, client.

You will implement a File Access client that receives a user request from standard input and sends the appropriate message to the file access server to carry out the request. You will also implement a server that accepts commands from the client, performs the specified operation, and returns a result to the client. You have already seen in class several ways in which “transfer protocols” represent messages: purely as text (ASCII), as integers (network byte order) with defined meanings, or as a combination of both. In this lab, we will use the first method: all messages will be transferred as text (ASCII), just like SMTP.

The client will be provided the following parameters on the command line:

  1. Server IP address
  2. Server application number (= port number)
$ ./client SERVER_IP_ADDRESS SERVER_PORT_NUMBER

The commands will be provided to the client using the standard input. The client will pass on the message to the server and when the server responds, the client will print the response on the standard output. While printing, all messages sent by the client must be preceded with “C> ” and messages from the server must be preceded with “S> ”

Here is sample output in the format expected:

$ ./client SERVER_IP_ADDRESS SERVER_PORT_NUMBER
C> OPEN testfile1.txt
S> 1

The server will be provided only one parameter on the command line:

  1. Application number (= port number)
$ ./server SERVER_PORT_NUMBER

The server should not print any output while it's running. For debugging purposes, you can add printf statements but make sure all debugging output is removed when you submit the final code.

Overview of the Socket API

Setting up a simple TCP server involves the following steps:

  1. Creating a TCP socket, with calls to getaddrinfo() and socket().
  2. Binding the socket to the listening port, with a call to bind(). Before calling bind(), a programmer must declare a sockaddr_in structure, clear it (with memset()), and fill in its sin_family (AF_INET) and sin_port (the listening port, in network byte order) fields. Converting a short int to network byte order can be done by calling the function htons() (host to network short).
  3. Preparing the socket to listen for connections (making it a listening socket), with a call to listen().
  4. Accepting incoming connections, via a call to accept(). This blocks until an incoming connection is received, and then returns a socket descriptor for the accepted connection. The initial descriptor remains a listening descriptor, and accept() can be called again at any time with this socket, until it is closed.
  5. Communicating with the remote host, which can be done through send() and recv() or write() and read().
  6. Eventually closing each socket that was opened, once it is no longer needed, using close().
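
The first three server steps above can be sketched as follows. This is a minimal sketch, not the required implementation: the helper name make_listening_socket is invented for illustration, the address structure is filled in by hand rather than through getaddrinfo(), and the SO_REUSEADDR option is an optional convenience that the lab does not ask for.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: returns a listening TCP socket bound to `port`
 * on all local IPv4 addresses, or -1 on error. */
int make_listening_socket(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);        /* step 1 */
    if (fd < 0)
        return -1;

    /* Optional: lets the server be restarted quickly on the same port. */
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);                   /* clear the structure */
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);        /* any local address */
    addr.sin_port = htons(port);                     /* network byte order */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||  /* step 2 */
        listen(fd, 5) < 0) {                                    /* step 3 */
        close(fd);
        return -1;
    }
    return fd;
}
```

The returned descriptor is the one to pass to accept() in step 4. Passing port 0 asks the operating system to choose a free port, which can be handy when experimenting.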

Programming a TCP client application involves the following steps:

  1. Creating a TCP socket, with calls to getaddrinfo() and socket().
  2. Connecting to the server with connect(), passing a sockaddr_in structure with sin_family set to AF_INET, sin_port set to the port on which the server is listening (in network byte order), and sin_addr set to the IP address of the listening server (also in network byte order).
  3. Communicating with the server by using send() and recv() or write() and read().
  4. Terminating the connection and cleaning up with a call to close().
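
The client steps above might look like the sketch below. The helper name connect_to_server is hypothetical, and the sketch assumes the server address is given as a dotted-decimal IPv4 string (as on the client command line), so it converts the address with inet_pton() instead of calling getaddrinfo().

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: connects to ip:port over TCP/IPv4.
 * Returns the connected socket descriptor, or -1 on error. */
int connect_to_server(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                 /* network byte order */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                               /* not a valid IPv4 string */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);                               /* do not leak the socket */
        return -1;
    }
    return fd;
}
```

The caller owns the returned descriptor and should close() it when the session ends.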

Here is the list of socket API functions that you should use.

socket()

#include <sys/types.h>
#include <sys/socket.h>
 
int socket(int domain, int type, int protocol);

socket() creates an endpoint for communication and returns a file descriptor for the socket. socket() takes three arguments:

  • domain, which specifies the protocol family of the created socket. For example:
    • AF_INET for IPv4,
    • AF_INET6 for IPv6, or
    • AF_UNIX for local sockets (using a file).
  • type, one of:
    • SOCK_STREAM (reliable stream-oriented service or Stream Sockets)
    • SOCK_DGRAM (datagram service or Datagram Sockets)
    • SOCK_SEQPACKET (reliable sequenced packet service), or
    • SOCK_RAW (raw protocols atop the network layer).
  • protocol specifying the actual transport protocol to use. The most common are IPPROTO_TCP, IPPROTO_SCTP, IPPROTO_UDP, IPPROTO_DCCP. These protocols are specified in <netinet/in.h>. The value 0 may be used to select a default protocol from the selected domain and type.

The function returns -1 if an error occurred. Otherwise, it returns an integer representing the newly-assigned descriptor.

Since you should use TCP and IPv4 for your application, use SOCK_STREAM and AF_INET.

bind()

int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);

bind() assigns an address to a socket. When a socket is created using socket(), it is only given a protocol family, but not assigned an address. This association with an address must be performed with the bind() system call before the socket can accept connections from other hosts. bind() takes three arguments:

  • sockfd, a descriptor representing the socket to perform the bind on.
  • my_addr, a pointer to a sockaddr structure representing the address to bind to.
  • addrlen, a socklen_t field specifying the size of the sockaddr structure.

bind() returns 0 on success and -1 if an error occurs.

listen()

int listen(int sockfd, int backlog);

After a socket has been associated with an address, listen() prepares it for incoming connections. However, this is only necessary for the stream-oriented (connection-oriented) data modes, i.e., for socket types (SOCK_STREAM, SOCK_SEQPACKET). listen() requires two arguments:

  • sockfd, a valid socket descriptor.
  • backlog, an integer representing the number of pending connections that can be queued up at any one time. The operating system usually places a cap on this value. For this lab, you can set it to 5.

Once a connection is accepted, it is dequeued. On success, 0 is returned. If an error occurs, -1 is returned.

accept()

int accept(int sockfd, struct sockaddr *cliaddr, socklen_t *addrlen);

When an application is listening for stream-oriented connections from other hosts, it is notified of such events and must initialize the connection using the accept() function. The accept() function creates a new socket for each connection and removes the connection from the listen queue. It takes the following arguments:

  • sockfd, the descriptor of the listening socket that has the connection queued.
  • cliaddr, a pointer to a sockaddr structure to receive the client's address information.
  • addrlen, a pointer to a socklen_t location that specifies the size of the client address structure passed to accept(). When accept() returns, this location indicates how many bytes of the structure were actually used.

The accept() function returns the new socket descriptor for the accepted connection, or -1 if an error occurs. All further communication with the remote host now occurs via this new socket. Datagram sockets do not require processing by accept() since the receiver may immediately respond to the request using the listening socket.

connect()

int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);

The connect() system call connects a socket, identified by its file descriptor, to a remote host specified by that host's address in the argument list. Certain types of sockets are connectionless, most commonly user datagram protocol sockets. For these sockets, connect takes on a special meaning: the default target for sending and receiving data gets set to the given address, allowing the use of functions such as send() and recv() on connectionless sockets. connect() returns an integer representing the error code: 0 represents success, while -1 represents an error.

gethostbyname(), gethostbyaddr(), getnameinfo(), getaddrinfo()

struct hostent *gethostbyname(const char *name);
struct hostent *gethostbyaddr(const void *addr, int len, int type);

The gethostbyname() and gethostbyaddr() functions are used to resolve host names and addresses in the domain name system or the local host's other resolver mechanisms (e.g., /etc/hosts lookup). They return a pointer to an object of type struct hostent, which describes an Internet Protocol host. The functions take the following arguments:

  • name specifies the name of the host. For example: www.cs.purdue.edu
  • addr specifies a pointer to a struct in_addr containing the address of the host.
  • len specifies the length, in bytes, of addr.
  • type specifies the address family type (e.g., AF_INET) of the host address.

The functions return a NULL pointer in case of error, in which case the external integer h_errno may be checked to see whether this is a temporary failure or an invalid or unknown host. Otherwise a valid struct hostent * is returned. These functions are not strictly a component of the BSD socket API, but are often used in conjunction with the API functions. Furthermore, these functions are now considered legacy interfaces for querying the domain name system. New functions that are completely protocol-agnostic (supporting IPv6) have been defined. These new functions are getaddrinfo() and getnameinfo(). They are based on a new addrinfo data structure, and documentation on these functions can be found in: http://beej.us/guide/bgnet/pdf/bgnet_USLetter.pdf
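
As an illustration of the newer interface, the sketch below uses getaddrinfo() to turn a host string and a port string into a sockaddr_in suitable for connect(). The helper name resolve_ipv4 is made up for this example, and the sketch simply takes the first result returned, which is a simplification.

```c
#include <string.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolves host/port to an IPv4 TCP address.
 * Returns 0 and fills *out on success, -1 on failure. */
int resolve_ipv4(const char *host, const char *port, struct sockaddr_in *out)
{
    struct addrinfo hints, *res;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, as required by the lab */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* For AF_INET results, ai_addr points to a struct sockaddr_in. */
    memcpy(out, res->ai_addr, sizeof *out);
    freeaddrinfo(res);                /* the result list is heap-allocated */
    return 0;
}
```

For example, resolve_ipv4("127.0.0.1", "8080", &sa) fills sa with the loopback address and port 8080, both already in network byte order.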

recv()

ssize_t recv(int sd, void *buf, size_t nbytes, int flags);

Read at most nbytes bytes into buf from sd. Returns -1 on error, 0 on EOF (connection closed), or the actual number of bytes read. The function takes the following arguments:

  • sd: socket descriptor.
  • buf: address of char buffer.
  • nbytes: sizeof buf.

You should use 0 for the flags in your application.
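
Because recv() may return fewer bytes than requested, code that needs a fixed number of bytes usually calls it in a loop. The following is one possible sketch; the helper name recv_exact and its return convention are assumptions made for this example, not part of the socket API.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: keeps calling recv() until exactly n bytes have
 * been stored in buf. Returns n on success, 0 if the peer closed the
 * connection before n bytes arrived, and -1 on error. */
ssize_t recv_exact(int sd, void *buf, size_t n)
{
    char *p = buf;
    size_t got = 0;

    while (got < n) {
        ssize_t r = recv(sd, p + got, n - got, 0);
        if (r < 0)
            return -1;          /* error */
        if (r == 0)
            return 0;           /* EOF: connection closed */
        got += (size_t)r;       /* partial read: ask for the remainder */
    }
    return (ssize_t)n;
}
```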

send()

ssize_t send(int sd, const void *buf, size_t nbytes, int flags);

Attempts to write nbytes from buf to sd. Returns -1 on error or the number of bytes actually written. The arguments are similar to those of recv().

You should use 0 for the flags in your application.
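
Since send() reports how many bytes were actually written, a caller that must deliver a whole message can loop until everything has been sent. A minimal sketch, assuming a helper named send_all (a name chosen here for illustration):

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: sends all n bytes of buf on sd, retrying after
 * partial writes. Returns 0 on success and -1 on error. */
int send_all(int sd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t sent = 0;

    while (sent < n) {
        ssize_t w = send(sd, p + sent, n - sent, 0);
        if (w < 0)
            return -1;          /* error */
        sent += (size_t)w;      /* continue with the unsent tail */
    }
    return 0;
}
```

For example, send_all(sd, "READ 10\n", 8) sends one complete READ request.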

close()

int close(int sd);

Close socket descriptor sd. Returns -1 on failure, otherwise 0.

There are four types of messages that a file access server supports as explained in the following sections.

The OPEN command is sent by the client to the server. It is used to open a file. The command is specified as follows:

OPEN, followed by a single space followed by the filename, and a single newline ('\n').

OPEN testfile1.txt

In this case, the actual bytes that will be sent to the server will be:

'O' 'P' 'E' 'N' ' ' 't' 'e' 's' 't' 'f' 'i' 'l' 'e' '1' '.' 't' 'x' 't' '\n'

There are two possible responses by the server: 1 ( '1' ) or -1 ( '-' '1' ). Remember, these are text characters and not integers.

You must consider the following cases in the server:

  • If the file name contains “..” anywhere, return -1
  • If the specified file does not exist, return -1
  • If a file is already opened in the server, return -1
  • If no file is currently opened, and the file specified opens successfully, return 1

You must have figured out by now that you need to maintain some state in the server. Here you need to keep a state of whether there is any file that is already open. As you will see further, you will need to maintain more state related to open files.

You can assume that a file name does not exceed 1024 characters.
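
The OPEN cases listed above could be handled along the following lines. This is only a sketch under assumptions: the struct file_state type and the handle_open name are invented for illustration, and the function returns the response text rather than sending it.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical per-connection state kept by the server. */
struct file_state {
    int   fd;      /* descriptor of the open file, or -1 if none is open */
    off_t offset;  /* current position in the file, maintained by the server */
};

/* Hypothetical handler: returns the text response for "OPEN name". */
const char *handle_open(struct file_state *st, const char *name)
{
    if (st->fd >= 0)
        return "-1";                    /* a file is already open */
    if (strstr(name, "..") != NULL)
        return "-1";                    /* ".." anywhere in the name */

    int fd = open(name, O_RDONLY);
    if (fd < 0)
        return "-1";                    /* missing or unreadable file */

    st->fd = fd;
    st->offset = 0;                     /* reading starts at the beginning */
    return "1";
}
```

The state starts out as { -1, 0 }, meaning that no file is open.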

The READ command is sent by the client to the server and is used to read bytes from an opened file. The command is specified as follows:

READ, followed by a single space, followed by the number of bytes to read (in ASCII text), followed by a single newline ('\n')

READ 10

As you can see, READ only specifies the number of bytes to read. The bytes are read starting from where the client stopped reading the last time (or from the beginning of the file if this is the first read). As you must have guessed, you need to know where the current offset (or cursor) is in the file. If you have handled files in UNIX before, you know that UNIX maintains this information for you: the read() system call updates the offset, and the next read() starts from where the last one left off. Unfortunately, an operation specified in the next section will require you to maintain this state explicitly in the server and not rely on UNIX state.

The server responds in the following format: Number of bytes followed by a single space followed by data bytes from file

You must be wondering why the server needs to specify the number of bytes since the client already knows how many bytes were asked for. This is true if there are enough bytes left in the file. But with the current position in file, let's say there are only 5 bytes left till EOF, and the client asks for 10, the server responds with only 5 bytes. In this case, the server has to specify first how many bytes are sent in the message followed by the actual bytes from the file.

If the offset is already at the end of the file and the client asks for more bytes, the server responds with -1.

Here is an example of how READ works:

# At the server
$ echo -n "123456789" > testfile1.txt

$ ./server SERVER_PORT_NUMBER

# At the client
$ ./client SERVER_IP_ADDRESS SERVER_PORT_NUMBER
C> OPEN testfile1.txt
S> 1
C> READ 4
S> 4 1234
C> READ 3
S> 3 567
C> READ 5
S> 2 89
C> READ 2
S> -1
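
One way to build the READ response while tracking the offset explicitly is sketched below. The function name handle_read, its parameters, and the HDR_MAX constant are all assumptions made for this example. The sketch uses pread(), which reads at a given offset without relying on the kernel's file position, and it clamps the request to the size of the output buffer, which is a simplification.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handler: writes the READ response ("N data" or "-1") into
 * out and returns its length in bytes. The response is not NUL-terminated
 * after the data. fd is -1 when no file is open; cap must exceed HDR_MAX. */
size_t handle_read(int fd, off_t *offset, size_t want, char *out, size_t cap)
{
    enum { HDR_MAX = 24 };              /* room for the count and the space */

    if (cap <= HDR_MAX)
        return 0;                       /* output buffer too small */
    if (fd < 0)
        return (size_t)snprintf(out, cap, "-1");
    if (want > cap - HDR_MAX)
        want = cap - HDR_MAX;           /* sketch: clamp to the buffer size */

    /* Read the data past the space reserved for the header. */
    ssize_t n = pread(fd, out + HDR_MAX, want, *offset);
    if (n <= 0)                         /* error, or already at end of file */
        return (size_t)snprintf(out, cap, "-1");

    int hdr = snprintf(out, HDR_MAX, "%zd ", n);
    memmove(out + hdr, out + HDR_MAX, (size_t)n);  /* close the gap */
    *offset += n;                       /* advance the server-side cursor */
    return (size_t)hdr + (size_t)n;
}
```

With a file containing 123456789 and the offset at 0, successive requests for 4, 3, 5, and 2 bytes produce "4 1234", "3 567", "2 89", and "-1", matching the transcript above.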

The BACK command is sent by the client to the server and is specified as follows:

BACK, followed by a single space, followed by the number of bytes (in ASCII text), followed by a single newline ('\n')

BACK 5

The BACK command takes the offset in the file back by the number of bytes specified. This command is the reason why you need to store the current offset in the file at the server side. UNIX file seek operation cannot do this directly. But, if you know the exact position, you can use the UNIX lseek() system call.

The server responds with the following messages:

  • 1 if the operation is successful
  • -1 if you ask the server to go back before the start of the file. In this case, the server will not make any change to the current offset. The server also responds with -1 if the system call fails.

Here is an example of how BACK works:

# At the server
$ echo -n "123456789" > testfile1.txt

$ ./server SERVER_PORT_NUMBER

# At the client
$ ./client SERVER_IP_ADDRESS SERVER_PORT_NUMBER
C> OPEN testfile1.txt
S> 1
C> READ 6
S> 6 123456
C> BACK 4
S> 1
C> READ 5
S> 5 34567
C> BACK 10
S> -1
C> READ 2
S> 2 89
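
The BACK rules can be expressed as a small check on the stored offset followed by an lseek() to the computed absolute position. The sketch below is illustrative only; the function name handle_back and its parameters are hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handler: moves the stored offset back by nbytes and
 * returns the text response. fd is -1 when no file is open. */
const char *handle_back(int fd, off_t *offset, off_t nbytes)
{
    if (fd < 0 || nbytes < 0 || nbytes > *offset)
        return "-1";                    /* would move before the start */

    off_t target = *offset - nbytes;    /* exact position to seek to */
    if (lseek(fd, target, SEEK_SET) == (off_t)-1)
        return "-1";                    /* system call failed */

    *offset = target;                   /* update only on success */
    return "1";
}
```

In the transcript above, BACK 4 at offset 6 succeeds and leaves the offset at 2, while BACK 10 at offset 7 fails and leaves the offset unchanged.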

The CLOS command is sent by the client to the server and is specified as follows:

CLOS followed by a single newline ('\n')

CLOS

This command closes an open file in the server. The server responds with:

  • -1 if there is no open file, or if the close() system call fails
  • 1 if there is a file open and the close() system call succeeds

Keep the following points in mind while implementing the server:

  • All the commands are exactly 4 bytes (OPEN, READ, BACK, CLOS). At the server side, when it is time to receive the next command, you can read exactly 4 bytes, and then depending on the command read the rest of the input.
  • Use system calls open() read() lseek() close() on the server side to perform file operations. You can read the details in the corresponding man pages
  • As mentioned, you have to store some state at the server side. You must store the open file handle (if a file is opened) and the offset in the file. This offset must be updated after READ and BACK operations.
  • Remember, recv() system call to receive bytes from a TCP connection can return any number of bytes less than or equal to the number of bytes asked for. You need to handle this as you did in the previous labs.
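
The first and last points can be combined into a routine that reads exactly four command bytes and then the rest of the line. The following is a sketch, not a required design: the name read_request and its interface are assumptions, and it reads the argument one byte at a time for simplicity rather than efficiency.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: reads one request from sd. On success returns 0,
 * with the 4-letter command in cmd and the rest of the line (without the
 * separating space and the newline) in arg, both NUL-terminated.
 * Returns -1 on error, EOF, or an argument that does not fit in arg. */
int read_request(int sd, char cmd[5], char *arg, size_t argcap)
{
    size_t got = 0, len = 0;

    if (argcap == 0)
        return -1;

    while (got < 4) {                       /* recv() may return < 4 bytes */
        ssize_t r = recv(sd, cmd + got, 4 - got, 0);
        if (r <= 0)
            return -1;
        got += (size_t)r;
    }
    cmd[4] = '\0';

    for (;;) {                              /* read up to the newline */
        char c;
        ssize_t r = recv(sd, &c, 1, 0);
        if (r <= 0)
            return -1;
        if (c == '\n')
            break;
        if (len + 1 >= argcap)
            return -1;                      /* argument too long */
        arg[len++] = c;
    }
    arg[len] = '\0';

    if (len > 0 && arg[0] == ' ')           /* drop the separating space */
        memmove(arg, arg + 1, len);         /* len bytes include the NUL */
    return 0;
}
```

Since a file name does not exceed 1024 characters, an arg buffer of 1025 bytes is enough for any OPEN request.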

Here is a complete example session:

# At the server
$ echo -n "1234567890" > testfile1

$ ./server SERVER_PORT_NUMBER

# At the client
$ ./client SERVER_IP_ADDRESS SERVER_PORT_NUMBER
C> OPEN testfile2
S> -1
C> OPEN ../testfile3
S> -1
C> OPEN testfile1
S> 1
C> READ 7
S> 7 1234567
C> BACK 2
S> 1
C> READ 10
S> 5 67890
C> READ 2
S> -1
C> BACK 8
S> 1
C> READ 3
S> 3 345
C> BACK 8
S> -1
C> READ 2
S> 2 67
C> CLOS
S> 1

Note: OPEN testfile2 failed because the file does not exist. OPEN ../testfile3 failed because the file name contains “..”.

# At the server
$ echo -n "1234567890" > testfile1

$ echo -n "abcdefghij" > testfile2

$ ./server SERVER_PORT_NUMBER

# At the client
$ ./client SERVER_IP_ADDRESS SERVER_PORT_NUMBER
C> READ 10
S> -1
C> BACK 6
S> -1
C> CLOS
S> -1
C> OPEN testfile1
S> 1
C> READ 2
S> 2 12
C> OPEN testfile2
S> -1
C> CLOS
S> 1
C> OPEN testfile2
S> 1
C> READ 4
S> 4 abcd
C> CLOS
S> 1

Note: READ, BACK and CLOS must return an error if no file is opened at the server. Also, as shown above, a client should be able to close a file and open a new one in the same connection. You can assume that the number tested will always be a positive integer and will not overflow.
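
To allow a new OPEN after CLOS on the same connection, the CLOS handler has to reset the server's state as well as close the file. A minimal sketch, assuming a hypothetical struct file_state that holds the descriptor and the offset:

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-connection state kept by the server. */
struct file_state {
    int   fd;      /* descriptor of the open file, or -1 if none is open */
    off_t offset;  /* current position in the file */
};

/* Hypothetical handler: returns the text response for CLOS. */
const char *handle_close(struct file_state *st)
{
    if (st->fd < 0)
        return "-1";                /* no file is open */

    int rc = close(st->fd);
    st->fd = -1;                    /* forget the descriptor either way */
    st->offset = 0;                 /* the next OPEN starts from byte 0 */
    return rc == 0 ? "1" : "-1";
}
```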

To earn extra credit, design a concurrent server. That is, design the server such that it is able to handle multiple clients at the same time. Specifically, the server must handle multiple TCP connections simultaneously. Your concurrent server will need to maintain state for each client separately – the socket used for communication with each client, the file opened for the client, and the position in the file.

As such, there are three ways a server can handle multiple connections at the same time:

  • Using the select() system call
  • Forking a separate process for each new client (i.e., each new TCP connection)
  • Using pthreads to create a separate thread to handle each client (i.e., each new TCP connection)

This webpage provides a nice tutorial on the first two approaches. This webpage shows how to use pthreads with a server.

NOTE: Please create a README file under the lab03 directory if you implemented the extra credit part. Otherwise, you can ignore this note.

Use the turnin command to submit your whole directory.

cd ~/cs422
turnin -c cs422 -p lab03 lab03

You can verify your submission with turnin -v:

turnin -c cs422 -p lab03 -v

Grading Criterion                               Points
TA test cases                                   +15
Organization, coding style, commenting, etc.    +3
Code compiles without errors and runs           +2
Total                                           +20

Extra credit is worth 3 points.

Date Due

Feb 9th (02/09/2019) 11:59 pm.

  • Last modified: 2019/01/30 10:14
  • by arastega