cs50300:fall16:lab7 [Computer Science Courses]

MIDWAY SUBMISSION DUE: Tuesday, November 22nd by 11:59 PM

FINAL SUBMISSION DUE: Friday, December 9th by 11:59 PM

By the end of this lab students will be able to:

Understand how to dynamically load programs and libraries into memory.
Understand how to alter the location for relocatable variables.
Create new system calls to manipulate shared libraries.

Today, operating systems such as Linux and Windows support executing programs from backing storage. The typical helloworld program, when compiled by gcc, produces an executable that can be run from a shell (the default name generated by gcc is typically a.out). However, in XINU, all processes are created and executed using a function pointer. There is currently no way to execute a program that has been compiled outside of XINU. Your job is to provide a mechanism and interface to allow for the execution of dynamic code in XINU. You will do this in two parts:

Part 1: The loading and execution of a program that is stored outside of the XINU executable.
Part 2: The dynamic loading of libraries that contain functions that can be called from within XINU.

File Storage

Executable and library files will be stored in XINU's remote file system (RFS). The RFS consists of two pieces:

The rfserver - a network server that runs on a XINU frontend (e.g. xinu19.cs.purdue.edu)
The RFS client within XINU - runs within XINU on the backend

The rfserver, which runs on your frontend workstation, serves files to the XINU instance running on the backend. XINU connects to the rfserver through a UDP port and sends/receives messages to perform file operations on the file system where the rfserver is running. For this lab, you will only need to use the RFS to read the binary images for the programs and the libraries.

The source code for the rfserver is located inside the tarball for the lab (see below) in the rfserver directory. You will be provided with a UDP port number specifically for you to use for the RFS server via email. You will need to change the macro RF_SERVER_PORT in the file include/rfilesys.h.

To allow XINU to connect to the rfserver, the IP address for the server needs to be set in include/rfilesys.h. The value of RF_SERVER_IP must be changed to match the IP address of your frontend workstation. For example, if your frontend workstation is xinu19.cs.purdue.edu then the RF_SERVER_IP value in include/rfilesys.h needs to be changed to 128.10.136.69.

Running make in the directory rfserver generates the executable “rfserver” for the RFS server and places it in the programs directory. You will need to run this code before you start your XINU image on the backend, like so:

$ ./rfserver

NOTE: Every time you change include/rfilesys.h make sure you run a make clean and then make prior to running your xinu image on a backend.

To open a file in the remote file system, you will use the open system call specifying the name of the file that you wish to open and RFILESYS as the device name (this tells XINU that you are opening a new file in the remote file system). For example:

int fd = open(RFILESYS, "helloworld", "or");

This opens the file helloworld, located in the remote file system, for reading. The “r” tells XINU to open the file for reading, and the “o” tells XINU to require that the file already exists; if it does not exist, SYSERR is returned. The resulting file descriptor is stored in fd. If there is a problem opening the file (timeout connecting to the rfserver, etc.), then fd will contain the value SYSERR.

NOTE: If you do not use “o” in file open mode, a new file will be created.

NOTE: the remote file server creates a file system that is relative to the directory where the rfserver is executed. For example, if your current working directory on your frontend workstation is xinu-fall2016-lab7/programs and you run the rfserver command, then the root directory that XINU will see on the open system call will be xinu-fall2016-lab7/programs on the frontend workstation.

NOTE: do NOT begin your file name with a slash '/' or use '..' anywhere in the file path. Doing so will cause open to fail. The RFS does not use a root directory with '/'. The remote file server always resolves your path relative to the working directory in which rfserver was started.

Assuming the file opens successfully, you can later read the file with the read system call, specifying the file descriptor, a read buffer, and read size. For example:

char read_buffer[300];
int rc = read(fd, read_buffer, sizeof(read_buffer));

This will return the number of bytes actually read from the file or SYSERR if an error occurred. You can request the size of the file in bytes by using the control function like so:

int32 filesize = control(RFILESYS, RFS_CTL_SIZE, fd, 0);

Here fd is the file descriptor returned by open(..)

HINT: To read the entire file, you can first find out the size of the file, allocate enough memory to hold the entire file and then read the entire file.
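The hint above can be sketched as a loop that keeps reading until the requested number of bytes has arrived. This is a minimal host-side illustration, not XINU code: reader_fn, read_fully, mem_src, mem_read, and the CHUNK_MAX limit are all hypothetical names and values introduced here. It assumes that a single read may return fewer bytes than requested, which is worth verifying against the RFS client before relying on one large read.

```c
#include <stdint.h>
#include <string.h>

#define CHUNK_MAX 512u   /* assumed per-call limit, chosen only for illustration */

/* Hypothetical reader: returns bytes read, 0 at end of data, negative on error.
 * In XINU this role would be played by a thin wrapper around read(fd, ...). */
typedef int32_t (*reader_fn)(void *ctx, char *buf, uint32_t len);

/* Keep reading until `size` bytes have arrived, the data ends, or an error occurs. */
int32_t read_fully(reader_fn rd, void *ctx, char *dst, uint32_t size)
{
    uint32_t done = 0;

    while (done < size) {
        uint32_t want = size - done;
        int32_t rc;

        if (want > CHUNK_MAX)
            want = CHUNK_MAX;
        rc = rd(ctx, dst + done, want);
        if (rc < 0)
            return -1;          /* stands in for SYSERR */
        if (rc == 0)
            break;              /* data ended early */
        done += (uint32_t)rc;
    }
    return (int32_t)done;
}

/* In-memory stand-in for a file, so the sketch can run on a host machine. */
struct mem_src {
    const char *data;
    uint32_t    size;
    uint32_t    pos;
};

/* Deliberately returns at most 100 bytes per call to mimic short reads. */
int32_t mem_read(void *ctx, char *buf, uint32_t len)
{
    struct mem_src *s = ctx;
    uint32_t left = s->size - s->pos;

    if (len > left) len = left;
    if (len > 100u) len = 100u;
    memcpy(buf, s->data + s->pos, len);
    s->pos += len;
    return (int32_t)len;
}
```

Inside XINU, the destination buffer would come from getmem using the size reported by control, and the caller would compare the return value of the loop against that size to detect a truncated read.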

You can close an opened file using the close() system call like so:

close(fd);

Here, fd is the file descriptor returned by open(..)

File Format

Compiled files will be created by gcc in the executable and linkable format (ELF). For part 1 (loading of an executable) the file will contain a single function with the symbol name “main”. For part 2, the file may contain one or more functions of different names.

As part of this project you will have to familiarize yourself with the ELF format and be able to parse the headers and various sections of the ELF file.
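As a starting point for parsing, the sketch below checks the ELF identification bytes and walks the section header table looking for a section of a given type. The field layouts and constant values follow the 32-bit ELF specification; the struct and function names (elf32_ehdr, elf32_shdr, elf32_validate, elf32_find_section) are hypothetical, and the code assumes a little-endian host such as x86 and an image that has already been read into memory.

```c
#include <stdint.h>
#include <string.h>

#define EI_CLASS     4
#define EI_DATA      5
#define ELFCLASS32   1
#define ELFDATA2LSB  1
#define EM_386       3
#define SHT_SYMTAB   2

struct elf32_ehdr {                 /* ELF file header, 52 bytes */
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
};

struct elf32_shdr {                 /* section header, 40 bytes */
    uint32_t sh_name, sh_type, sh_flags, sh_addr, sh_offset;
    uint32_t sh_size, sh_link, sh_info, sh_addralign, sh_entsize;
};

/* Return 0 if the image looks like a 32-bit little-endian x86 ELF file. */
int elf32_validate(const uint8_t *image, uint32_t size)
{
    struct elf32_ehdr eh;

    if (size < sizeof(eh))
        return -1;
    memcpy(&eh, image, sizeof(eh));
    if (memcmp(eh.e_ident, "\177ELF", 4) != 0)
        return -1;
    if (eh.e_ident[EI_CLASS] != ELFCLASS32 || eh.e_ident[EI_DATA] != ELFDATA2LSB)
        return -1;
    return (eh.e_machine == EM_386) ? 0 : -1;
}

/* Return the index of the first section of the given type, or -1 if none. */
int elf32_find_section(const uint8_t *image, uint32_t size, uint32_t type)
{
    struct elf32_ehdr eh;
    struct elf32_shdr sh;

    if (elf32_validate(image, size) != 0)
        return -1;
    memcpy(&eh, image, sizeof(eh));
    if (eh.e_shentsize != sizeof(sh))
        return -1;
    for (uint16_t i = 0; i < eh.e_shnum; i++) {
        uint32_t off = eh.e_shoff + (uint32_t)i * (uint32_t)sizeof(sh);

        if (off > size || size - off < sizeof(sh))
            return -1;              /* header table runs past the image */
        memcpy(&sh, image + off, sizeof(sh));
        if (sh.sh_type == type)
            return i;
    }
    return -1;
}
```

Copying each header out with memcpy avoids alignment problems when the image buffer is not word aligned.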

Things to consider in the ELF file:

A loaded program can make calls to existing XINU functions (e.g. kprintf). When the compiler compiles the program, it does not know the address of kprintf. The compiler builds relocation entries for the symbols that it cannot resolve at compile time. These functions are present in the XINU text section at runtime. Your job is to read the relocation entries in the ELF file and fix up the relocation offsets as part of the load_program and load_library system calls. For more information on relocation, see the ELF specification (links provided in references section).
When a program is compiled resulting in an ELF file, the file contains a section called “.symtab” (symbol table). The symbol table contains the names and addresses of all symbols in the executable. As you know, XINU is also compiled into an ELF file (xinu.elf). At the time of resolving relocation entries in the loaded program, you will need the addresses of XINU functions. Your task is to read the xinu.elf file (using RFS), specifically the symbol table in the xinu.elf file. You only need to do this once during a run.
When you run make from the compile directory to compile XINU, the resulting xinu.elf file is also placed in the programs directory. You will open this xinu.elf file using RFS, to read the XINU symbol table.
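The two steps described in the list above, looking up a XINU symbol by name and patching a relocation, can be sketched as follows. The structure layouts and the two i386 relocation formulas (R_386_32 is S + A, R_386_PC32 is S + A - P, with the addend A stored at the patched location for SHT_REL entries) come from the ELF specification. The function names lookup_symbol and apply_rel are hypothetical, and the sketch assumes a little-endian host and that only these two relocation types appear; check the actual relocation types with readelf before relying on that.

```c
#include <stdint.h>
#include <string.h>

#define SHN_UNDEF           0
#define R_386_32            1   /* S + A     */
#define R_386_PC32          2   /* S + A - P */
#define ELF32_R_SYM(info)   ((info) >> 8)
#define ELF32_R_TYPE(info)  ((info) & 0xffu)

struct elf32_sym {
    uint32_t st_name, st_value, st_size;
    uint8_t  st_info, st_other;
    uint16_t st_shndx;
};

struct elf32_rel {
    uint32_t r_offset;   /* offset of the patched word within its section */
    uint32_t r_info;     /* symbol index and relocation type              */
};

/* Find a defined symbol by name; store its address in *addr. 0 on success. */
int lookup_symbol(const struct elf32_sym *symtab, uint32_t nsyms,
                  const char *strtab, uint32_t strtab_size,
                  const char *name, uint32_t *addr)
{
    for (uint32_t i = 0; i < nsyms; i++) {
        if (symtab[i].st_shndx == SHN_UNDEF || symtab[i].st_name >= strtab_size)
            continue;
        if (strcmp(strtab + symtab[i].st_name, name) == 0) {
            *addr = symtab[i].st_value;
            return 0;
        }
    }
    return -1;
}

/* Patch one 32-bit word. sec is the loaded section, sec_addr the address
 * at which it will execute, sym_addr the resolved symbol address (S). */
int apply_rel(uint8_t *sec, uint32_t sec_size, uint32_t sec_addr,
              const struct elf32_rel *rel, uint32_t sym_addr)
{
    uint32_t addend, value, place;

    if (rel->r_offset > sec_size || sec_size - rel->r_offset < 4)
        return -1;
    memcpy(&addend, sec + rel->r_offset, 4);    /* implicit addend (A) */
    place = sec_addr + rel->r_offset;           /* patched address (P) */
    switch (ELF32_R_TYPE(rel->r_info)) {
    case R_386_32:   value = sym_addr + addend;         break;
    case R_386_PC32: value = sym_addr + addend - place; break;
    default:         return -1;                 /* type not handled here */
    }
    memcpy(sec + rel->r_offset, &value, 4);
    return 0;
}
```

A loader would call ELF32_R_SYM on each entry to find the symbol in the program's own symbol table, resolve undefined symbols through the table read from xinu.elf, and then call apply_rel with the result.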

Part 1: Running a program executable in XINU

Recall the format for the create system call:

pid32	create(
	  void		*funcaddr,	/* Address of the function	*/
	  uint32	ssize,		/* Stack size in words		*/
	  pri16		priority,	/* Process priority > 0		*/
	  char		*name,		/* Name (for debugging)		*/
	  uint32	nargs,		/* Number of args that follow	*/
	  ...
	)

NOTE: the create system call takes as the first parameter the pointer to the function that the new process will execute. In part 1, you need to create an interface to allow a program to be loaded into XINU's memory from a program located in a file system. To do this you will be required to write a new system call with the following format:

void* load_program(char* path);

This system call takes as a parameter the path name to a program and returns a function pointer to the program that has been loaded into memory. If the operating system could not load the program (e.g. the file doesn't exist, there is no more memory to load the program into, etc.) then the function returns SYSERR cast to a void pointer:

(void*)SYSERR

As previously mentioned the program to be loaded must contain only a single function with the symbol named “main”. If more than one function exists in the file, then return (void*)SYSERR.

NOTE: the name “main” for the symbol of the function is only used to find the location of the program entry point. Once the load_program system call returns, the pointer to the function will be used.

Once the program is loaded, the program can then be executed by using the create system call. For example, consider the following “helloworld” program:

#include <xinu.h>
	
int32 main(int32 argc, char* argv[])
{
	kprintf("Hello World\n");
	return 0;
}

Using the new system calls, this program should be able to be executed with the following:

void* helloworld = load_program("helloworld");
resume(create(helloworld, 4096, 20, "helloworld", 0, NULL));

Part 2: Loading a library into XINU

For part 2, you will load a dynamic library (i.e. a file that contains multiple functions), and will create a mechanism that allows an existing XINU function to call any of the functions in the dynamic library. As an example, suppose a library file contains functions x, y, and z. Once the library has been loaded, a XINU process can:

Ask for the address of x and then
Use the address to call function x

To make the example concrete, suppose you wanted to create two utility functions in XINU and dynamically load them for programs to use:

int add1(int val) 
{
	return val + 1;
}

int add2(int val)
{
	return val + 2;
}

You created these functions in a file myadd.c and compiled the file to myadd. You later want to load these functions into XINU and call them. To do that you will need to provide two new system calls:

syscall load_library(char* path);
void* find_library_function(char* name);

The first system call, load_library, is similar to load_program as it loads the image from the remote file system and performs the fix up for relocatable variables. However, it does not return a function pointer. Instead it stores the pointers to all the functions within the file in a table. Another system call, find_library_function, can then be called to find the function pointer for a dynamically loaded function by name.

For the myadd example:

int myvalue = 2;

/* Load the library */
if(load_library("myadd") == SYSERR) {
	return;
}

/* Find the add1 function */
int32 (*add1)(int32) = find_library_function("add1");
if((int32)add1 == SYSERR) {
	return;
}

/* Call the function */
add1(myvalue);

If the load_library function cannot load the library for any reason, it returns SYSERR. Similarly, if the find_library_function cannot successfully retrieve the function pointer for the given name, then it will return (void*)SYSERR.

For this lab:

You will have to be able to support a maximum of 3 loaded libraries, each with a maximum of 10 functions (30 functions total). If a library contains more than 10 functions then return SYSERR.
A library can only be loaded once, so if a process attempts to load the same library twice then return SYSERR.
A function loaded from a library can only exist once in XINU so if two libraries both define the same function name then return SYSERR when the second library is loaded.
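One way to enforce the three limits above is a fixed-size table that is checked in full before anything is stored. The sketch below is a host-side illustration under stated assumptions, not the required design: lib_register, lib_find, libtab, NAME_LEN, and the LIB_* constants are hypothetical names, with LIB_ERR standing in for SYSERR and LIB_NOTFOUND for (void*)SYSERR. A real load_library would call something like lib_register only after the image has been read and relocated.

```c
#include <string.h>

#define MAX_LIBS           3
#define MAX_FUNCS_PER_LIB  10
#define NAME_LEN           32             /* assumed name length limit   */
#define LIB_OK             0
#define LIB_ERR            (-1)           /* stands in for SYSERR        */
#define LIB_NOTFOUND       ((void *)-1)   /* mirrors (void*)SYSERR       */

struct lib_func  { char name[NAME_LEN]; void *addr; };
struct lib_entry {
    int  used;
    char path[NAME_LEN];
    int  nfuncs;
    struct lib_func funcs[MAX_FUNCS_PER_LIB];
};

static struct lib_entry libtab[MAX_LIBS];

/* Search every loaded library for a function name. */
void *lib_find(const char *name)
{
    for (int i = 0; i < MAX_LIBS; i++) {
        if (!libtab[i].used)
            continue;
        for (int j = 0; j < libtab[i].nfuncs; j++)
            if (strcmp(libtab[i].funcs[j].name, name) == 0)
                return libtab[i].funcs[j].addr;
    }
    return LIB_NOTFOUND;
}

/* Record a library and its functions; validate everything before storing. */
int lib_register(const char *path, const char *names[], void *addrs[], int n)
{
    int slot = -1;

    if (n < 0 || n > MAX_FUNCS_PER_LIB || strlen(path) >= NAME_LEN)
        return LIB_ERR;                         /* too many functions     */
    for (int i = 0; i < MAX_LIBS; i++) {
        if (libtab[i].used && strcmp(libtab[i].path, path) == 0)
            return LIB_ERR;                     /* library already loaded */
        if (!libtab[i].used && slot < 0)
            slot = i;
    }
    if (slot < 0)
        return LIB_ERR;                         /* all slots in use       */
    for (int i = 0; i < n; i++) {
        if (strlen(names[i]) >= NAME_LEN || lib_find(names[i]) != LIB_NOTFOUND)
            return LIB_ERR;                     /* name clash across libs */
        for (int j = 0; j < i; j++)
            if (strcmp(names[i], names[j]) == 0)
                return LIB_ERR;                 /* clash within this lib  */
    }
    strcpy(libtab[slot].path, path);
    for (int i = 0; i < n; i++) {
        strcpy(libtab[slot].funcs[i].name, names[i]);
        libtab[slot].funcs[i].addr = addrs[i];
    }
    libtab[slot].nfuncs = n;
    libtab[slot].used = 1;
    return LIB_OK;
}
```

Validating first means a rejected library leaves the table unchanged, so a failed load does not leave half of its functions visible to find_library_function.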

In /homes/cs503/xinu there is a file called xinu-fall2016-lab7.tar.gz that contains a start to the code. Unpack:

tar zxvf /u/u3/cs503/xinu/xinu-fall2016-lab7.tar.gz

This will create a directory called xinu-fall2016-lab7.

Along with the main code for XINU, this tarball contains the following files (additional explanation of the contents of the files is in the following sections).

system/load_program.c - function declaration for the load_program system call.
system/load_library.c - function declaration for the load_library system call.
system/find_library_function.c - function declaration for the find_library_function call.
rfserver/* - The source code for the rfserver (see above)
programs/* - The directory for your programs with an example hello world and Makefile
include/prototypes.h - modified prototypes.h file which includes the load_program, load_library, and find_library_function system calls.

First things to be done when you untar the lab tarball:

Change the macro RF_SERVER_PORT in file include/rfilesys.h to the port that is assigned specifically to you.
Change the macro RF_SERVER_IP to the IP address of the XINU frontend on which you will run the RFS server.
Go into directory rfserver and run make. This will place the executable “rfserver” in the directory programs. You will have to do this step every time you change the RF_SERVER_IP.

To run the RFS server, cd into the programs directory and run the following:

$ ./rfserver

NOTE: Whenever you make any changes to the programs or to XINU, you will have to restart the RFS server.

In the shell directory you will find a modified version of the XINU shell which has been expanded to use dynamically loadable commands. As part of this lab, you will be required to implement the following two new commands for the XINU shell. Place the code for the new commands in the programs directory.

semdump - this command takes no parameters and prints the contents of the semaphore table to the user
ls - list the contents of a remote file system directory

Provide the implementation for the new shell commands as loadable programs from programs directory. The details for the commands are as follows:

semdump - this command takes no parameters and prints the contents of the semaphore table to the user. You must provide a readable table format for the output including a table header and the contents of each semaphore table entry. See include/semaphore.h for a description of the contents of the semaphore table. For example:

xsh $ semdump
Entry	State	Count	Queue
0	S_USED	3	3
1	S_FREE	0	0

ls - list the contents of a remote file system directory. This command takes zero or one parameter. If a parameter specifying a directory is given, the contents of that directory within the remote file system are printed to the user or an error is printed indicating that the directory does not exist. For each entry within a directory that is also a directory put a '/' at the end. If no parameter is specified, the contents of the remote file system root directory are printed. For example:

xsh $ ls mydir
mydir1/
mydir2/
myfile1
myfile2

Note: A directory is a special file in Linux, which maintains a list of files in the directory. For this command, you will need to open a directory as a file using RFS. The mode must be “ro” when opening a directory as a file. For example: you can open the directory in which the RFS server is running using the open system call like so:

int32 dirfd = open(RFILESYS, ".", "ro");

Each read performed on an opened directory will return the next entry in the directory which will be in the following format (defined in include/rfilesys.h):

struct rfdirent {
	byte	d_type;
	byte	d_name[256];
};

The d_type field can have the following two values (defined in include/rfilesys.h)

#define RF_DIRENT_FILE 1
#define RF_DIRENT_DIR  2

You can use this value to differentiate between a file and a directory.

A read on a directory will look like this:

struct rfdirent rfdentry;

rc = read(dirfd, (char *)&rfdentry, sizeof(struct rfdirent));
if(rc == SYSERR) { /* error occurred while reading */

} 
else if(rc == 0) { /* Reached end of list */

}
else { /* Handle next entry in the directory */

}

You will have to make multiple calls to read() to get all the entries in the directory. When you reach the end of the list, read() will return a zero.
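The per-entry handling in the final branch of that loop might look like the sketch below, which formats one entry and appends '/' for directories. The struct and the two type constants are repeated from include/rfilesys.h as quoted above so that the block builds on a host machine; the byte typedef is redefined here for the same reason, and format_dirent is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdio.h>

typedef unsigned char byte;     /* XINU's byte type, redefined for a host build */

#define RF_DIRENT_FILE 1
#define RF_DIRENT_DIR  2

struct rfdirent {
    byte d_type;
    byte d_name[256];
};

/* Write the display form of one entry into out. Returns 0 on success,
 * -1 if the result does not fit in outlen bytes. */
int format_dirent(const struct rfdirent *e, char *out, size_t outlen)
{
    const char *suffix = (e->d_type == RF_DIRENT_DIR) ? "/" : "";
    int n = snprintf(out, outlen, "%.255s%s", (const char *)e->d_name, suffix);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The precision in the format string caps the name at 255 characters, which keeps the call safe even if a received name is not NUL-terminated. Within XINU the formatted line would be printed with kprintf or printf rather than returned in a buffer.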

NOTE: Do not forget to close the opened directory once you are done with all the reading.

Make sure to remove all debug output from your system calls. When the TAs run your submitted code, calling a system call directly should not produce any output.
Provide a set of test cases to ensure that your code works as required. Put these test cases in main.c
- Make sure your test cases not only test the new functionality of the new system calls, but that existing system calls still function correctly.
- Don't forget to include test cases for the extra credit (see below) if you choose to implement the extra credit requirements. No extra credit will be awarded if test cases are not included.
The TAs will be replacing main.c with their own test cases after running your submitted test cases. Make sure you do not define any dependent variables in main.c. You are free to modify any other file(s) to implement the lab requirements.
- Make sure that there are no dependent declarations in main.c.
If your submitted code does not compile (either the exact submitted code or the code after the TAs replace any test case files), you will receive zero (0) points for code execution. If this happens, you will be allowed to resubmit for half credit only.
Please run “make clean” prior to submission so that you don't submit object files.
NOTE: When you make xinu for this lab the make file will generate two files in the compile directory:
- xinu - this is the file you will download to the xinu backend
- xinu.elf - this is the executable and linkable format version of the xinu binary. This is not the format to use when sending to the backends. However, you will need to use it to find the addresses of symbols and to perform relocation.

On Intel x86 there is a structure called the global descriptor table, which defines characteristics for regions of memory. One of those characteristics is the segment of memory in which code resides. If the CPU attempts to fetch an instruction from a memory segment that is not designated as code, then a general protection violation is signaled. Normally, Xinu only has the .text section of memory designated in its code segment. This makes sense since normally only code (.text) should be executed and not data from the stack or getmem, etc. However, for lab 7, you need to be able to execute code that is located beyond the .text section of Xinu since you're loading programs into memory allocated with getmem. To allow this, you will need to extend the code segment in the global descriptor table. In the function setsegs() in system/meminit.c the following needs to be changed:

/*------------------------------------------------------------------------
 * setsegs  -  Initialize the global segment table
 *------------------------------------------------------------------------
 */
void	setsegs()
{
	extern int	etext;
	struct sd	*psd;
	uint32		np, ds_end;

	ds_end = 0xffffffff/PAGE_SIZE; /* End page number of Data segment */

	psd = &gdt_copy[1];	/* Kernel code segment: identity map from address
				   0 to etext */

	/* Change the following line which sets np */
	//np = ((int)&etext - 0 + PAGE_SIZE-1) / PAGE_SIZE;	/* Number of code pages */
	np = ds_end;						/* End of memory */  

	psd->sd_lolimit = np;
	psd->sd_hilim_fl = FLAGS_SETTINGS | ((np >> 16) & 0xff);

	psd = &gdt_copy[2];	/* Kernel data segment */
	psd->sd_lolimit = ds_end;
	psd->sd_hilim_fl = FLAGS_SETTINGS | ((ds_end >> 16) & 0xff);

	psd = &gdt_copy[3];	/* Kernel stack segment */
	psd->sd_lolimit = ds_end;
	psd->sd_hilim_fl = FLAGS_SETTINGS | ((ds_end >> 16) & 0xff);

	memcpy(gdt, gdt_copy, sizeof(gdt_copy));
}
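To see what the change does numerically, the sketch below reproduces the two computations for np and the way the value is split across the descriptor fields. It assumes PAGE_SIZE is 4096 and that FLAGS_SETTINGS selects page granularity for the limit; both are assumptions to confirm in the XINU headers. The helper names are hypothetical and the example address passed to code_pages is made up.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u     /* assumed page size */

/* Original computation: number of pages from address 0 up to etext. */
uint32_t code_pages(uint32_t etext_addr)
{
    return (etext_addr + PAGE_SIZE - 1u) / PAGE_SIZE;
}

/* Replacement computation: the last page number of a 4 GB address space. */
uint32_t pages_to_end_of_memory(void)
{
    return 0xffffffffu / PAGE_SIZE;
}

/* Low 16 bits of the limit, as stored in sd_lolimit. */
uint16_t limit_low(uint32_t np)
{
    return (uint16_t)(np & 0xffffu);
}

/* Upper limit bits, as OR-ed into sd_hilim_fl by setsegs(). */
uint8_t limit_high(uint32_t np)
{
    return (uint8_t)((np >> 16) & 0xffu);
}
```

With these assumptions the new limit is 0xfffff pages, so the code segment ends up with the same limit as the data and stack segments, and instructions can be fetched from any address that getmem returns.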

Unloading a library

Since you can only load at most 3 libraries at any given time, there is a limit to the number of dynamically loaded functions that a program can use. For extra credit, implement a system call that allows a process to unload a library so that it can later load the same or a different library using load_library. To do this implement the following system call:

syscall unload_library(char* path);

This system call unloads the library specified by the path name. Any functions that are contained in the library will be no longer usable (find_library_function will return SYSERR for those functions). You may assume that any processes using functions from the library to be unloaded have coordinated to ensure that no process is using the library while it is being unloaded. If the library to be unloaded has not been loaded, then return SYSERR.

In your report, consider what might happen if a process asks for a library to be unloaded while another process is using the library. What mechanism could you add to prevent such a problem? (Hint: consider a reference count).
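As an illustration of the hinted mechanism only, the sketch below keeps a use count per library and refuses to unload while the count is nonzero. The names lib_ref, lib_acquire, lib_release, and lib_try_unload are hypothetical, and REF_ERR stands in for SYSERR. Inside XINU the updates would also need to be made atomic, for example by disabling interrupts around them, which this host-side sketch does not attempt.

```c
#define REF_OK   0
#define REF_ERR  (-1)   /* stands in for SYSERR */

struct lib_ref {
    int loaded;     /* nonzero while the library image is in memory */
    int refcount;   /* number of users currently holding the library */
};

/* A process calls this before it starts using functions from the library. */
int lib_acquire(struct lib_ref *l)
{
    if (!l->loaded)
        return REF_ERR;
    l->refcount++;
    return REF_OK;
}

/* A process calls this once it has finished with the library. */
int lib_release(struct lib_ref *l)
{
    if (l->refcount <= 0)
        return REF_ERR;
    l->refcount--;
    return REF_OK;
}

/* Unload only when no user remains; otherwise report an error. */
int lib_try_unload(struct lib_ref *l)
{
    if (!l->loaded || l->refcount > 0)
        return REF_ERR;
    l->loaded = 0;      /* the real call would also free the image here */
    return REF_OK;
}
```

An alternative design would block the unloading process until the count reaches zero instead of returning an error; which behavior is preferable is one of the points the report could discuss.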

Loading a library archive

In operating systems such as Unix and Linux, libraries can be created and loaded from groups of files instead of just one. Statically linked libraries (also called archive libraries) can be created with the Unix ar command (see man ar for more information on the ar command). When used, the compiler statically links the symbols that your program uses and makes the code part of your program. You can see this by creating a helloworld program and compiling with gcc -static. The resulting binary will be much larger than if you don't use -static. Dynamically linked libraries (also called shared object libraries) can be created using gcc with the -shared flag. When used, the compiler just checks for the existence of the symbol at compile time and the library itself is loaded by the operating system automatically at run time.

As extra credit, expand the load_library system call to allow it to load archive libraries into the XINU address space. To do this you will need to review the standard for the .a archive library format created by the Unix ar command. This file consists of a global header and a file header for each file within the archive.
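For orientation, the common ar layout is an 8-byte magic string followed by members, each with a 60-byte text header whose size field is a space-padded decimal number and whose last two bytes are a backquote and a newline; members start on even offsets. The sketch below walks such an archive held in memory. The names ar_member, ar_next, and ar_is_special are hypothetical, and the treatment of "/" and "//" as special members reflects the GNU variant; compare the details against the format documentation before relying on them.

```c
#include <stdint.h>
#include <string.h>

#define AR_MAGIC      "!<arch>\n"
#define AR_MAGIC_LEN  8u
#define AR_HDR_LEN    60u   /* name 16, date 12, uid 6, gid 6, mode 8, size 10, end 2 */

struct ar_member {
    char           name[17];   /* raw, space-padded name field plus NUL */
    const uint8_t *data;       /* first byte of the member contents     */
    uint32_t       size;       /* contents length in bytes              */
};

/* Parse a space-padded decimal field. */
static uint32_t parse_decimal(const uint8_t *field, int len)
{
    uint32_t v = 0;

    for (int i = 0; i < len && field[i] >= '0' && field[i] <= '9'; i++)
        v = v * 10u + (uint32_t)(field[i] - '0');
    return v;
}

/* Advance to the next member. Start with *off == 0.
 * Returns 1 if a member was found, 0 at the end, -1 if malformed. */
int ar_next(const uint8_t *buf, uint32_t size, uint32_t *off, struct ar_member *out)
{
    const uint8_t *h;

    if (*off == 0) {
        if (size < AR_MAGIC_LEN || memcmp(buf, AR_MAGIC, AR_MAGIC_LEN) != 0)
            return -1;
        *off = AR_MAGIC_LEN;
    }
    if (*off == size)
        return 0;
    if (*off > size || size - *off < AR_HDR_LEN)
        return -1;
    h = buf + *off;
    if (h[58] != '`' || h[59] != '\n')
        return -1;                          /* bad header terminator */
    memcpy(out->name, h, 16);
    out->name[16] = '\0';
    out->size = parse_decimal(h + 48, 10);
    if (out->size > size - *off - AR_HDR_LEN)
        return -1;                          /* contents run past the buffer */
    out->data = h + AR_HDR_LEN;
    *off += AR_HDR_LEN + out->size;
    *off += *off & 1u;                      /* members start on even offsets */
    if (*off > size)
        *off = size;
    return 1;
}

/* GNU archives use "/" for the symbol table and "//" for long names. */
int ar_is_special(const struct ar_member *m)
{
    return m->name[0] == '/' && (m->name[1] == ' ' || m->name[1] == '/');
}
```

A loader built on this would skip the special members and hand the data and size of every other member to the same ELF parsing path used for a single object file.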

NOTE: the GNU variant of ar archives also contains a symbol table in the list of files. You will need to be able to support this.

NOTE: a single .a file even though it contains multiple files within it, is still considered (for the purposes of load_library) to be a single file. You will have to be able to support the loading of up to three .a files that each consist of multiple object files.

NOTE: For the purposes of this lab, a single .a file will still contain only a maximum of 10 functions within the files (as is stated in the load_library description above). For example, a single .a file might contain the following 2 object files:

myadd - containing the add1 and add2 functions
mysub - containing sub1 and sub2 which behave the same as the add1 and add2 yet subtract instead of add

This is to be considered a single library file containing 4 functions.

To create your own .a for testing use the following:

ar rcs LIBRARY_NAME FILE1 FILE2 ...

Where:

LIBRARY_NAME is the library file to be created
FILE1, FILE2, etc. are the object files

A more concrete example:

ar rcs mymath.a myadd mysub

Creates a single archive library called mymath.a with the two object files myadd and mysub. This file can then be loaded into XINU using the system call:

load_library("mymath.a");

More information on the Unix ar command and the file format can be found here and here.

There are a couple of commands available on the xinu lab frontend machines that you might find useful for this lab:

readelf - Displays information about ELF files
objdump - Displays information from object files

For example, you can use the readelf command to print out the symbols in a xinu.elf file:

$ readelf -a xinu.elf

Midway Submission

You will be required to perform a midway submission (see due date above). In your midway submission include all of your source code for XINU, any programs and/or libraries you have written, and a midway progress report in PDF format in the system directory called lab7_progress.pdf containing the following:

What have you been able to implement and get working up until now?
What do you plan on implementing before the final due date?
What problems have you run into while performing your implementation? What did you do to solve those problems?

To turn in your lab for midway submission, use the following command:

turnin -c cs503 -p lab7_mid xinu-fall2016-lab7

assuming xinu-fall2016-lab7 is the name of the directory containing your code.

If you wish to, you can verify your submission by typing the following command:

turnin -v -c cs503 -p lab7_mid

Do not forget the -v above, as otherwise your earlier submission will be erased (it is overwritten by a blank submission).

Note that resubmitting overwrites any earlier submission and erases any record of the date/time of any such earlier submission.

We will check that the submission time stamp is before the due date; any submission past the due date will be deducted the appropriate number of grace days. If the submission is beyond your remaining number of grace days, your work will not be accepted.

Final Submission

Using the turnin command, submit your complete source code (all of XINU), including any files you added to complete the lab. In the system directory include a PDF file called lab7_analysis.pdf with a report discussing:

The details behind your implementation. As part of this discussion write answers to the following questions:
- The separation between a Xinu ELF file and the running image leads to a potential problem: if the ELF file is changed (Xinu sources are recompiled) after an image starts to run, symbol table addresses in the ELF file may no longer match the locations of items in the running image.
- How can you ensure the ELF file read at run time matches the image that is executing?
- What is the most difficult aspect of the project? Why?

To turn in your lab use the following command

turnin -c cs503 -p lab7_final xinu-fall2016-lab7

assuming xinu-fall2016-lab7 is the name of the directory containing your code.

If you wish to, you can verify your submission by typing the following command:

turnin -v -c cs503 -p lab7_final

Do not forget the -v above, as otherwise your earlier submission will be erased (it is overwritten by a blank submission).

Note that resubmitting overwrites any earlier submission and erases any record of the date/time of any such earlier submission.

We will check that the submission time stamp is before the due date; any submission past the due date will be deducted the appropriate number of grace days. If the submission is beyond your remaining number of grace days, your work will not be accepted.

Lab 7 - Dynamic Program and Library Loading

Objectives

Lab Sections

1. Background

File Storage

File Format

Part 1: Running a program executable in XINU

Part 2: Loading a library into XINU

2. Setup

3. Modifications to the XINU Shell

4. Additional Requirements

5. Updating the Global Descriptor Table

6. Extra Credit

Unloading a library

Loading a library archive

7. Useful Commands

Lab Submission

What to turn in

Midway Submission

Final Submission

References
