Exam 1, Solution

Question 1: Multiprogramming and Multiprocessing (20 points)

Professor Frink has 500,000 large image files that he needs to convert to another format. He has a program called ConvertPro that can convert each file. When he runs it, it will read in an image from disk, do a set of complex computations and transformations on the data, then write out the new image to disk. Prof. Frink tries out this program on a sample image and finds that it takes 5 seconds to run -- 1 second to read in the file, 3 seconds to convert it, and 1 second to write it out again -- on his computer which has a single CPU and a single hard drive, and which uses direct memory access. He does a little math and figures out that, if he runs the program on one image after another in a simple sequential manner, it will take nearly a month to process all 500,000 images. He decides that this is too long.

A. Fortunately, Prof. Frink knows a little about the concept of multiprogramming and realizes that by having his computer convert more than one image file at the same time, his entire library of 500,000 images will be converted sooner. In the best case, what percentage of time could he save using multiprogramming, given the situation described above? Explain your answer.

4 pts.

In the best case, Prof. Frink could save nearly 40% of the time by using multiprogramming. The idea behind multiprogramming is to save time by overlapping the I/O of one process with the CPU use of another. Since the basic pattern of processing these images is two-fifths I/O time to three-fifths CPU time (these processes are CPU-bound), and the best Prof. Frink could hope for is a complete overlap of I/O and CPU times, that would save him the two-fifths for each image -- i.e., 40%. In fact, if Prof. Frink does things right, he should get close to this speedup, because his machine uses direct memory access, so I/O will interfere minimally with computations involving the CPU.

To get full points, you needed to have the 40% computation correct, as well as an explanation that what made this possible was the overlap of CPU and I/O use.



B. Unfortunately, Prof. Frink knows very little about the concept of multiprogramming and decides that the way to proceed is to write a program that runs through a loop quickly and spawns a child process to convert each file. He reasons that since his computer can spawn new processes quickly, his program will run for about 10 seconds while creating all the child processes, then exit, and all of this will add 10 seconds to the best case scenario numbers that you gave above. List and explain one problem with Prof. Frink's approach.

4 pts.

There are several problems with Prof. Frink's approach. To get full points, you only needed to list and explain one of them. Problems include:

- The O/S limits how many processes can exist at once, so most of the 500,000 calls to fork will simply fail.
- Even if that many processes could be created, they would all compete for memory, the single CPU, and the single disk, so the system would spend much of its time context switching and paging rather than usefully overlapping I/O with computation.
- Because the parent spawns everything and then exits, no process is left to monitor the children or react if conversions fail.

There are other problems you could consider as well.



C. Give an alternate multiprogramming approach to Prof. Frink's and explain why yours is better. (Note that you aren't able to change the code for ConvertPro; it will always just read in a single image from disk, do a set of complex computations and transformations on the data, then write out the new image to disk.)

4 pts.

He could have the parent process spawn only a small number of child processes at the beginning, then spawn new ones only as old ones exit. This would allow his program to work within system limits and keep it running to monitor what its children are doing. If all of his images are roughly the same size, then two children are sufficient to keep the system working at its maximum I/O and CPU overlap, although it probably wouldn't hurt to have several more around at the same time.



D. Prof. Frink also knows a little about the concept of multiprocessing and thinks that perhaps adding a second processor to his computer might speed things up. In the best case, without changing anything else in his system, what percentage of time could he save using multiprocessing with two processors, over the amount of time he could already save using multiprogramming (as given in your answer to part A)? Explain your answer.

4 pts.

By using two processors, he could cut the average execution time of the actual image conversion in half -- e.g., from 3 seconds to 1.5 seconds for the sample image. (Note that this is true whether ConvertPro itself knows how to use two processors or only the O/S does. If ConvertPro is written to use one processor, the O/S will simply run a second copy of ConvertPro on a second image on the second processor, overlapped with the copy of ConvertPro that is running on one image on the first processor.)

However, while the conversion time will be cut in half, the I/O time will remain unchanged. This means the system is now limited by the two-fifths time taken up by I/O (i.e., the system is now I/O bound), so we are saving three-fifths of the time over simple sequential processing of all files -- i.e., we are saving 60% of the time over that case. However, since we had already saved 40% of the time over that case by using multiprogramming, the addition of the second processor is only saving us 33% more time over simple multiprogramming.



E. Unfortunately, Prof. Frink knows very little about the concept of multiprocessing and thinks that perhaps adding 499,998 more processors to his computer (for a total of 500,000 processors) might speed things up considerably over having 2 processors. If Prof. Frink's computer could handle 500,000 processors, in the best case, without changing anything else in his system, what percentage of time could he save using multiprocessing with 500,000 processors, over the amount of time he could already save using multiprocessing with 2 processors (as given in your answer to part D)? Explain your answer.

4 pts.

With two processors, the system is already I/O bound on this task. Therefore, adding additional processors won't help at all. Prof. Frink will save, at best, an additional 0% by adding the last 499,998 processors to his machine -- a very bad investment.


Question 2: Processes, Parents, and Reporting Back (20 points)

In a POSIX system, when a child process exits, its parent is signaled and the return value from the child is made available to the parent. However, if the parent exits before the child, then the child becomes an orphan.

A. What happens to an orphan process and to what process is its return value made available when it exits?

5 pts.

An orphaned process is adopted by the init process. Its return value is also made available to the init process.



B. Suggest a method for dealing with orphaned processes that is an alternative to the method used by POSIX systems.

5 pts.

(This question has many valid answers; this is only an example.)

The orphaned process could be adopted by its parent's parent process.

(Other answers needed to satisfy the following criteria to get full points: 1. They need to handle orphans in a way that POSIX does not. 2. The method must handle both normal and abnormal termination of the parent.)



C. Give one advantage of your suggested method (from part B) over the method used in POSIX systems.

5 pts.

(This question is based on the previous question and has many correct answers.)

The parent of the parent process may know more about the meaning of the exit status of the orphan than the init process.



D. Give one advantage of the method used in POSIX systems over your suggested method (from part B).

5 pts.

The suggested method would increase the difficulty of programming, because every program would have to be prepared to handle the orphaned processes it adopts -- a substantial increase in workload. (Other reasonable answers were accepted, including a simple "Implementing this would mean your system would no longer be POSIX compliant.")


Question 3: fork and exec (20 points)

A. Given the code fragment below, how many processes will be created by executing this code and how will they be related to one another? Explain your answer. (Be sure to cover the cases where some or all of the system calls fail, as well as the case where all of them succeed.)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int i;
pid_t childpid;

for (i = 0; i < 10; i++){
    childpid = fork();

    if (0 == childpid){  /* This is the line that is different in parts A and B. */
        execlp("some_prog", "some_prog", (char *)0);
        fprintf(stderr, "Error with exec.\n");
        exit(2);
    }
    else if (childpid < 0){
        fprintf(stderr, "Error with fork.\n");
        exit(1);
    }
}

10 pts.

This code would create 10 child processes which would all have the same parent (the original program).

The child processes will call execlp. They will either run some_prog normally or will print an error message to stderr and exit.

If any of the calls to fork fail, the parent program will exit. This means that the parent program will terminate after making 0 to 10 children, depending on which call to fork fails, if any.

In the case when all of these calls succeed, we will have one parent with ten children that will call execlp.

(A diagram could be given in place of a description of the relationship.)


B. Given the code fragment below, how many processes will be created by executing this code and how will they be related to one another? Explain your answer. (Be sure to cover the cases where some or all of the system calls fail, as well as the case where all of them succeed.)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int i;
pid_t childpid;

for (i = 0; i < 10; i++){
    childpid = fork();

    if (childpid > 0){  /* This is the line that is different in parts A and B. */
        execlp("some_prog", "some_prog", (char *)0);
        fprintf(stderr, "Error with exec.\n");
        exit(2);
    }
    else if (childpid < 0){
        fprintf(stderr, "Error with fork.\n");
        exit(1);
    }
}

10 pts.

If all the calls are successful, this code will create 10 child processes. The parent creates a child and then replaces itself with some_prog via execlp. The child continues through the loop and creates a child of its own, and so on. The relationship will look like this:

Parent-->Child-->Child-->Child-->....and so on.

If the execlp fails, an error message will be printed to stderr; otherwise, the process will run some_prog successfully. Whether this succeeds or fails will not affect the other processes.

If any of the calls to fork fail, the chain of processes will stop at that point. The process that called fork unsuccessfully will exit without ever having created a child.


Question 4: wait and waitpid (10 points)

A. How can wait and waitpid be used for process synchronization? Explain in general and give one specific example.

5 pts.

A process may use wait or waitpid to synchronize its operation with that of a child (as long as the WNOHANG option is not specified for waitpid). To do this, the parent process simply calls wait or waitpid which causes it to be moved into the waiting (a.k.a., blocked) state until the child exits. Once the child has exited, the parent is moved into the ready state and can resume processing, knowing that the child is no longer running.

There are many examples of this, including ones that we covered in class (e.g., an email client that spawns off a text editor to create the message, then waits for the text editor to exit before asking the user what to do next), in your second project (e.g., PicShare waiting for gimp to create the image), and in the textbooks (e.g., a shell spawning off a user-specified process and waiting for it to exit before displaying the command prompt again).

(Some students gave code examples. I said in class that I wouldn't require you to write code on the exam but if you chose to do so on this question, I didn't deduct points.)



B. Besides process synchronization, what is the other primary use for the wait and waitpid system calls? How are they used for this? Explain in general and give one specific example.

5 pts.

The other primary use for wait and waitpid is to get back exit status information from child processes that have exited.

Again, there are many examples that could be given. Here's one: If the email client mentioned above checked to see if the editor was successful, it could decide what to do next based on the outcome -- e.g., offer to send the message if the editor was successful or offer to run a different editor if the first one was unsuccessful.

Some students said that the other primary use for the wait and waitpid system calls is to clean up processes after they have called exit so that there aren't a lot of zombie processes on the system taking up resources.

On the one hand, this is true in a sense. If your code spawns off a lot of processes that complete and call exit, but your code never waits for them, it will be creating zombies that are taking up system resources. If you wait for them, the zombies will be removed and the resources freed. For this reason, this answer was accepted on the exam.

On the other hand, this is kind of a backward way to think about things. The reason the system keeps the processes around after they have called exit -- that is, the reason the system turns them into zombies -- is to allow your code to wait for them and get back their exit status information. If wait and waitpid (or some other system call) weren't designed for getting back exit status information, then the O/S wouldn't be designed to create zombies.


Question 5: Scheduling (20 points)

A. What is the biggest obstacle to implementing a Shortest Job Next scheduling algorithm on a general purpose computer? Explain your answer.

5 pts.

The biggest obstacle to implementing a Shortest Job Next (SJN) scheduling algorithm on a general purpose computer is that we generally have no way of knowing in advance how long a job will take to run. Without this, we can't say which job is shortest. Note that things like the length of the code may bear little or no relation to the amount of time it will take to run. Only on special purpose computers are job times likely to be known in advance.

Some students answered that process starvation was the biggest obstacle to implementing a SJN scheduling algorithm on a general purpose computer. However, while process starvation can occur on a system that uses SJN, this is merely a disadvantage to using this method, not an obstacle to implementing it. If the job times are known, it is very easy to write the code to implement this algorithm.



B. Which of the following scheduling algorithms are subject to process starvation? Explain your answers.

Many of these questions could be answered either way, depending on implementation details. (For example, on one hand, Priority Scheduling using Multi-Level Queues could lead to starvation if each process is assigned to its queue and never moved and each queue must be emptied before the next gets any CPU time. On the other hand, if a process is moved up from one queue to a higher level one after some time has passed or the processor is time-multiplexed between all queues, then process starvation cannot occur.) For this reason, the explanation was worth more than an answer of "subject to starvation" or "not subject to starvation."


i. First Come, First Served.

3 pts.

First Come, First Served is not subject to starvation. As long as no process fails to exit properly, the O/S will simply move on to the next process (in the order in which the processes were started) when the current one is completed, and each of them will get its turn running.


ii. Shortest Job Next.

3 pts.

Shortest Job Next is subject to starvation. If the scheduler allows new jobs to be added to the process queue, long jobs may never get any service, as shorter ones may keep getting added in front of the longer ones.


iii. Earliest Deadline First Scheduling.

3 pts.

If every job has a deadline, then if it is admitted to the queue it will not starve because, as its deadline approaches, it will eventually be the process with the earliest deadline and will get service. (This may mean that some processes never get admitted to the queue. They have not, technically, starved -- they have simply never been fully instantiated.)


iv. Round Robin.

3 pts.

No starvation is possible with round robin, as one process will get a time slice (then be moved to the back of the queue if it has not completed or been moved to the waiting state), then the next process will get a time slice, and the next, and so forth, until all have had a turn.


v. Priority Scheduling using Multi-Level Queues.

3 pts.

See the general statement above.