Project 1 -- I/O and Devices

Electronic Copy Due Thursday, October 6, 1:00 pm

Hard Copy Due Thursday, October 6, 1:30 pm

NOTE: This assignment, like the other projects in this class, is due at a particular time, listed above. This means that if you are even a minute late, you lose 20%. If you are worried about potentially being late, turn in your project ahead of time. Do this by submitting it electronically before it is due and giving the hard copy to me during office hours or by sliding it under my office door before it is due. Do not send assignments to my personal email address. Do not leave hard copies in my departmental mailbox or attempt to give them to departmental staff (who cannot and will not accept them).

As discussed in class, disk I/O scheduling can have a dramatic impact on system performance. In particular, various disk scheduling algorithms can affect performance metrics including average response time, maximum response time, response time variance, and total system throughput. We would like to minimize the values related to response time while maximizing throughput.

The Assignment

You are to write a program and associated testing protocols to compare the performance of several disk scheduling algorithms we have covered in class. You will follow the testing protocols you devise and report the results you find.

Part one of the assignment is to write a program called schedule to calculate approximate performance values for each disk scheduling algorithm on a given set of I/O requests.

Scheduling Algorithms

The scheduling algorithms to be covered by schedule are:
- FCFS
- SSTF
- SCAN
- LOOK
- C-SCAN
- C-LOOK
- FSCAN
- N-step SCAN
For all algorithms, schedule should assume that the disk head starts on Track 0, Sector 0 when the first request comes in.

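To illustrate how the algorithms differ, here is a minimal sketch of the service order produced by three of them. It makes a simplifying assumption that the assignment itself does not: all requests are already pending when scheduling begins, so arrival times are ignored. The function names are hypothetical and are not part of the required interface.

```python
def fcfs_order(tracks):
    """FCFS: serve requests in the order they arrived."""
    return list(tracks)


def sstf_order(tracks, head=0):
    """SSTF: repeatedly serve the pending request closest to the head."""
    pending = list(tracks)
    order = []
    while pending:
        nearest = min(pending, key=lambda t: abs(t - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def look_order(tracks, head=0):
    """LOOK: sweep toward higher tracks first, then reverse direction."""
    upward = sorted(t for t in tracks if t >= head)
    downward = sorted((t for t in tracks if t < head), reverse=True)
    return upward + downward


def total_head_movement(order, head=0):
    """Sum of track distances travelled while serving requests in order."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total
```

For example, with requests for tracks 98, 183, 37, 122, 14, 124, 65, and 67 and the head on track 53, the total head movement is 640 tracks under FCFS, 236 under SSTF, and 299 under LOOK as sketched here.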
Invocation

schedule is invoked as follows:
schedule [OPTION]… [INFILE]… [OUTFILE]…

schedule reads from standard input if no input file is specified and writes to standard output if no output file is specified. Recognized option flags are:

-t, --tracks=NUM

the disk has NUM tracks; default is 1024

-s, --sectors=NUM

the disk has NUM sectors; default is 512

-p, --platters=NUM

the disk has NUM platters; default is 8

-R, --rotational-delay=NUM

the maximum rotational delay of the disk; default is 4 ms

-S, --seek-time=NUM

the maximum seek time of the disk; default is 16 ms

-D, --data-transfer-time=NUM

the time to transfer one sector’s worth of data (i.e., to read or write one sector); default is rotational delay divided by number of sectors. If a value for data transfer time is specified that is less than the default value, the user is notified that the parameter value is invalid and schedule exits normally with a return value of 1.

-a, --algorithm=ALG

the disk scheduling algorithm to use is ALG (recognized values are FCFS, SSTF, SCAN, LOOK, C-SCAN, C-LOOK, FSCAN, N-step); default is FCFS

-N, --n-step=NUM

the size of N for N-step SCAN; default is 16
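One possible way to handle these flags, shown only as a sketch, uses Python's argparse module. The function names are hypothetical. Sending the invalid-parameter message to standard error is an assumption, since the description of -D does not say which stream to use.

```python
import argparse
import sys

ALGORITHMS = ["FCFS", "SSTF", "SCAN", "LOOK", "C-SCAN", "C-LOOK", "FSCAN", "N-step"]


def build_parser():
    """Declare the option flags and defaults listed above."""
    parser = argparse.ArgumentParser(prog="schedule")
    parser.add_argument("-t", "--tracks", type=int, default=1024)
    parser.add_argument("-s", "--sectors", type=int, default=512)
    parser.add_argument("-p", "--platters", type=int, default=8)
    parser.add_argument("-R", "--rotational-delay", type=float, default=4.0)
    parser.add_argument("-S", "--seek-time", type=float, default=16.0)
    parser.add_argument("-D", "--data-transfer-time", type=float, default=None)
    parser.add_argument("-a", "--algorithm", choices=ALGORITHMS, default="FCFS")
    parser.add_argument("-N", "--n-step", type=int, default=16)
    parser.add_argument("infile", nargs="?")   # standard input if omitted
    parser.add_argument("outfile", nargs="?")  # standard output if omitted
    return parser


def parse_options(argv):
    """Parse argv and apply the rule for the data transfer time."""
    args = build_parser().parse_args(argv)
    # The default (and minimum) transfer time is rotational delay / sectors.
    minimum = args.rotational_delay / args.sectors
    if args.data_transfer_time is None:
        args.data_transfer_time = minimum
    elif args.data_transfer_time < minimum:
        # Assumption: the notification goes to standard error.
        print("schedule: invalid data transfer time", file=sys.stderr)
        raise SystemExit(1)
    return args
```

With this sketch, parse_options(["-t", "200", "--algorithm=SSTF"]) would yield 200 tracks and SSTF, with every other option at its default.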

Input

The format for input data is
REQUEST₁; REQUEST₂;… REQUEST_n.
where REQUEST_i is a quadruple of the form
T, t, s, p
where T is the time at which the request is received (a positive integer or zero; these values should be monotonically increasing), t is the requested track, s is the requested sector, and p is the requested platter. If the requested track, sector, and/or platter is beyond the capacity of the drive as specified on program invocation, schedule should send an error message to standard output to notify the user of this fact and exit normally with a return value of 2. Note that the period/full stop character (.) at the end of the input line above is the indicator to your program that the data is complete and the performance values should be calculated and written out.
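The input format above could be parsed along the following lines. This is a sketch under two assumptions that are not stated in the specification: track, sector, and platter numbers are zero-based (suggested by the head starting on Track 0, Sector 0), and whitespace around numbers is ignored. The names parse_requests and CapacityError are hypothetical. The ordering of the T values is not checked here.

```python
class CapacityError(ValueError):
    """A request names a track, sector, or platter the disk does not have."""


def parse_requests(text, tracks=1024, sectors=512, platters=8):
    """Return a list of (T, t, s, p) tuples read from text.

    Everything after the first period is ignored, because the period
    marks the end of the data.
    """
    body, terminator, _ = text.partition(".")
    if not terminator:
        raise ValueError("input must end with a period")
    requests = []
    for chunk in body.split(";"):
        if not chunk.strip():
            continue  # tolerate stray separators and blank lines
        fields = [int(field) for field in chunk.split(",")]
        if len(fields) != 4:
            raise ValueError("each request needs four values: T, t, s, p")
        arrival, track, sector, platter = fields
        # Assumption: valid indices run from 0 to capacity - 1.
        if not (0 <= track < tracks
                and 0 <= sector < sectors
                and 0 <= platter < platters):
            raise CapacityError("request exceeds disk capacity: " + chunk.strip())
        requests.append((arrival, track, sector, platter))
    return requests
```

In a full program, the caller would catch CapacityError, write the message to standard output, and exit with a return value of 2, as required above.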

Output

The format for output data is
algorithm: ALG response time: MAX (max), AVE (ave), VAR (var) throughput: THRU
where ALG is the algorithm specified, MAX is the maximum response time given the scheduling algorithm and the data set, AVE is the average response time, VAR is the variance in response time, and THRU is the throughput, which is simply the average number of requests satisfied per time unit.
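As a sketch of how these values might be computed once every request has a completion time, the function below makes several assumptions that the format description leaves open: response time is completion time minus arrival time, variance is the population variance, throughput is the request count divided by the time from the first arrival to the last completion, and numbers are printed with three decimal places. The name summarize is hypothetical.

```python
from statistics import mean, pvariance


def summarize(algorithm, arrivals, completions):
    """Build the output line from per-request arrival and completion times."""
    # Assumption: response time = completion time - arrival time.
    responses = [done - arrived for arrived, done in zip(arrivals, completions)]
    # Assumption: throughput is measured over first arrival to last completion.
    elapsed = max(completions) - min(arrivals)
    throughput = len(responses) / elapsed
    return (
        "algorithm: {} response time: {:.3f} (max), {:.3f} (ave), "
        "{:.3f} (var) throughput: {:.3f}"
    ).format(algorithm, max(responses), mean(responses),
             pvariance(responses), throughput)
```

For arrivals at times 0, 2, and 4 with completions at 3, 6, and 12, the response times are 3, 4, and 8, so the line reads: algorithm: FCFS response time: 8.000 (max), 5.000 (ave), 4.667 (var) throughput: 0.250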
For calculating output values, you may assume that the rotational delay is a linear function of the number of sectors between the sector currently under the read head and the sector to be read, with range [0, R], where R is the maximum rotational delay of the disk. In contrast, seek time is 0 if the track sought is the current track. Otherwise, it consists of a fixed cost for start up and settling equal to half the maximum seek time, plus a linear function of the number of tracks between the track at which the arm is currently positioned and the track to which the arm will be moved, with range [(S/2)/(t-1), S/2], where S is the maximum seek time of the disk and t is the number of tracks on the disk. Note that seek time should be calculated before rotational delay because the disk does not stop rotating while a seek takes place. Therefore, to know which sector is under the read head at the start of the rotational delay, you must first calculate the amount of rotation that takes place during the seek.

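The timing model above can be sketched as three small functions. The seek formula follows the description directly. The rotational part rests on assumptions: one full revolution takes R ms, so each sector passes under the head in R divided by the number of sectors, and the position after a seek is truncated to a whole sector. All function names are hypothetical.

```python
def seek_time(current_track, target_track, tracks=1024, max_seek=16.0):
    """Seek time in ms: 0 if no move, else S/2 plus a linear term up to S/2."""
    distance = abs(target_track - current_track)
    if distance == 0:
        return 0.0
    half = max_seek / 2
    # Linear term ranges from (S/2)/(t-1) for one track to S/2 for t-1 tracks.
    return half + half * distance / (tracks - 1)


def sector_after(start_sector, elapsed, sectors=512, max_rotation=4.0):
    """Sector under the head once elapsed ms of rotation have passed.

    Assumption: one revolution takes max_rotation ms, and partial sectors
    are truncated.
    """
    passed = int(elapsed / (max_rotation / sectors))
    return (start_sector + passed) % sectors


def rotational_delay(current_sector, target_sector, sectors=512, max_rotation=4.0):
    """Delay in ms until target_sector rotates under the head."""
    gap = (target_sector - current_sector) % sectors
    return max_rotation * gap / sectors
```

With the default disk, a seek from track 0 to track 1023 takes 8 + 8 = 16 ms. During that time the disk makes four full revolutions, so the head is over the same sector as when the seek began.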
Part two of the assignment is to test schedule and analyze what your testing demonstrates.

Test Data

Test data is a moderately sized sequence (dozens or hundreds) of I/O requests to be made to a disk drive. You should create at least three sets of such test data, conforming to the input specification given above. This data should be sufficient to test the performance of the various scheduling algorithms listed above under various conditions, including low, moderate, and high levels of proximity for subsequent I/O requests.

Test Configurations

In addition to the test data itself, you will need to create test configurations for disk drives on which to test your data. Configurations consist of the number of tracks, sectors, and platters on a drive, as well as the rotational delay, seek time, and data transfer time for that disk. You should create the test configurations in conjunction with the test data, so that all requests in each set of test data are valid on at least one disk configuration.

Results

Once you have created your test data and configuration sets, you should invoke schedule with each scheduling algorithm for each configuration on all of its valid data sets and record the output you received for each. These are your results.
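One way to enumerate the required runs, offered only as a sketch, is to generate a command line for every combination of configuration, algorithm, and valid data set. The function name, the ./schedule path, and the file names in the example are all hypothetical.

```python
import shlex

ALGORITHMS = ["FCFS", "SSTF", "SCAN", "LOOK", "C-SCAN", "C-LOOK", "FSCAN", "N-step"]


def build_commands(configurations, executable="./schedule"):
    """Return one argument list per (configuration, algorithm, data set).

    configurations maps a string of option flags, such as "-t 200 -s 64",
    to the list of data files whose requests are all valid on that disk.
    """
    commands = []
    for flags, data_files in configurations.items():
        for algorithm in ALGORITHMS:
            for data_file in data_files:
                commands.append(
                    [executable, *shlex.split(flags), "-a", algorithm, data_file]
                )
    return commands
```

For instance, a configuration "-t 200 -s 64" that is valid for two data files produces 16 commands, one per algorithm and file. Each argument list could then be passed to subprocess.run with capture_output=True and text=True to record the output.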

Analysis

Once you have collected your results, you need to compare what you found to the performance expected for each algorithm, according to your text and the in-class discussion. This analysis should consider whether characteristics of the test data (such as proximity level) affect performance.

What to Turn In

You will turn in both a hard copy and an electronic copy of your assignment. Electronic copies must be submitted to the appropriate drop box in D2L for the course. Do not send them to my email address.

Both the hard copy and the electronic copy will contain a cover sheet documenting group membership and contributions (see below), your analysis document, all source code you created for schedule and a write-up of 1/2 to 1 page (roughly 80 characters per line, 50 lines per page) explaining the data structures and algorithms used in your code. This page limitation does not include figures used in your explanations, which are encouraged and may take up any amount of space. (The explanations do not remove the requirement that your code be well commented.)

The electronic copy will also contain an executable for schedule which should be called schedule, your test data (with each set in its own clearly labeled text file), your test configurations (each of which should include the configuration in the form of option flags that can be copied and pasted on the command line for schedule to invoke it with that configuration), and your results (given in the output format specified).

Your source code should be well structured and well commented. It should conform to good coding standards (e.g., no memory leaks).

Other

You may write your program from scratch or may start from programs for which the source code is freely available on the web or through other sources (such as friends or student organizations). If you do not start from scratch, you must give a complete and accurate accounting of where all of your code came from and indicate which parts are original or changed, and which you got from which other source. Failure to give credit where credit is due is academic fraud and will be dealt with accordingly.

As noted in the syllabus, you are required to work on this programming assignment in a group of at least two people. It is your responsibility to find other group members and work with them. The group should turn in only one (1) hard copy and one (1) electronic copy of the assignment. Both the electronic and hard copies should contain the names and student ID numbers of all group members. If your group composition changes during the course of working on this assignment (for example, a group of five splits into a group of two and a separate group of three), this must be clearly indicated in your cover sheet (see below), including the names and student ID numbers of everyone involved and details of when the change occurred and who accomplished what before and after the change.

Each group member is required to contribute equally to each project, as far as is possible. Your cover sheet must thoroughly document which group members were involved in each part of the project. For example, if you have three functions in your program and one function was written by group member one, the second was written by group member two, and the third was written jointly and equally by group members three and four, your cover sheet must clearly indicate this division of labor.

Note that all personally identifying information (names, student ID numbers, 4x4s, etc.) must only be included on the cover sheet and nowhere else in the project materials.