Project 2 -- fork, exec, and wait

Due Tuesday, October 21

(Note that due date is later than originally listed in the class schedule.)

NOTE: This assignment, like the other projects in this class, is due at the beginning of the class period. This means that if you are even a minute late, you lose 20%. If you are worried about potentially being late, turn in your homework ahead of time. Do this by submitting it electronically, then giving the hard copy to me or the TA during office hours or sliding it under my office door within twenty-four hours after the time it is due. Do not send assignments to me through email or leave them in my departmental mailbox.

As discussed in class, part of the UNIX program design philosophy is not to create each application from scratch but, rather, to build new applications by utilizing existing programs. One way to tie these programs together is to spawn off child processes using fork and run the programs using exec.

One application that people might find useful is one that assists them in maintaining the links on their web pages. This program could be built from several existing programs that allow us to download information from the web, display it, and compare files. The new code would tie these existing programs together in a new and useful way, with a minimum of new code required.



The Assignment

For this assignment, we'll implement a simple link checking program that we'll call LinkCheck. Users will run LinkCheck in one of two modes: (1) initialize and (2) check.


For initialize, a user will type:

    LinkCheck -i <URL>
where <URL> is the Uniform Resource Locator for an HTML web page that the user wishes to maintain.

When started in initialize mode, LinkCheck will download the specified web page and save it locally. It will then download all web pages directly linked to from the specified page and save them locally as well.

If the web page specified cannot be downloaded for one reason or another (does not exist, no permission to access, etc.), LinkCheck will display an appropriate error message for the user. Similarly, if any of the web pages directly linked to from the specified page cannot be downloaded for one reason or another, LinkCheck will display an appropriate error message for the user.
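One way to decide when an error message is needed is to inspect the status that wait or waitpid reports for the child that did the download. The sketch below is only an illustration: report_download is a hypothetical helper name, the wording of the message is an assumption (the assignment does not fix it), and it assumes the downloader follows the usual convention of exiting with status 0 on success and nonzero on failure.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: 'status' is the value filled in by wait/waitpid for
 * the child that tried to download 'url'.  Returns 0 if the child exited
 * with status 0; otherwise prints a message to stderr and returns -1.
 * The message text is an assumption, not part of the assignment. */
int report_download(const char *url, int status)
{
    assert(url != NULL);

    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;

    if (WIFEXITED(status))
        fprintf(stderr, "LinkCheck: could not download %s (exit status %d)\n",
                url, WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        fprintf(stderr, "LinkCheck: could not download %s (killed by signal %d)\n",
                url, WTERMSIG(status));
    else
        fprintf(stderr, "LinkCheck: could not download %s\n", url);
    return -1;
}
```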

When the last pages have been downloaded, LinkCheck will display the message:

    Thank you for initializing LinkCheck on <URL>
where <URL> is the Uniform Resource Locator specified by the user.


For check, a user will type:

    LinkCheck -c <URL>
where <URL> is the Uniform Resource Locator for an HTML web page that the user wishes to maintain.
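Both invocations take exactly one flag followed by one URL, so the mode can be picked out of argv with a few comparisons. The following is a minimal sketch; the names parse_args and enum mode are hypothetical, and treating every other argument combination as invalid is an assumption you should confirm.

```c
#include <assert.h>
#include <string.h>

enum mode { MODE_INVALID, MODE_INIT, MODE_CHECK };

/* Hypothetical helper: returns the mode selected by "-i" or "-c" and stores
 * the URL argument in *url.  Anything other than "LinkCheck <flag> <URL>"
 * is reported as MODE_INVALID (an assumption about the desired behavior). */
enum mode parse_args(int argc, char *argv[], const char **url)
{
    assert(url != NULL);

    if (argc != 3)
        return MODE_INVALID;
    *url = argv[2];
    if (strcmp(argv[1], "-i") == 0)
        return MODE_INIT;
    if (strcmp(argv[1], "-c") == 0)
        return MODE_CHECK;
    return MODE_INVALID;
}
```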

When started in check mode, LinkCheck will download the specified web page and save it locally. It will then compare this copy of the web page with the one downloaded when LinkCheck was initialized for this page. If there are differences, LinkCheck will display the message:

    Maintained page has changed.  
    Display (o)riginal, (c)urrent, (b)oth, (n)either:

If the user hits 'o' (or 'O') followed by <return>, LinkCheck will display the original copy of this web page to the user.

If the user hits 'c' (or 'C') followed by <return>, LinkCheck will display the newly downloaded copy of this web page to the user.

If the user hits 'b' (or 'B') followed by <return>, LinkCheck will simultaneously display to the user both the original copy of this web page and the newly downloaded copy of this web page.

If the user hits 'n' (or 'N') followed by <return>, LinkCheck will go on without displaying either of these web pages to the user.

If the user hits any other key or combination of keys before hitting return, LinkCheck will loop back and repeat the message above.
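The prompt rules above can be sketched as a small loop. This is one possible reading, not the required implementation: parse_choice and ask_choice are hypothetical names, and returning 'n' when input ends is an assumption, since the assignment does not say what should happen at end of input.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 'o', 'c', 'b' or 'n' if 'line' is exactly one
 * of those letters (either case) followed by a newline, and 0 otherwise. */
int parse_choice(const char *line)
{
    assert(line != NULL);

    if (line[0] == '\0' || line[1] != '\n')
        return 0;
    int c = tolower((unsigned char)line[0]);
    return (c == 'o' || c == 'c' || c == 'b' || c == 'n') ? c : 0;
}

/* Prints 'header' and the menu on 'out', reads a line from 'in', and repeats
 * until the line is a valid choice.  End of input is treated as 'n'
 * (an assumption; the assignment leaves this case open). */
int ask_choice(FILE *in, FILE *out, const char *header)
{
    char line[64];

    for (;;) {
        fprintf(out, "%s\nDisplay (o)riginal, (c)urrent, (b)oth, (n)either: ",
                header);
        fflush(out);
        if (fgets(line, sizeof line, in) == NULL)
            return 'n';
        if (strchr(line, '\n') == NULL) {
            /* Line longer than the buffer: discard the rest and ask again. */
            int ch;
            while ((ch = getc(in)) != EOF && ch != '\n')
                ;
            continue;
        }
        int choice = parse_choice(line);
        if (choice != 0)
            return choice;
    }
}
```

In the program itself this would be called as ask_choice(stdin, stdout, "Maintained page has changed."). Taking the streams as parameters simply makes the loop easier to exercise without a terminal.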

If the user elects to view one or both pages, LinkCheck will wait until any process used to display these pages terminates, then go on.
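For the "both" case, one approach is to start the two display processes first and only then wait for each of them, so that both pages are visible at the same time. The sketch below assumes a viewer program that takes a file name as its only argument; spawn and display_both are hypothetical names, and the viewer to use is left as a parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] in a child process without waiting.
 * Returns the child's pid, or -1 if fork failed. */
pid_t spawn(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
    } else if (pid == 0) {
        execvp(argv[0], argv);
        perror(argv[0]);          /* reached only if exec failed */
        _exit(127);
    }
    return pid;
}

/* Start one viewer per file, then block until every viewer that was started
 * has terminated.  Returns the number of viewers started (0, 1 or 2). */
int display_both(const char *viewer, const char *original, const char *current)
{
    const char *files[2] = { original, current };
    pid_t pids[2];
    int started = 0;

    for (int i = 0; i < 2; i++) {
        char *argv[] = { (char *)viewer, (char *)files[i], NULL };
        pids[i] = spawn(argv);
        if (pids[i] > 0)
            started++;
    }
    for (int i = 0; i < 2; i++) {
        if (pids[i] > 0) {
            /* Retry if waitpid is interrupted by a signal. */
            while (waitpid(pids[i], NULL, 0) < 0 && errno == EINTR)
                ;
        }
    }
    return started;
}
```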

Once the maintained page has been handled (either the original and new copies are the same, or they differ and the user has had the chance to view them), LinkCheck will download all pages directly linked to from the maintained page and compare each new copy to the original of that page. If a linked page differs from its original, LinkCheck will display the message:

    Page linked to has changed.
    Display (o)riginal, (c)urrent, (b)oth, (n)either:

LinkCheck will respond to user input as previously described.

When the last pages have been compared (and, if appropriate, displayed), LinkCheck will display the message:

    Thank you for checking <URL> with LinkCheck.
where <URL> is the Uniform Resource Locator specified by the user.




Notes on this assignment

Some parts of the assignment above are vague or incomplete. This is intentional. This is to give you experience with the way software development is done in "the real world" (industry, academia outside the classroom, government labs, etc.). Often you will get problem descriptions that are vague or incomplete, even if they seem concrete and complete initially.

You should read through the assignment carefully, look for ambiguities or cases not covered, then ask about them, either in office hours or through email, as soon as you can. (Recall that those people who come to office hours get priority over those who call during office hours who, in turn, get priority over those who send email.) If you wait until the last minute to ask about ambiguities or missing cases, you may not get an answer before your project is due. If you don't get an answer in time (or at all, if you don't ask), then you will have to guess how to handle certain situations. If you guess wrong, you will lose points.

To run other programs from within LinkCheck, you must use fork, exec, and (if appropriate) wait or waitpid, not system.
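As a reminder of the basic pattern, here is a minimal sketch of running one program with fork, exec, and waitpid. The helper name run_and_wait is hypothetical, and using 127 as the child's exit status when exec fails is a common shell convention adopted here as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the NULL-terminated argument vector
 * argv in a child process and wait for it.  Returns the child's exit status,
 * or -1 if the child could not be created or did not exit normally. */
int run_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the requested program. */
        execvp(argv[0], argv);
        perror(argv[0]);          /* reached only if exec failed */
        _exit(127);
    }

    /* Parent: wait for this particular child, retrying if interrupted. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child calls _exit rather than exit after a failed exec so that it does not flush stdio buffers inherited from the parent.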

For downloading files, I suggest you consider using wget. For checking for differences, I suggest you consider using diff. For displaying files, I suggest you consider using netscape.
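If you do use diff, its exit status already tells you what you need: 0 means the files are the same, 1 means they differ, and anything greater means diff ran into trouble. The sketch below relies on that and sends diff's normal output to /dev/null; the name files_differ is hypothetical, and whether you want to hide the output at all is your decision.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if files a and b are identical, 1 if they
 * differ, and -1 if diff could not be run or reported trouble. */
int files_differ(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: discard diff's listing; only its exit status is used. */
        int devnull = open("/dev/null", O_WRONLY);
        if (devnull >= 0) {
            dup2(devnull, STDOUT_FILENO);
            close(devnull);
        }
        execlp("diff", "diff", a, b, (char *)NULL);
        perror("diff");           /* reached only if exec failed */
        _exit(127);
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (!WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return (code == 0 || code == 1) ? code : -1;
}
```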

You can assume that all links will be absolute, rather than relative. That is, you don't have to worry about your program working correctly with a base URL specified at one place in the file and the URL relative to the base specified elsewhere. All URLs will be complete.
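Because every link is absolute, finding the pages to download can be as simple as scanning the saved page for href attributes. The sketch below is deliberately naive and is not a real HTML parser: it only recognizes double-quoted href="..." values that begin with http:// or https://, and extract_links and MAX_URL are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp */

#define MAX_URL 1024

/* Hypothetical helper: copies up to 'max' absolute URLs found in
 * href="..." attributes of 'html' into 'links' and returns how many were
 * copied.  Values that are not absolute, or that are too long, are skipped. */
size_t extract_links(const char *html, char links[][MAX_URL], size_t max)
{
    assert(html != NULL && links != NULL);

    size_t count = 0;
    const char *p = html;

    while (count < max && *p != '\0') {
        if (strncasecmp(p, "href=\"", 6) != 0) {
            p++;
            continue;
        }
        p += 6;                              /* first character of the value */
        const char *end = strchr(p, '"');    /* closing quote */
        if (end == NULL)
            break;
        size_t len = (size_t)(end - p);
        if (len < MAX_URL &&
            (strncasecmp(p, "http://", 7) == 0 ||
             strncasecmp(p, "https://", 8) == 0)) {
            memcpy(links[count], p, len);
            links[count][len] = '\0';
            count++;
        }
        p = end + 1;
    }
    return count;
}
```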



What to turn in

You will turn in both a hard copy and an electronic copy of your assignment. Please follow the instructions on how to send electronic copies. Do not send them to my email address.

Both the hard copy and the electronic copy will contain a write-up and all source code you used in this project. The electronic copy will also contain the executable version of your code. The electronic copy of your write-up should not be in a proprietary format (such as MS Word); it should be either in plain ASCII text or in a portable format (such as PostScript or PDF). Your source code should be in a single file called LinkCheck.c and your executable code should be called LinkCheck.

Your source code should be well structured and well commented. It should conform to good coding standards (e.g., no memory leaks).

Your write-up will include 1/2 to 1 page (roughly 80 characters per line, 50 lines per page) explaining the data structures and algorithms used in your code. This page limitation does not include figures used in your explanation, which are encouraged and may take up any amount of space. (This explanation does not remove the requirement that your code be well commented.)



Other

You may write your program from scratch or may start from programs for which the source code is freely available on the web or through other sources (such as friends or student organizations). If you do not start from scratch, you must give a complete and accurate accounting of where all of your code came from and indicate which parts are original or changed, and which you got from which other source. Failure to give credit where credit is due is academic fraud and will be dealt with accordingly.

As noted in the syllabus, you are required to work on this programming assignment in a group of at least two people. It is your responsibility to find other group members and work with them. The group should turn in only one (1) hard copy and one (1) electronic copy of the assignment. Both the electronic and hard copies should contain the names and student ID numbers of all group members. If your group composition changes during the course of working on this assignment (for example, a group of five splits into a group of two and a separate group of three), this must be clearly indicated in your write-up, including the names and student ID numbers of everyone involved.

Each group member is required to contribute equally to each project, as far as is possible. You must thoroughly document which group members were involved in each part of the project. For example, if you have three functions in your program and one function was written by group member one, the second was written by group member two, and the third was written jointly and equally by group members three and four, both your write-up and the comments in your code must clearly indicate this division of labor.