Empirical Testing

You have collected loads of data in your testing. You can see that some versions of the program took more time to run and others took less. The question you are left with is, "Did one version of the program take significantly more or less time than another?"

This is not an arbitrary assignment of significance, such as "Your thirtieth birthday is a significant event in your life." Rather, this is a question of statistical significance: "Is this result likely to be due to something other than mere chance?"

In science, we can never be absolutely sure that what we are observing is something other than mere chance, but we can be more or less confident. In fact, we can choose what confidence level we want to have for a result. Typically, we choose the 95% confidence level: we call a result significant only when there is at most a 5% probability that chance alone would have given us a result at least as extreme as the one we see.

(Note that this is why we so often hear in the popular media that some study has shown something or other causes some medical problem, only to have another study come along later that reportedly shows that there is no connection between the purported cause and the problem. All the scientists involved -- including the authors of both studies -- realize that what they are reporting are likelihoods and might be wrong. However, either because the reporters are trying to "dumb down" the reports for their readers or because the reporters themselves don't understand science, the details about confidence levels and probabilities are left out.)
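To make the 5% figure concrete, here is a small simulation sketch. It is only an illustration under assumptions of my own choosing: both "versions" draw their run times from the same normal distribution (mean 100, standard deviation 5, made-up numbers), each sample has 10 runs, and the comparison uses a pooled-variance unpaired t statistic with the tabulated two-tailed critical value 2.101 for 18 degrees of freedom. Since there is no real difference between the two samples, every result called significant is due to chance alone.

```python
import math
import random
import statistics

RUNS_PER_SAMPLE = 10     # assumed sample size; the critical value below depends on it
CRITICAL_T = 2.101       # two-tailed 5% critical value for 10 + 10 - 2 = 18 degrees of freedom


def pooled_t(a, b):
    """Unpaired two-sample t statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def false_positive_rate(trials=2000, seed=1):
    """Fraction of trials called significant when both samples share one distribution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Hypothetical run times: same mean and spread for both "versions".
        a = [rng.gauss(100.0, 5.0) for _ in range(RUNS_PER_SAMPLE)]
        b = [rng.gauss(100.0, 5.0) for _ in range(RUNS_PER_SAMPLE)]
        if abs(pooled_t(a, b)) > CRITICAL_T:
            hits += 1
    return hits / trials


rate = false_positive_rate()
print(f"fraction called significant by chance alone: {rate:.3f}")
```

The printed fraction should land near 0.05, which is the sense in which a result reported at the 95% confidence level can still be wrong.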

There are numerous ways to compare samples for statistical significance. The best way to do it for a given study depends on many factors relating to the size of the samples and what is known about them. Rather than turning this class into a statistics or empirical methods course, I'll simply direct you to an appropriate test for this assignment -- the t-test.

You will use the t-test for testing all of your empirical results against the predictions that you made. However, to make sure you aren't just plugging in numbers and getting results that you don't understand at all, I'm going to leave the following choices up to you: paired or unpaired, and one-tailed or two-tailed.

For each computation that you do, choose which of those options to use, and include your reason for that choice in your write-up. Unless you are already familiar with the t-test, this will probably require a little research.
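As a starting point for that research, the sketch below shows how the choices change the arithmetic. It does not say which option is right for your data. The run times are made up, the function names are mine, and the unpaired version uses Welch's form (which does not assume equal variances; a pooled-variance form is also common). The critical values are the standard table entries for 4 degrees of freedom at the 5% level.

```python
import math
import statistics


def unpaired_t(a, b):
    """Welch's unpaired t statistic and its approximate degrees of freedom."""
    na, nb = len(a), len(b)
    se2_a = statistics.variance(a) / na   # squared standard error of mean(a)
    se2_b = statistics.variance(b) / nb   # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


def paired_t(a, b):
    """Paired t statistic: a one-sample test on the per-pair differences."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical run times in seconds; position i in both lists is the same input.
version_a = [12.1, 15.3, 9.8, 20.4, 11.0]
version_b = [11.6, 14.9, 9.1, 19.8, 10.7]

t_unpaired, df_unpaired = unpaired_t(version_a, version_b)
t_paired, df_paired = paired_t(version_a, version_b)
print(f"unpaired: t = {t_unpaired:.3f}, df = {df_unpaired:.1f}")
print(f"paired:   t = {t_paired:.3f}, df = {df_paired}")

# Tabulated 5% critical values for 4 degrees of freedom (the paired case here).
TWO_TAILED_DF4 = 2.776   # compare against |t|: "the versions differ"
ONE_TAILED_DF4 = 2.132   # compare against t itself: "A is slower than B"
print("paired, two-tailed significant:", abs(t_paired) > TWO_TAILED_DF4)
print("paired, one-tailed significant:", t_paired > ONE_TAILED_DF4)
```

With these numbers, the paired statistic comes out far larger than the unpaired one, because pairing removes the input-to-input variation that dominates the spread within each list. Whether pairing is legitimate depends on how the data were collected, which is the reasoning your write-up should give.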

Besides books, there are plenty of web sites that explain t-tests, and some offer free t-test calculators. You can also use tools such as MATLAB on the CS UNIX machines for t-test calculations.