These days, I'm a PhD candidate in Computer Science at the University of California at Davis. I started my coursework in the fall of 2000, was officially admitted to the program in the fall of 2001, and finished my coursework in the spring of 2004.
The way the PhD program works in CS at UC Davis is that you do your coursework, hopefully pulling off A's in the courses for the four core areas (Theory, Systems, Applications, and Architecture), which thankfully I did, because the alternative is to take the preliminary exam for each area (and I don't test well). You then write up some preliminary research and a dissertation proposal as a paper, which you present to your qualifying committee, who will grill you over it. And I mean grill you (more on that in a moment). Eventually, they pass you, you do the research you said you would in the proposal, put it together into a dissertation, and have your dissertation committee sign off on it. Then you pass go and collect your PhD.
Notice that I didn't say anything about a dissertation defense. That's right -- no dissertation defense. Instead, you do your defense up front, in the form of your qualifying examination. The qualifying examination is as grueling as most dissertation defenses, if not more so. I think this is great, because it keeps students from traveling down a long path on a dissertation only to have their committees tell them that their research is fundamentally flawed. What's more, it keeps committees from rejecting the research just because they don't like the findings; for example, discovering that an idea is a bad approach to a problem is a valid (and rather common) finding for research, but some committees don't like that and send students back to the drawing board. The approach at Davis ensures that students don't waste time with bad approaches to research.
Shortly after starting grad school, I knew that I wanted to research data mining approaches to network intrusion detection. This was prompted by a project at work where they came to me and said, "We've got all this connection log data from a firewall -- we want a tool that will look at it and flag the suspicious connections." I figured that there were probably some off-the-shelf tools we could grab and, with a little integration work, kick this project out. Well, there weren't any. Okay, so I figured that some academics had probably solved the problem, and we just needed to build a system that implemented their techniques. Indeed, there was a little research, but it was obvious that it was far from a solved problem. In fact, at the time (late 2000), it was a very hot research area. Dissertation city, here I come!
Spending a few years surveying the field, I developed a presentation
Now then, having actually survived the Qualifying Examination proper, allow me to offer some advice to others taking the UC Davis CS Qualifying Exam.
So, I got through the exam with a "Conditional Pass". Instead of waiting for the chair of the committee to get back to me with the changes they wanted before continuing, I forged ahead. The first step was to baseline against Snort with the DARPA IDS Eval data. What I found was that the data was very good at modeling attacks that signature-based IDSs, such as Snort, wouldn't easily detect; however, there was no basis for the background traffic generated in the dataset. In other words, the data could tell you whether your IDS had a good true-positive rate for attacks that signatures can't detect, but it was useless for evaluating the false-positive rate of the system. These results were written up as
Despite these problems with the DARPA dataset, I still see it widely used in KDD research. As a result, I wrote the following:
By the time I finished the DARPA data assessment, I still didn't have the requirements to pass the qualifying examination. I did, though, understand why my committee had serious reservations about my proposal. I couldn't effectively test my data mining methods for network intrusion detection without data. Now, you have to understand that one cannot use real network data to test IDSs, because one does not know the intent behind every connection. While the intent of many connections may appear obvious, there are still many that could be either malicious attacks or benign misconfigurations. I think the chair of my committee, who is not only notorious for being overworked and nonresponsive, but is also one of the nicest gentlemen you'll ever meet, just didn't have it in his heart to tell me that there was no way my proposal would ever work, for lack of good test data.
So, the network intrusion detection community needs a better test dataset. Well, the Lincoln Labs DARPA project to create the IDEval dataset generated numerous PhD and MS theses, so surely this was a dissertation-worthy endeavor. A month or so down that road, I asked myself, "Okay, how do I validate that the traffic I'm generating looks like real network data?" This was, after all, the central problem with the DARPA IDEval dataset.
Nothing.
There were no established methods to do this. Long story short (I know, too late), I wrote a new dissertation proposal:
So I am now ABD (All But Dissertation). That's not to say that I'm completely without a dissertation -- just a completed one. I'm writing as I go along, because if I wait until I'm done, there's no way I'm going to remember what I did. You can see what I've done so far here, but be careful, it's big: