The R&D Sketch: 2009

Wednesday, November 11, 2009

Recommendation letters

This post is the second episode featuring Prof. Mor Harchol-Balter's talk advising people applying to PhD programs in computer science or related areas.
[first episode - full article]

"Perhaps the most important part of your application is the letters of recommendation. You will need 3 letters of recommendation for the Ph.D. program, and typically 4 letters of recommendation for a fellowship.

Whom to ask for a letter
Ideally you would like to make all your letters of recommendation count. Consider the following two letters:
Letter 1: “I highly recommend student X for your graduate program. Student X received an A+ in my undergraduate algorithms class. He was ranked Number 2 out of 100 students. He got the highest score on the final. He worked very hard all semester, never missed a class, and was always able to answer the questions that I asked in class. This conscientious attitude makes him an excellent candidate for any graduate program. ”
Letter 2: “I highly recommend student Y for your graduate program. Student Y received a B in my undergraduate algorithms class. He was ranked Number 29 out of 100 students. Halfway through the semester we started working on network flows. Student Y seemed extremely excited by this topic. He disappeared for 4 weeks and even missed an exam. However when he came back, he showed me some work he had been doing on a new network flow algorithm for high-degree graphs. He had done some simulations and had some proofs. I’ve been working with student Y for the past couple months since then and he is full of ideas for new algorithms. I think student Y’s initiative makes him an excellent candidate for any graduate program.”
Which letter do you think is stronger? It turns out that Letter 2 is very strong. Letter 1 actually counts as 0. At CMU we mark all letters like letter 1 with the acronym D.W.I.C. This stands for “Did Well In Class” which counts for 0, since we already know from the student’s transcript that he did well in class. By contrast, student Y’s letter gives us a lot of information. It explains that the reason student Y didn’t do better in class was that he was busy doing research. It also tells us that student Y started doing research on his own initiative, and that he is quite good at doing research. The professor was impressed enough with student Y’s ideas that he took him on as a student researcher despite student Y not having high grades. You want your letters to all be of type 2 (this doesn’t mean that you should skip class!). Remember that letters of type 1 will not count. You want words like 'self-motivated', 'strong research potential', 'own initiative', 'independent', and 'driven' to appear in your letters. These are the words that we circle when reading recommendation letters. You therefore want to ask for letters from people who have seen you do research. These may be professors or employers.

One caveat: It makes some difference whom you ask for a letter. As a general rule (there are always exceptions due to people’s fame), letters from professors count the most. Next highest are letters from research scientists. After that come letters from lecturers, systems scientists, employers, or postdocs. Please do not get a letter from a graduate student. If you found yourself doing research where you were supervised by a graduate student or postdoc, you should ask the professor for whom they work if she can co-write the letter. The reason is simple: professors are the ones reading the letter, and they are most likely to know other professors.

There is an issue for students who have been working for a while. You will certainly want a letter from your employer, but you will also want two letters from professors. This was an issue for me when I applied to graduate school. What I did was to keep in touch with a few professors during my time at work. When I was ready to apply to graduate school, I contacted the professors who knew me well and scheduled a meeting with them to discuss the research that I had done while I worked. I gave them each an oral presentation. I also gave them each writeups of each of my projects.

How to ask for a letter
Asking for a letter of recommendation won’t be a problem if you have been doing research with this person, but that won’t be possible in every case. Here’s a guideline which will maximize the contents of your letter. This works on the theory that professors have very little time and little memory (both of which are good assumptions):
1. Prepare a packet for each recommender. This packet should contain all the relevant information about you that could help the recommender. Be careful not to make the packet too large. Here’s what should be in it:
- Your statement of purpose.
- A summary of every research project you worked on and with whom, regardless of whether this was at a school or research lab. If you have published a paper, or have a technical report, please include that too.
- A sheet of paper listing all math/cs/engineering/science classes you have taken with the names of professors and grades.
- A list of extracurricular activities and awards/competitions.

At the top of the packet should be:
- A recent photo of you – professors receive many such packets and don’t remember you the second after you leave the office.
- Directions. E.g., please seal and sign and send to this address by Jan. 5. Put an earlier date than the real deadline – professors are notoriously late.
- Confirmation information: Please send me email at blank address after you send this off. If I don’t hear from you by Jan. 5th, I will send you an email reminder. (You need this confirmation information because otherwise you’ll never know if the recommendation was sent and you’ll be sitting around biting your nails wondering.)

2. Go to your potential recommender with your packet and ask him/her the following question: “Do you feel comfortable writing a strong letter of recommendation for me to graduate school?” You need to phrase the question this way so that the potential recommender has a way out. Do not be upset if the potential recommender says no. It is good that he/she let you know. This is much better than getting a weak letter.
3. Check with the school to confirm that they have received a letter from each of your recommenders.
4. Remember to at least send your recommender a thank you card! It’s a lot of work to write a decent recommendation letter, and you may need more letters in the future."

Sunday, October 18, 2009

MSc then PhD?

Typical situation: Hamdy was a top-ranked student at his university, in a developing country, and now he'd like to get a PhD from a good university in the US/Canada (with financial aid). He has two options:

1. Apply for a PhD.

2. Apply for an MSc and, after getting the MSc degree, apply for a PhD.

Why would Hamdy go for option 2:

Avoid frustration
An MSc is less of a commitment than a PhD. It's easier to work on, and takes much less time to get done. However, it gives a clear idea of what Hamdy's life will be like as a PhD student. Given that 50% of PhD students in some fields drop out before completing their degrees, this becomes a significant consideration. Hamdy prefers to go for the less risky degree (i.e. the MSc) and then decide whether to continue for a PhD or take a good job in industry with his MSc degree from a good university in the US/Canada. I know one person who left a PhD for industry, and many PhD students who are considering doing the same.

Chances of acceptance
An MSc is less of a commitment than a PhD, and this holds true for the admission committee of the university he applies to as well. Several people I know applied for a PhD and were offered only an MSc. So, applying for an MSc would increase Hamdy's chances of getting admitted for graduate education.

Improve credentials
What if Hamdy wants to get his PhD from the reputable university A, while his credentials qualify him only for the less reputable university B? He can first apply for an MSc at B and earn the degree to improve his credentials, then apply for a PhD at A. He may then be able to transfer his MSc credits from B to A (this is possible for many pairs of universities in the US/Canada). Note that, if this is what Hamdy plans to do, he shouldn't apply for a PhD at the less reputable university (B): his supervisor would groom him and expect him to stay for ~5 years, and Hamdy would then let him down and, moreover, ask him for a recommendation letter to the more reputable university (Hamdy can't go to A without a recommendation letter from his supervisor at B). I'm not sure what the supervisor would then write in that letter about Hamdy's commitment.

Freedom
Younos Aboulnaga was the first to introduce me to the concept of "taking a break while studying". During his undergraduate studies in Egypt, Younus paused before his final year, spent a year working in Brazil, and then went back to earn his BSc degree. Almost all PhD students get fed up after a few years of working on it. They wish they could "take a break" just like Younus did. However, they can't just take any job for a year or two and then come back to continue the PhD. They have to find an internship in line with their research, competing with many other PhD students for a limited number of internships, probably with low pay. If Hamdy goes for option 2, he can pause for as long as he wishes after the MSc, then return to graduate education when he feels ready.

Why would Hamdy refrain from option 2:

Time
Option 2 usually takes more time, even though he will probably earn an MSc along the way in option 1 as well.

Commitment to research
Most of the rationales for taking option 2 assume that the student is likely to drop out. If Hamdy somehow knows that he's comfortable doing research and can withstand the pain it takes to get a PhD, these rationales carry no weight for him.

Overqualified
Some universities (e.g. UCSB) won't admit you to a master's program if you already have a master's degree in the same discipline from another university.

Request: If you know something that contradicts what I'm stating here, or complements it somehow, please do comment on the post or drop me an email at ammar DOT w AT acm DOT org, and I will be happy to update the original post to reflect your valuable point of view.

Monday, October 5, 2009

How to enhance runtime performance?

Since we do computer science research, we typically implement our explorations and research advances as computer programs. Many research fields have to deal with an immense amount of data (e.g. information retrieval, natural language processing, data mining), adding scalability as a major requirement when you develop a prototype/an exploration.

This post is all about how to enhance the runtime performance of your program. It summarizes a discussion I had with some colleagues on the subject:

1. "Premature optimization is the root of all evil (or at least most of it) in programming", says Donald Knuth. Instead, use a profiler to identify which pieces of the code need your attention.

2. Optimize the algorithm first, not the code. Sometimes a problem seems too trivial to deserve any algorithmic analysis, but it pays to reconsider a 'trivial' algorithm when it runs millions of times. For example, consider this code:

double metric1 = calculateMetric1(input);
double metric2 = calculateMetric2(input);  // always evaluated, even when metric1 already fails
if(metric1 > threshold1 && metric2 > threshold2)
  return true;
else
  return false;

Assuming 'calculateMetric2' is an expensive procedure, refactoring the code as follows will significantly enhance performance:

double metric1 = calculateMetric1(input);
if(metric1 <= threshold1)
  return false;
double metric2 = calculateMetric2(input);
if(metric2 <= threshold2)
  return false;
else
  return true;
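
The same early exit also falls out of the short-circuit behaviour of `&&`, as long as the expensive call sits inside the condition instead of being stored in a variable beforehand. Here is a minimal C++ sketch; the metric bodies, the `passesBoth` name and the `expensiveCalls` counter are made up for illustration and are not part of the original discussion:

```cpp
#include <cstddef>

// Hypothetical stand-ins for the two metrics in the text.
// expensiveCalls counts how often the costly one actually runs.
static std::size_t expensiveCalls = 0;

double calculateMetric1(double input) { return input; }  // cheap

double calculateMetric2(double input) {                   // pretend this is costly
    ++expensiveCalls;
    return input * input;
}

// && evaluates its right operand only if the left one is true,
// so calculateMetric2 is skipped whenever metric1 fails its threshold.
bool passesBoth(double input, double threshold1, double threshold2) {
    return calculateMetric1(input) > threshold1 &&
           calculateMetric2(input) > threshold2;
}
```

Calling `passesBoth(1.0, 5.0, 0.0)` returns false without ever touching `calculateMetric2`.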

3. Concentrate on high-level optimizations. Most popular compilers already perform a lot of low-level optimization on your behalf. For example, with optimizations enabled, Microsoft's Visual C compiler can replace the following code with "x = 27000000":

x = 0;
for(int i=0;i<300;i++)
  for(int j=0;j<300;j++)
    for(int k=0;k<300;k++)
      x++;

4. Consider changing the data structure that hosts your data (e.g. a hashmap instead of an array).
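
To make the point concrete, here is a rough C++ sketch of the same lookup done against an array of pairs and against a hash map. The word-count scenario and both function names are assumptions of mine, chosen only for illustration:

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Linear scan over an array of (word, count) pairs: O(n) per lookup.
int countInVector(const std::vector<std::pair<std::string, int>>& counts,
                  const std::string& word) {
    for (const auto& entry : counts)
        if (entry.first == word) return entry.second;
    return 0;
}

// Hash map lookup: O(1) on average per lookup.
int countInMap(const std::unordered_map<std::string, int>& counts,
               const std::string& word) {
    auto it = counts.find(word);
    return it == counts.end() ? 0 : it->second;
}
```

Both functions return the same answers; the difference only shows up once the collection is large and the lookup runs often, which is exactly what a profiler should confirm before you switch.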

5. Caching. If there’s an expensive operation with inputs that are likely to repeat, it might be worth caching. Note that a cache with low hit-ratio might degrade overall performance.
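
A minimal memoization sketch in C++, assuming the expensive operation is a pure function of a single integer. `slowSquare`, `cachedSquare` and the `slowCalls` counter are hypothetical names used only to show the idea:

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical expensive function; the counter only exists to show
// how many times the real computation runs.
static std::size_t slowCalls = 0;

long long slowSquare(int x) {
    ++slowCalls;
    return static_cast<long long>(x) * x;
}

// Memoizing wrapper: compute once per distinct input, then reuse.
long long cachedSquare(int x) {
    static std::unordered_map<int, long long> cache;
    auto it = cache.find(x);
    if (it != cache.end()) return it->second;  // cache hit
    long long result = slowSquare(x);          // cache miss
    cache.emplace(x, result);
    return result;
}
```

This sketch is not thread-safe and the cache grows without bound. Every call also pays for a hash lookup, which is why a cache with a low hit ratio can end up slower than no cache at all.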

6. Distribute the load on the multiple cores of your processor, or on multiple machines.

7. Get rid of redundant calls. For example, when the code aggressively uses case-insensitive string comparisons, lowercase your strings only once.
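
One possible shape for this in C++ is sketched below: normalize every string once up front, then compare with a plain `==`. The helper names are my own, and the lowercasing is ASCII-only, which is an assumption that will not hold for general Unicode text:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Lowercase a copy of s (ASCII only; real text may need locale/Unicode handling).
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Normalize every word once up front ...
std::vector<std::string> normalizeAll(const std::vector<std::string>& words) {
    std::vector<std::string> out;
    out.reserve(words.size());
    for (const auto& w : words) out.push_back(toLower(w));
    return out;
}

// ... so that each later comparison is a plain ==, with no per-call lowercasing.
std::size_t countMatches(const std::vector<std::string>& normalized,
                         const std::string& normalizedQuery) {
    return static_cast<std::size_t>(
        std::count(normalized.begin(), normalized.end(), normalizedQuery));
}
```

The query is lowercased once by the caller, e.g. `countMatches(words, toLower(query))`, instead of once per comparison.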

8. Sometimes, memory allocation is very demanding. The solution in these cases is usually to create your own memory allocator or use memory pools for certain objects.
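
As a rough illustration of the memory pool idea, here is a deliberately small fixed-capacity pool in C++. The `Node` type and the `NodePool` class are hypothetical; a production pool would also need to think about growth, alignment and thread safety:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical object type that would otherwise be allocated with new/delete very often.
struct Node {
    int value = 0;
    Node* next = nullptr;
};

// A minimal fixed-capacity pool: one up-front allocation, then
// acquire/release only push and pop pointers on a free list.
class NodePool {
public:
    explicit NodePool(std::size_t capacity) : storage_(capacity) {
        freeList_.reserve(capacity);
        for (Node& n : storage_) freeList_.push_back(&n);
    }

    // Copying would leave the free list pointing into another pool's storage.
    NodePool(const NodePool&) = delete;
    NodePool& operator=(const NodePool&) = delete;

    // Returns nullptr when the pool is exhausted.
    Node* acquire() {
        if (freeList_.empty()) return nullptr;
        Node* n = freeList_.back();
        freeList_.pop_back();
        *n = Node{};  // reset to a clean state
        return n;
    }

    // The caller must only release nodes obtained from this pool.
    void release(Node* n) { freeList_.push_back(n); }

    std::size_t available() const { return freeList_.size(); }

private:
    std::vector<Node> storage_;    // never resized, so pointers stay valid
    std::vector<Node*> freeList_;  // nodes currently free for reuse
};
```

After construction, `acquire` and `release` never call the general-purpose allocator, which is where the saving is supposed to come from.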

9. Sometimes, initializing/resetting arrays is very costly. It’s not always necessary to initialize/reset an array before using/reusing it. For example, the dynamic programming algorithm that computes the Levenshtein distance between two words uses a 2D array. Even when the same array is reused for different word pairs, there is no need to zero it: the table is built bottom-up, so every cell is written before it is read.
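
Here is a sketch of the Levenshtein case in C++, assuming a known upper bound on word length. The `Levenshtein` class and its `maxLen` parameter are my own framing; what matters is that `distance` never clears the table between calls:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Computes Levenshtein distances while reusing one table across calls.
// The table is never zeroed: every cell that is read has been written
// earlier in the same call (row 0 and column 0 first, then row by row).
class Levenshtein {
public:
    explicit Levenshtein(std::size_t maxLen)
        : width_(maxLen + 1), table_(width_ * width_) {}

    // Both words must be at most maxLen characters long.
    std::size_t distance(const std::string& a, const std::string& b) {
        const std::size_t n = a.size(), m = b.size();
        for (std::size_t i = 0; i <= n; ++i) at(i, 0) = i;
        for (std::size_t j = 0; j <= m; ++j) at(0, j) = j;
        for (std::size_t i = 1; i <= n; ++i) {
            for (std::size_t j = 1; j <= m; ++j) {
                const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                at(i, j) = std::min({at(i - 1, j) + 1,           // deletion
                                     at(i, j - 1) + 1,           // insertion
                                     at(i - 1, j - 1) + cost});  // substitution
            }
        }
        return at(n, m);
    }

private:
    std::size_t& at(std::size_t i, std::size_t j) { return table_[i * width_ + j]; }

    std::size_t width_;
    std::vector<std::size_t> table_;
};
```

Stale values from a previous word pair stay in the table, but they sit outside the region the current call reads, or they are overwritten before being read.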

I'd like to acknowledge my friends/colleagues (Ahmed El Deeb, Ahmed Sabry & Diaa Samy) for the constructive discussion we had around this topic.

Monday, September 7, 2009

Life after the PhD

This post is quoted from a document written by Mor Harchol-Balter advising people applying to PhD programs in computer science or related areas. Mor has served on the PhD admissions committees at CMU, Berkeley and MIT. I found the whole document very informative, but too long to include in a single post.

"When making a decision about the next 6 years of your life, it’s good to stop and think about what you might do when you finish. Most students upon completing a PhD either go into academia (research university or teaching school) and become a professor, or they go to a research lab. Some people never do research again after completing a PhD. For such people, the PhD was largely a waste of time.
If you choose to be a professor at a research university, your life will consist of the following tasks: (i) doing research on anything you like, (ii) working with graduate students, (iii) teaching classes, (iv) applying for grants, (v) flying around to work with other researchers and to give talks on your research, (vi) doing service for your department and school (like giving this talk). Note that I say “your life” rather than your job, because for new faculty, your life becomes your job. It’s a fantastic job/life for me because I love these activities, so I’m happy to work hard at all of them, but it’s not right for everyone.

If you choose to be a professor at a teaching college, your job will consist of the following: (i) teaching lots of classes, (ii) doing service for your department or school, (iii) occasionally advising undergraduates on undergraduate research, or doing a little of your own research.

If you choose to go to a research lab, your job will consist of the following: (i) doing research (half will be on whatever you want, half will be on whatever the company wants you to do), (ii) working with other people in the company, (iii) traveling around a little to give talks and work with others."

I'm going to publish other parts of Mor's talk, discussing specific aspects of the PhD, in subsequent posts on this blog. So, keep watching.

Saturday, August 29, 2009

How to write a paper's abstract

I used to believe that writing the abstract is the hardest part of a paper. It has to be complete but concise. It has to define your problem and drive readers' interest in that problem. It has to give an idea of the approach you used to solve the problem. You should indicate what kind of results you got, and conclude the work. Too many requirements in too few words. Even though I have all the details in my head, I don't seem to find a way to write this nasty piece of text!

However, after doing a little bit of research on how abstracts should be written, the abstract has become one of my favourite sections to write. It all happened when I realized that I don't need to meet all those requirements in parallel. I can address each of them individually, in series, and end up with a good abstract.

Go over this checklist, provided by Philip Koopman 10 years ago, and write one sentence (you can get away with two, three at most) on each item:
- Motivation: Why is your problem interesting? This "sentence" becomes less important if you're doing incremental work on a problem that's widely recognized as important.
- Problem statement: What's the problem you're working on? Which piece of the problem are you trying to solve? Does your work target a special class of the problem?
- Approach: Give a very high-level picture of your approach/algorithm.
- Results: If you evaluated your work, it's good to state the results here, in one sentence.
- Conclusion: What are the implications of your work?

Quick notes:
- Unless the conference/journal has specific requirements, an abstract word count of 150 to 200 is common.
- Try to appropriately introduce related keywords to make it more likely for your abstract/paper to appear in search results. However, avoid using too much jargon.

Examples:

Title: Word-sense disambiguation using statistical methods
Abstract: We describe a statistical technique for assigning senses to words. An instance of a word is assigned a sense by asking a question about the context in which the word appears. The question is constructed to have high mutual information with the translation of that instance in another language. When we incorporated this method of assigning senses into our statistical machine translation system, the error rate of the system decreased by thirteen percent.
Note: Word sense disambiguation is widely recognized as important in the NLP field.

Title: SeRLoc: secure range-independent localization for wireless sensor networks
Abstract: In many applications of wireless sensor networks (WSN), sensors are deployed un-tethered in hostile environments. For location-aware WSN applications, it is essential to ensure that sensors can determine their location, even in the presence of malicious adversaries. In this paper we address the problem of enabling sensors of WSN to determine their location in an un-trusted environment. Since localization schemes based on distance estimation are expensive for the resource constrained sensors, we propose a range-independent localization algorithm called SeRLoc. SeRLoc is a distributed algorithm and does not require any communication among sensors. In addition, we show that SeRLoc is robust against severe WSN attacks, such as the wormhole attack, the sybil attack and compromised sensors. To the best of our knowledge, ours is the first work that provides a security-aware range-independent localization scheme for WSN. We present a threat analysis and comparison of the performance of SeRLoc with state-of-the-art range-independent localization schemes.

Title: Abbreviation Expansion Using Information Retrieval
Abstract: Abbreviation is a dynamic and widely used concept in modern languages. However, sometimes abbreviations become a hurdle -for both humans and machines- to understanding part of the text. Many abbreviations have multiple meanings depending on the scope of the text. In this paper we address the problem of selecting the correct expansion of an abbreviation given its context. We propose a novel information retrieval approach to find the expansion most relevant to the contextual text. Our approach achieves an accuracy of 98% compared with 96% achieved by the state-of-the-art approach in the literature, over the publicly available Reuters corpus.