The R&D Sketch: August 2010

Sunday, August 22, 2010

Failure is an orphan

An eye-opener! That's the least one can say about this inspiring article by Prof. Stan Szpakowicz in the journal Computational Linguistics.

According to the article, one project out of a hundred produces results that justify the investment. However, we tend to count ourselves among the 1%. That's because we need to show we outperformed others in order to publish more, which in turn we need for a thriving career in research. That's a compelling reason for us NOT to invest more time in something that produced negative results.

But WAIT. "Suppose you have set up an experiment carefully and in good faith, but still it comes up short. That’s not a positive outcome. Maybe your intuition has let you down. Maybe this cannot work. Wait, maybe you can prove that it cannot work?" THAT would be a useful outcome. But you need to get it published.

The problem boils down to peer reviews, which aggressively reject failures. "A forthright admission of the inferiority of one’s results—despite the integrity or novelty of the work—is a kiss of death: no publication. There must be improvement ... conformance to reviewers’ expectations is an asset. Indeed, we write so they are likely to accept". But we're talking about ourselves. WE are the reviewers. If we, the authors, insist on striving for success and run away from every negative result, then we, the reviewers, will check papers for signs of success.

An experiment carefully thought out, a systematic procedure, an honest evaluation—these are the ingredients of good science. It is not mandatory for the results to be positive, though it certainly lifts one up if they are. In areas where empirical methods dominate (e.g. computational linguistics) people try things which fail at the experimental stage. This may be due to lack of rigor, but often there are deeper, unexpected, and intriguing reasons. We can learn a lot if we analyze scientifically why an intuitive and plausible experiment did not work. Then again, to know what leads to dead ends in research surely can warn others off paths which take us nowhere. Simply put, a negative result can be a useful lesson.

Q: Philosophy aside, I want to publish a serious, worthwhile negative result I've obtained. Where to go today?

A: Here's a non-exhaustive list across different disciplines:

Journal of Interesting Negative Results in NLP and ML (http://www.jinr.org/)
Journal of Negative Results in BioMedicine (http://www.jnrbm.com/)
Journal of Negative Results – Ecology and Evolutionary Biology (http://www.jnr-eeb.org/)
Journal of Articles in Support of the Null Hypothesis (http://www.jasnh.com/)

Tuesday, August 17, 2010

The illustrated guide to a PhD

Here's how Prof. Matt Might (University of Utah) explains to fresh PhD students what a PhD is:

Imagine a circle that contains all of human knowledge:

By the time you finish elementary school, you know a little:

By the time you finish high school, you know a bit more:

With a bachelor's degree, you gain a specialty:

A master's degree deepens that specialty:

Reading research papers takes you to the edge of human knowledge:

Once you're at the boundary, you focus:

You push at the boundary for a few years:

Until one day, the boundary gives way:

And, that dent you've made is called a Ph.D.:

Of course, the world looks different to you now:

So, don't forget the bigger picture:

Keep pushing.

Taken from http://matt.might.net/articles/phd-school-in-pictures/ - licensed under the Creative Commons Attribution-NonCommercial 2.5 License.

Tuesday, August 3, 2010

Research competitions

Research competitions are designed to accelerate research on a particular topic. The entity that organizes a competition usually has a problem and wants to encourage researchers to find solutions to it.

Examples:

Netflix, a popular US company which provides flat-rate DVD rentals and video streaming services, offered a $1,000,000 prize to whoever came up with the best collaborative filtering algorithm to predict user ratings for films based on previous ratings.
Text Analysis Conference (TAC) is a series of evaluation workshops organized by NIST to encourage research in Natural Language Processing.
Text REtrieval Conference (TREC) is a series of evaluation workshops organized by NIST to encourage research in Information Retrieval.
OpenMT is yet another evaluation series organized by NIST to encourage research in machine translation technologies.
Speaker Recognition Evaluation (SRE) is NIST's workshop to encourage research in speaker recognition.
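
To make the Netflix task above a bit more concrete, here is a minimal sketch of one simple form of collaborative filtering: predict a user's rating for a film as a similarity-weighted average of what other users gave it. This is only an illustration, not the method of any actual competitor. The toy ratings and the function names (cosine, predict, rmse) are made up for this example. The Netflix Prize scored entries by root mean squared error (RMSE), so a small rmse helper is included.

```python
from math import sqrt

# Toy ratings (hypothetical data): user -> {film: rating on a 1-5 scale}
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2, "D": 5},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 1},
}

def cosine(u, v):
    """Cosine similarity between two users over the films both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[f] * v[f] for f in common)
    norm_u = sqrt(sum(u[f] ** 2 for f in common))
    norm_v = sqrt(sum(v[f] ** 2 for f in common))
    return dot / (norm_u * norm_v)

def predict(user, film):
    """Similarity-weighted average of other users' ratings for the film.

    Returns None when nobody else has rated the film.
    """
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or film not in their:
            continue
        w = cosine(ratings[user], their)
        num += w * their[film]
        den += w
    return num / den if den else None

def rmse(pairs):
    """Root mean squared error over (predicted, actual) rating pairs."""
    return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

if __name__ == "__main__":
    # ann has not rated film D; her tastes look more like bob's than cat's.
    print(predict("ann", "D"))
```

In this toy data ann's ratings resemble bob's far more than cat's, so the prediction for film D leans towards bob's 5 rather than cat's 1. Real entries were far more elaborate; the point is only to show what "predict ratings from previous ratings" means.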

Why should you participate?

Data: Organizers of a research competition provide participants with scarce data resources for free so that they can compete. It is very expensive to collect the data yourself. Sometimes, you can subscribe to get the data for (huge) fees, but even then, data catalogs are not made available until many years after the competition was held.
Evaluation: Normally, you need to prove your novel technique performs better than state-of-the-art techniques that handle the same problem as yours. First, you need to decide which other techniques you should compare to, which is not always an easy task. Then, you try to obtain the same data set used in their publication so that your results are comparable. Soon you find out they were not using standard data for training or testing. So, you decide to run the other technique on your data, but you can't find a readily available implementation of it. So, you have to implement it yourself. Even then, the comparison may not be accurate, because publications usually leave out many details that make a big difference in results. When you participate in a research competition, you don't have to worry about all this painful overhead.
Exposure: Normally, when you do something great, there is no guarantee people will listen to you. Most prestigious conferences, for example, reject high-quality papers because they limit the number of papers they accept. When you participate in such competitions and produce great results compared to other participants, people will listen and learn from what you did.
Publications: This is related to the previous point. Most competitions provide a good publication venue for participants to explain their systems and results.