Monday, October 5, 2009

How to enhance runtime performance?

Since we do computer science research, we typically implement our explorations and research advances as computer programs. Many research fields have to deal with an immense amount of data (e.g. information retrieval, natural language processing, data mining), adding scalability as a major requirement when you develop a prototype/an exploration.

This post is all about how to enhance runtime performance of your program. It summarizes a discussion I had with some colleagues in this regard:

1. "Premature optimization is the root of all evil (or at least most of it) in programming", says Donald Knuth. Instead, use a profiler to identify which pieces of the code need your attention.

2. Optimize the algorithm first, not the code. Sometimes we think a problem is too trivial to analyze its algorithm, but it makes much sense to reconsider the 'trivial' algorithm when it runs millions of times. For example, this code

double metric1 = calculateMetric1(input);
double metric2 = calculateMetric2(input);
if(metric1 > threshold1 && metric2 > threshod2)
  return true;
else
  return false;

Assuming 'calculateMetric2' is an expensive procedure, refactoring the code as follows will significantly enhance performance:

double metric1 = calculateMetric1(input);
if(metric1 <= threshold1)
  return false;
double metric2 = calculateMetric2(input);
if(metric2 <= threshold2)
  return false;
else
  return true;

3. Concentrate on high level optimizations. Most popular compilers do a lot of optimization on your behalf. For example, Microsoft's Visual C compiler will replace the following code with "x = 27000000":

x = 0;
for(int i=0;i<300;i++)
  for(int j=0;j<300;j++)
    for(int k=0;k<300;k++)
      x++;


4. Consider changing the data structure that hosts your data (e.g. a hashmap instead of an array).

5. Caching. If there’s an expensive operation with inputs that are likely to repeat, it might be worth caching. Note that a cache with low hit-ratio might degrade overall performance.

6. Distribute the load on the multiple cores of your processor, or on multiple machines.

7. Get rid of redundant calls. For example, when the code aggressively uses case-insensitive string comparisons, lowercase your strings only once.

8. Sometimes, memory allocation is very demanding. The solution in these cases is usually to create your own memory allocator or use memory pools for certain objects.

9. Sometimes, initializing/resetting arrays is very costly. It’s not always necessary to initialize/reset an array before using/reusing it. For example, to compute the Levenshtein distance between two words, there’s a dynamic programming algorithm which uses a 2D array. Even though the same array is used for different word pairs, no need to zero the array since it’s built in a bottom-up fashion.

I'd like to acknowledge my friends/colleagues (Ahmed El Deeb, Ahmed Sabry & Diaa Samy) for the constructive discussion we had around this topic.

2 comments:

  1. The students who are really connected witht he computer science they should have join to the research for their own good. click here to see more about the wrting tips.

    ReplyDelete
  2. Great opportunity, The quantifiable direction is basic to get some answers concerning the nonstop addition or decreasing in effectiveness, economy, enlightening level in genuine examination the quantitative and check in this https://www.graduatethesis.org/undergraduate-personal-statement/ site for subjective methods are generally used.

    ReplyDelete