Which yields faster code?

The true answer to your question is: “it depends on the platform and compiler settings”.

Let us take the case of no optimizations.
There are 3 cases to compare.

Case 1: Adding the variable 3 times.

The instructions:

  MOV Y, 0    ; Set Y to zero.
  ADD Y, Y, X ; Add X to Y and place result in Y.
  ADD Y, Y, X ; Add X to Y and place result in Y.
  ADD Y, Y, X ; Add X to Y and place result in Y.

The processor would fetch and execute 4 instructions. The bottleneck may be the time it takes to fetch them.

Case 2: Multiply by 2 and add once

The instructions:

  MOV Y, 0
  MUL Y, X, 2 ; Multiply X by 2 and store into Y.
  ADD Y, Y, X ; Add X to Y and place result in Y.

Note that there is one fewer instruction, but the multiplication will take longer than an addition. It is hard to tell whether the extra time for the multiplication is more or less than the time saved by skipping one fetch.

If we use shifting instead of multiplying by 2:

  MOV Y, 0
  SHL Y, X, 1 ; Shift the bits in X left by one bit, place result in Y.
  ADD Y, Y, X ; Add X to Y and place result in Y.

This will usually be faster, because shifting by one bit is typically cheaper than multiplying.
Are the savings significant?

Case 3: Multiplying by 3

The instructions:

  MOV Y, 0
  MUL Y, X, 3 ; Multiply X by 3 and place result in Y.

There are only 2 instructions, but the multiplication takes longer than shifting or adding. Is the longer multiplication still cheaper than the extra fetches? I don’t know; that needs a performance measurement.
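For reference, the three cases might come from C source along these lines. This is only a sketch: the function names are hypothetical, unsigned arithmetic is assumed so that overflow wraps instead of being undefined, and an optimizing compiler may well emit the same instructions for all three.

```c
#include <stdint.h>

/* Case 1: add the variable three times. */
uint32_t triple_by_adding(uint32_t x) {
    uint32_t y = 0;
    y += x;
    y += x;
    y += x;
    return y;
}

/* Case 2: shift left by one bit (multiply by 2), then add once. */
uint32_t triple_by_shifting(uint32_t x) {
    return (x << 1) + x;
}

/* Case 3: multiply by 3 directly. */
uint32_t triple_by_multiplying(uint32_t x) {
    return x * 3u;
}
```

Comparing the assembly listing your compiler produces for each function (with and without optimization) is a quick way to see whether there is any difference to measure at all.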

Conclusion:

The experiment needs to be profiled on different systems to get proper results. Multiplying reduces the number of instructions that are fetched from the processor’s cache. However, multiplying is a more complex operation than addition or shifting. If the extra time spent multiplying is less than the time saved on fetches, then there is some savings.
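A rough way to take such a measurement is sketched below. It assumes the standard `clock()` function is precise enough for your purposes; `time_variant` and the two small functions are hypothetical names introduced for illustration. The `volatile` sink is there so the compiler does not discard the loop.

```c
#include <stdint.h>
#include <time.h>

typedef uint32_t (*triple_fn)(uint32_t);

/* Two of the variants under test (hypothetical names). */
static uint32_t triple_by_adding(uint32_t x)      { return x + x + x; }
static uint32_t triple_by_multiplying(uint32_t x) { return x * 3u; }

/* Returns the CPU time, in seconds, spent calling fn `iterations` times. */
double time_variant(triple_fn fn, uint32_t iterations) {
    volatile uint32_t sink = 0;   /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (uint32_t i = 0; i < iterations; ++i) {
        sink = fn(i);
    }
    clock_t stop = clock();
    (void)sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling `time_variant(triple_by_adding, n)` and `time_variant(triple_by_multiplying, n)` with a large `n`, several times each, gives numbers you can compare. The call through a function pointer adds overhead of its own, so treat the results as relative, not absolute.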

The big question is how much execution time is saved. Not much. If we assume that the processor takes an average of 100 ns to execute an instruction, in the best case you will have saved 2 instructions, or 200 ns. For comparison, user input is measured in seconds and I/O in milliseconds, so you could gain more time by optimizing user input or file I/O.

If you gain 200 ns, it will be lost while waiting for user input or file I/O, or for the OS to swap your program with another one.
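To put the numbers side by side, here is the arithmetic as a small sketch. The 100 ns per instruction and the 2 saved instructions come from the estimate above; the specific wait times (1 ms for one I/O operation, 1 s for one user input) are assumed round figures, and the function names are hypothetical.

```c
#include <stdint.h>

enum { NS_PER_INSTRUCTION = 100, INSTRUCTIONS_SAVED = 2 };

/* Nanoseconds saved under the assumed 100 ns per instruction. */
int64_t saved_ns(void) {
    return (int64_t)NS_PER_INSTRUCTION * INSTRUCTIONS_SAVED;
}

/* How many times the saving fits into a wait of wait_ns nanoseconds. */
int64_t savings_per_wait(int64_t wait_ns) {
    return wait_ns / saved_ns();
}
```

With these assumptions, `savings_per_wait(1000000)` is 5000 for a 1 ms I/O wait, and `savings_per_wait(1000000000)` is 5000000 for a 1 s user input wait.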

The productivity lost with these micro-optimizations is substantial. The time spent researching and profiling this experiment could be better spent developing the remainder of the program and making the program correct and robust.
