Finding different ways to solve a problem is an enjoyable way to relax for many people. Not only do they get a kick out of it, but as a side effect they develop a much deeper insight into the problems, enabling them to work out solutions under different constraints with ease.
Here you will find discussion/hints to solve some common programming problems in efficient ways, supported with code that you can try.
Support for heavy arithmetic operations often goes missing from a chip, either to reduce chip cost for marketability reasons or during chip development stages, and we programmers have to make up for the lack of these operations.
This problem attracted my attention during guest DSP lectures by Mr. Ganesh Bhokare at PUCSD in the latter half of 2005. If the only interest is to access the most recent N items (results/computations) and use them to produce a new item, there is no point consuming more storage than is needed for N items. Hardware support for shifting the contents of a memory region (within limits) by one or more units (bytes, words of various sizes) in the increasing or decreasing direction would certainly help towards the most efficient program for this purpose. However, that would also make the hardware a bit more complex and costly.
Some of the programming solutions to achieve the above goal are as follows.
What if N is not a power of 2 and no MOD support is available? Will you go for extra space consumption (making the number of items in the array the smallest power of two greater than N, so that a bitwise AND can replace MOD), or simulate the effect of MOD using comparisons during every item access?
    /* Pseudo Code. Array indexed as 0...(N-1). */
    NEXT_ROUND:
        index = start_index;
        do {
            Use Array[index] as needed to compute new item;
            index++;
            if (index == N) {
                index = 0;        /* Achieving effect of MOD */
            }
        } while (index != start_index);

        /* Place new computed item in buffer. */
        Array[start_index] = new item;

        /* Move start_index. */
        start_index++;
        if (start_index == N) {
            start_index = 0;      /* Achieving MOD effect */
        }
        Go for NEXT_ROUND;
Wait! A simpler solution exists that solves the problem efficiently by splitting the access into two semi-open ranges, [start_index, N) and [0, start_index). It is even more efficient than the previous solutions based on the MOD operation or bitwise ANDing, because it eliminates the need for any MOD-like operation in the inner loop altogether.
    /* Pseudo Code. Array indexed as 0...(N-1). */
    NEXT_ROUND:
        index = start_index;
        do {
            Use Array[index] as needed to compute new item;
            index++;
        } while (index != N);

        index = 0;
        while (index < start_index) {
            Use Array[index] as needed to compute new item;
            index++;
        }

        /* Place new computed item in buffer. */
        Array[start_index] = new item;

        /* Move start_index. */
        start_index++;
        if (start_index == N) {
            start_index = 0;      /* Achieving MOD effect */
        }
        Go for NEXT_ROUND;
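A minimal runnable version of the two-range scheme, as a sketch: the window size N, the item type (double) and the function name push_and_sum are assumptions for this example, and the running sum of the last N items stands in for the "compute new item" step.

```c
#include <assert.h>

#define N 4  /* size of the history window; an assumption for this sketch */

static double history[N];
static int start_index = 0;

/* Visit the last N items using the two semi-open ranges [start_index, N)
 * and [0, start_index), then overwrite the oldest slot with the new item.
 * Returns the sum of the previous N items as a stand-in for the real
 * "compute new item" work. */
double push_and_sum(double new_item)
{
    double sum = 0.0;
    int index;

    for (index = start_index; index != N; index++)   /* [start_index, N) */
        sum += history[index];
    for (index = 0; index < start_index; index++)    /* [0, start_index) */
        sum += history[index];

    history[start_index] = new_item;                 /* overwrite oldest */
    start_index++;
    if (start_index == N)                            /* MOD effect, once per round */
        start_index = 0;

    return sum;
}
```

Note that the wrap-around check runs once per round, not once per item access, which is the whole point of the two-range split.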
Dividing an unsigned integer by 3 without using division or multiplication can be achieved with shift and add/subtract operations plus table lookup. A reasonable compromise between the number of steps and the table sizes is achieved by manipulating the number in its base-16 representation.
Any integer N can be rewritten as "M*(N/M) + (N%M)". Similarly, any number can be represented as "((...((bn*X + bn_1)*X + bn_2)*X + ...)*X + b1)*X + b0" in base X, where the bi are its digits. These are the basics behind the equalities utilised in the provided code.
In the case of base 16, the above equalities simplify to -
Since a and b are less than 16 and the maximum value of "integer % 3" is 2, the maximum value of M in "M/3" in the above expressions, i.e. the largest index into the quotient and remainder tables for division by 3, will be 17 and 47 respectively.
The following discussion numbers the LSBit as 0, every subsequent bit towards the MSBit (left) getting a higher number than the previous one, and considers the case where there is at least one bit with the desired property in the binary representation of the given number. N denotes the number of bits used to represent the number.
(i) Determining the position of the first `1' bit from the left : The basic idea is to move the first `1' bit towards the MSBit position, isolating it a little more from the other `1' bits in each step. The logic terminates in log-base-2 (N) steps, when the desired bit reaches the MSBit position and all other bits have become `0'.
    shiftcount = number_of_bits_used_to_represent_the_number;
    pos = number_of_bits_used_to_represent_the_number - 1;
    mask = all_bits_set_to_1;
    while (all_bits_except_for_MSBit_set_to_0 != number) {
        shiftcount >>= 1;
        mask <<= shiftcount;
        if (0 == (temp = (mask & number))) {
            number <<= shiftcount;
            pos -= shiftcount;
        } else {
            number = temp;
        }
    }
(ii) Determining the position of the first `1' bit from the right : First we isolate the first `1' bit from the right, making use of the fact that when we subtract 1 from a number, all the bits from the LSBit up to and including the first `1' bit toggle in the binary representation of the number.
number = (number ^ (number & (number - 1)));
Now we have a number whose binary representation contains exactly one bit set to `1', and finding its position doesn't involve masking operations. The logic terminates in log-base-2 (N) steps, when the desired bit reaches the LSBit position.
    shiftcount = number_of_bits_used_to_represent_the_number / 2;
    pos = 0;
    while (1 != number) {
        if (0 != (temp = (number >> shiftcount))) {
            number = temp;
            pos += shiftcount;
        }
        shiftcount >>= 1;
    }
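Putting the two steps together for a 32-bit argument gives a runnable function; the name ffor is my own, by analogy with the ffol implementation shown later.

```c
#include <assert.h>
#include <stdint.h>

/* Position (LSBit = 0) of the first '1' bit from the right, or -1 if none.
   Step 1 isolates the lowest set bit; step 2 binary-searches its position
   using shifts only. */
int ffor(uint32_t number)
{
    uint32_t temp;
    int shiftcount = 16;   /* half the bit width of uint32_t */
    int pos = 0;

    if (number == 0)
        return -1;

    /* Isolate the first '1' bit from the right. */
    number = (number ^ (number & (number - 1)));

    /* Now exactly one bit is set; locate it. */
    while (number != 1) {
        if (0 != (temp = (number >> shiftcount))) {
            number = temp;
            pos += shiftcount;
        }
        shiftcount >>= 1;
    }
    return pos;
}
```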
(iii) Determining the position of the first `0' bit from the left : Flip all the bits of the number and it becomes the problem of finding the position of the first `1' bit from the left.
(iv) Determining the position of the first `0' bit from the right : One way is to flip all the bits of the number and solve it as the problem of finding the position of the first `1' bit from the right. Another way is to isolate the first `0' bit from the right into a bit pattern where all bits except the one corresponding to its position in the number are set to 0. The previous discussion already tells you how to proceed from that point.
This isolation makes use of the fact that when we add 1 to a number, all the bits from the LSBit up to and including the first `0' bit toggle in the binary representation of the number.
number = ((number + 1) ^ (number & (number + 1)));
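As a sketch, the isolation identity above can be combined with the same shift-only position search; the name ffzr is mine.

```c
#include <assert.h>
#include <stdint.h>

/* Position (LSBit = 0) of the first '0' bit from the right, using the
   add-1 isolation identity followed by a shift-only position search. */
int ffzr(uint32_t number)
{
    uint32_t temp;
    int shiftcount = 16;
    int pos = 0;

    if (number == UINT32_MAX)   /* no '0' bit present */
        return -1;

    /* Isolate the first '0' bit from the right as a single '1' bit. */
    number = ((number + 1) ^ (number & (number + 1)));

    while (number != 1) {
        if (0 != (temp = (number >> shiftcount))) {
            number = temp;
            pos += shiftcount;
        }
        shiftcount >>= 1;
    }
    return pos;
}
```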
The code optimisations discussed here have a significant impact when the code section concerned is executed a large number of times in a short span of time. Consider situations like the following while loop -
    int i = (smaller & 3);
    while (i-- > 0)
        result += bigger;
On average, every execution of "result += bigger;" in the above while loop is accompanied by a decrement, a compare and a jump operation. The equivalent switch-case below removes the decrement operation, resulting in fewer instructions executed, but the code size increases because of a sort of loop unrolling (in exact terms you might not call it loop unrolling). If the loop's execution count were bigger, the instruction saving during execution would be bigger, and so would the code size.
    switch (smaller & 3) {
        case 3 : result += bigger;
        case 2 : result += bigger;
        case 1 : result += bigger;
        case 0 : break;
    }
At a higher level we would expect a single switch decision to take us to the appropriate case. This effect can be achieved if we maintain our own jump table. This also results in slightly smaller code than the switch-case, and takes us to any case in constant time irrespective of the number of cases.
    static const int jumptab[4] = {
        (&&case_0 - &&case_3),
        (&&case_1 - &&case_3),
        (&&case_2 - &&case_3),
        (&&case_3 - &&case_3)
    };
    /* some other code */
    goto *(&&case_3 + jumptab[(smaller & 3)]);
    case_3 : result += bigger;
    case_2 : result += bigger;
    case_1 : result += bigger;
    case_0 : /* Just continue. */
You can make use of a jump table even in situations with default cases, or where the cases are not as contiguous as in this example. It is not difficult at all.
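The jump-table fragment above relies on the GCC/Clang "labels as values" extension, so a self-contained sketch has to keep the table inside the function that owns the labels; the function name add_remainder and its signature are assumptions for this example.

```c
#include <assert.h>

/* Computed goto ("labels as values") is a GCC/Clang extension, not
   standard C. The table of label differences must live inside the
   function containing the labels. */
long add_remainder(long result, long bigger, unsigned smaller)
{
    static const int jumptab[4] = {
        (&&case_0 - &&case_3),
        (&&case_1 - &&case_3),
        (&&case_2 - &&case_3),
        (&&case_3 - &&case_3)
    };

    /* One indirect jump instead of a decrement/compare/jump per step. */
    goto *(&&case_3 + jumptab[smaller & 3]);

case_3: result += bigger;
case_2: result += bigger;
case_1: result += bigger;
case_0: /* just continue */
    return result;
}
```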
Consider the following implementation of ffol (find first `1' from the left), where all the local variables that depend on the type of the argument `number' for their type have been declared using "__typeof__(number)". Thanks to this, one does not need to change a single word in the function body when the argument type is changed to uint16_t, uint64_t and so on.
    int ffol (uint32_t number)
    {
        register __typeof__(number) mask = (~0);
        register __typeof__(number) temp;
        int shiftcount = (sizeof(number) * 8);
        int pos = ((sizeof(number) * 8) - 1);

        /* Is number non-zero i.e. does it have any '1' bit? */
        if (0 == number) {
            return (-1);
        }

        /*
         * (((__typeof__(number))(~0) >> 1) + 1) generates the values 0x80,
         * 0x8000, 0x80000000 and so on for argument types of uint8_t,
         * uint16_t, uint32_t and so on respectively.
         */
        while ((((__typeof__(number))(~0) >> 1) + 1) != number) {
            shiftcount >>= 1;
            mask <<= shiftcount;
            if (0 == (temp = (mask & number))) {
                number <<= shiftcount;
                pos -= shiftcount;
            } else {
                number = temp;
            }
        }
        return pos;
    }
If you want versions of ffol with different argument types (ffol16, ffol32, ffol64 etc.) to exist simultaneously, you will end up with many functions sharing the same C body but having different headers. If you are willing to accept a performance loss in ffol32, you may implement only ffol64 and define ffol32 in terms of it, zero-extending the 32-bit argument to 64 bits (top 32 bits set to 0) and passing it on to ffol64.
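A sketch of that arrangement follows; since bit positions are numbered from the LSBit, zero-extension leaves the position of the leftmost `1' bit unchanged, so ffol32 needs no adjustment of ffol64's result. The explicit uint64_t types here replace the __typeof__ trick for clarity.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit version of the ffol algorithm from the text
   (bit positions numbered from the LSBit as 0). */
int ffol64(uint64_t number)
{
    uint64_t mask = ~(uint64_t)0;
    uint64_t temp;
    int shiftcount = 64;
    int pos = 63;

    if (number == 0)
        return -1;

    while (number != ((uint64_t)1 << 63)) {
        shiftcount >>= 1;
        mask <<= shiftcount;
        if (0 == (temp = (mask & number))) {
            number <<= shiftcount;
            pos -= shiftcount;
        } else {
            number = temp;
        }
    }
    return pos;
}

/* Zero-extending the argument leaves the LSBit-based position of the
   leftmost '1' bit unchanged, so the result can be returned directly.
   The cost is that the search loop always runs over 64 bits. */
int ffol32(uint32_t number)
{
    return ffol64((uint64_t)number);
}
```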
The layout of the English alphabet in the ASCII character space, [A-Z : (64+1) - (64+26)] and [a-z : (96+1) - (96+26)], can be taken advantage of for efficient string processing in many cases.
First, let us consider the ASCII representation of English letters. If the aim is to find the occurrence of English letters irrespective of case, we can use a 32-bit integer or an array of 27 characters, and map [a-z] and [A-Z] to the bit numbers or indices [1,26] as required. The [1,26] mapping is advantageous over a [0,25] mapping, which would require an additional subtraction to compute the bit position or index corresponding to the character under consideration.
    /* presence_mask is a 32 bit integer. ch is the input character. */
    presence_mask |= (1 << (ch & 0x1F));
Initially the presence_mask or presence array is initialised to all zeros. For each English letter in the string we set the corresponding bit or array cell to one. This way, at the end of one pass over the string we have the required information.
    /* NUM_ALPHABETS is 26 */
    for (i = 1; i < (NUM_ALPHABETS + 1); i++)
        if ((1 << i) & presence_mask)
            printf("%c ", (96 + i));
The solution involving the bit mask is on average advantageous in the case of very large strings with a random distribution of letters, where all the letters may have been seen at least once well before the string's end. In that situation we gain no new information by going over the remaining string.
    /* ALL_ALPHABETS_PRESENT is 0x07FFFFFE */
    if (ALL_ALPHABETS_PRESENT == presence_mask)
        stop processing remaining string.
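The pieces above assemble into a short runnable whole; the function name letters_present is an assumption, and isalpha() guards the (ch & 0x1F) mapping so that non-letter characters do not pollute the mask.

```c
#include <assert.h>
#include <ctype.h>

#define ALL_ALPHABETS_PRESENT 0x07FFFFFE

/* Return a 32-bit mask with bit i set iff the i-th letter (a/A = 1,
   z/Z = 26) occurs in s, case-insensitively, stopping early once every
   letter has been seen. */
unsigned int letters_present(const char *s)
{
    unsigned int presence_mask = 0;

    for (; *s != '\0'; s++) {
        if (isalpha((unsigned char)*s))
            presence_mask |= (1u << (*s & 0x1F));
        if (presence_mask == ALL_ALPHABETS_PRESENT)
            break;              /* nothing new can be learnt */
    }
    return presence_mask;
}
```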
We can achieve the goal efficiently in one pass over the string even with the EBCDIC representation, where A-Z (and similarly a-z) are not contiguous. In this case we work with an array of 256 characters (or a 256-bit presence_mask) and set the corresponding array cell (or bit) to one for each character in the string, using the character's 8-bit unsigned value directly as the index (or bit position). The main difference w.r.t. the ASCII case is in how the characters' presence information is processed at the end.
    char engalpha[] = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz";

    /* Code to set the counts array with character presence indications. */

    for (i = 0; i < (2 * NUM_ALPHABETS); i += 2)
        if (counts[(int)engalpha[i]] | counts[(int)engalpha[i+1]])
            printf("%c \n", engalpha[i]);
The EBCDIC solution will work with ASCII representation of alphabets also.
Counting letter occurrences is similar to finding which English letters occur in the string, except that we work with a counts array of a suitable unit (integer, long), and for each English letter in the string we increment the corresponding array cell by one.
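For the ASCII case this is a one-line change to the presence solution; the function name count_letters is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>

#define NUM_ALPHABETS 26

/* Fill counts[1..26] with per-letter occurrence counts (case-insensitive),
   using the same (ch & 0x1F) mapping as the presence solution; counts[0]
   is unused, courtesy of the [1,26] mapping. */
void count_letters(const char *s, int counts[NUM_ALPHABETS + 1])
{
    int i;

    for (i = 0; i <= NUM_ALPHABETS; i++)
        counts[i] = 0;
    for (; *s != '\0'; s++)
        if (isalpha((unsigned char)*s))
            counts[*s & 0x1F]++;     /* increment instead of set-to-one */
}
```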
Applications :
The initial announcement (v0.01) of the library was made on 12th May 2006 at Freshmeat.net. This version provided routines to -
Further development of this project can be tracked via utilfuncs homepage at Sourceforge.net.