The markv.nl bloghttp://markv.nl/blog/rss/+storageFollow all the posts and links tagged #storageenMon, 25 Oct 2021 06:20:04 +0000Why databases use ordered indexes but programming uses hash tableshttp://markv.nl/blog/link/29/page shared by Markhttp://markv.nl/blog/link/29/Optimal counting system using fingershttp://markv.nl/blog/optimal-counting-system-using-fingersIt's sometimes suggested -perhaps jokingly- that we should use binary to count using fingers, as we'd get all the way from 0 to 1023! Much better than the 0 to 10 we get 'normally'. Of course, it's rather difficult for most people to know at a glance which number is represented by .|..| ||.|. (probably 256+32+16+8+2=314). This way of counting is most often suggested by programmers, because using this way is just using base 2, just like computers do. A better way But I think we can do much better! And by better I mean we can theoretically represent a more useful range of numbers with 10 bits of information. For human understanding, it's much worse. The first observation is that we can represent only 1024 sequences. So if every sequence represents a distinct number, that's 1024 number. But do we really care about the difference between 967 and 968? Maybe, but let's assume that the difference between 1024 and 1015776 is more important. I'd like to sacrifice some precision to have higher numbers. Note that I do care a great deal about the difference between 2 and 3 - we can't just space all the numbers by 100. Goals So here is what I want: 1024 distinct whole numbers Express high numbers approximately - at least a million Express all low numbers - at least 0 to 10 should be available and exact The pattern is probably not easy, but it should at least be a closed formula that fits on one line The foundation The main trick is blatantly stolen from floating point numbers. Check out this simulator of floats to get familiar with those. Tricky but great! Essentially, some bits represent a linear number (matissa), and some other bits represent an exponential multiplier (exponent). For example: a * 2^b Say we use one hand for a (the mantissa) and one hand for b (the exponent). A hand has five bits, so we can get to 2^5-1=31 for each, or 31*2^31=66,571,993,088. Not bad! We also have the numbers 0 to 31 exactly (when`b=0`). But there is a problem. Doubling a is the same as adding 1 to b, so we get a lot of duplication. For example, there are 5 ways to represent 32: ` 16* 2^1` ` 8* 2^2` ` 4* 2^3` ` 2* 2^4` ` 1* 2^5` And all 32 numbers with a=0 represent 0. We have a lot of duplicates, which means we can express less than 1024 numbers! An addition Let's examine the sequence for b=0. This is simply the numbers from 0 to 31. For b=1 it's the even numbers from 0 to 62. And for b=2 it's 0, 4, 8, ..., 120, 124. So the first half of each such sequence (at fixed b) overlaps with the previous sequence! We'd need to introduce a shift to the b=1 series so it starts halfway. We need to add 32. Is the shift just 64 for b=2? It is not, because the +32 also shifts the last value for b=1 to 94 instead of 62. So for b=2 we need to shift by 64+32. And for b=3 by 128+64+32 since the highest number with b=2 is 220. This sequence of shifts can be expressed as 2^(b+5)-2^5 Using these shifts, the highest number for b is lower than the lowest number for b+1, so all numbers are distinct. Conclusion Putting these together, let us agree that the left hand is the binary representation a and the right hand is the binary representation b. Together these shall express a number a * 2^b + 2^(b+5) - 2^5 Does that satisfy the goals? 1024 distinct whole numbers: Yes! All numbers are unique, since `b` series are disjoint. Express high numbers approximately: We can reach `135,291,469,792`, about the number of people that ever lived Express all low numbers: Every number up to `32` and every even number up to `94` The pattern is probably not easy, but it should at least be a closed formula that fits on one line: Yes to both We cannot represent the 967 or 968 that we initially talked about. In fact, the closest we can get is 960 and 976. They're in the sequence for b=4, so the spacing is 2^4=16. Negative numbers I could imagine that maybe you have more use for the number -1 than for the number 135,291,469,792 (or 18)... If we want negative numbers, the easiest way is to sacrifice one bit from either a (precision) or b (range) to indicate positive or negative. Probably b makes most sense, which will reduce our maximum to roughly the square root of what it was. To be exact, a measly 2,064,352. The only problem with this, is that we end up with both +0 and -0. We'll have to settle for 1023 distinct numbers, or shift by 1. If we want more positive numbers than negative ones, we could try something fancy, but we wouldn't want people getting confused by the system, would we?http://markv.nl/blog/optimal-counting-system-using-fingers