- plot5.ggb – Geogebra notebook
- geo5.sh – awk one-liner to format an array into input parameters for Geogebra
Every time I wrote I should have written . This will be corrected in the final version.
The pigeonhole upper bound is perhaps better written , since it arises as , where φ is the totient function.
Quality of the data
After commenting, Tomás Oliveira e Silva sent a follow-up email prompting me to belatedly flesh out this post a little.
After computing the first 100000 values independently, he found one discrepancy. 221000.dat has
53829 1.035945566231253e-8 6045 19754 30497 38849
but Tomás has
53829 0.0000000043404367309945397 0 0 15622 31244 31244
which agrees with the output of search.c
53829 4.3404373281240489e-09 0 15622 31244 31244
So what’s going on here? The answer necessarily includes the fact that search.c was not used to produce that line of 221000.dat. The history is that the data set and the search program developed together over time. The final version of search.c is (hopefully) better than the earliest versions in several ways, including a considered approach to avoiding zero sums rather than an optimistic one, and a change in the enumeration order so that it notices when it’s about to generate a lot of very long sums and skips over them. The very earliest searches weren’t even done in C; I used Haskell, which is my usual go-to for combinatorics but wasn’t the best fit for this naturally expressed brute-force loop. In any case, I didn’t rerun the whole search using the final program because of the cost of producing the data. I’ve been at peace with the risk of errors because I was using the data to suggest patterns to investigate rather than relying on it for any proofs, but I should have been clearer about what I did.
Putting my archaeologist’s hat on, we can tell that the line in 221000.dat isn’t from the latest version of search.c due to the different output format. In fact, there’s a break between the two formats at around 120000:
119999 1.6533522139157386e-9 15853 43211 70569 86422
120000 3.4249738497730593e-10 20224 40448 77495 82953
120001 8.4593861872462649e-10 10962 39803 71356 79549
120002 1.4113956883942480e-10 19401 45243 71085 90486
120003 1.3630084760844623e-10 10265 44351 65134 85917
120004 5.3309237833505616e-10 9345 39651 69957 79302
120005 2.1493219522622943e-09 6408 39397 67265 79033
This means that I might have used Haskell as far as 120000 (I might also have used C with a different output format), but I didn’t use it any later. In fact the incorrect line matches the output of the Haskell search exactly, so the error probably comes from a bug in the Haskell. My first guess, given the correct form of the optimal configuration for 53829, was that I had an off-by-one error in an index preventing me from looking at , but after revisiting the source code I don’t think that’s it. As an early version of the guard against pointless searching, I have the line
bestCompletions bestGuess a b = if length > 2.0 then  else [t | t@(_,a,b,c,d) <- candidates''', not $ zero a b c d]
The error here is that the cutoff for “too long” needs to be 2 plus the length of the shortest sum seen so far, not a fixed 2. (Not counting the triple-prime in a variable name as an error.) The correct configuration for 53829 is exactly a situation in which the sum of the triple has length just over 2, so the Haskell search couldn’t see it.
I’m going to generate the data from 1 to 120000 again in case there are more errors of this type.