Every time I wrote I should have written . This will be corrected in the final version.
The pigeonhole upper bound is perhaps better written , since it arises as , where is the totient function.
After commenting, Tomás Oliveira e Silva sent a follow-up email prompting me to belatedly flesh out this post a little.
After computing the first 100000 values independently, he found one discrepancy. 221000.dat has
53829 1.035945566231253e-8 6045 19754 30497 38849
but Tomás has
53829 0.0000000043404367309945397 0 0 15622 31244 31244
which agrees with the output of search.c
53829 4.3404373281240489e-09 0 15622 31244 31244
So what’s going on here? The answer necessarily includes the fact that search.c was not used to produce that line of 221000.dat. The history is that the data set and the search program developed together over time. The final version of search.c is (hopefully) better than the earliest versions in several ways, including a considered approach to avoiding zero sums rather than an optimistic one, and a change in the enumeration order to notice when we’re about to generate a lot of very long sums and skip over them. The very earliest searches weren’t even done in C; I used Haskell, which is my usual go-to for combinatorics, but wasn’t the best fit for this naturally expressed brute-force loop. In any case, I didn’t rerun the whole search using the final program because of the cost of producing the data. I’ve been at peace with the risk of errors because I was using the data to suggest patterns to investigate rather than relying on it for any proofs, but I should have been clearer about what I did.
Putting my archaeologist’s hat on, we can tell that the line in 221000.dat isn’t from the latest version of search.c due to the different output format. In fact, there’s a break between the two formats at .
119999 1.6533522139157386e-9 15853 43211 70569 86422
120000 3.4249738497730593e-10 20224 40448 77495 82953
120001 8.4593861872462649e-10 10962 39803 71356 79549
120002 1.4113956883942480e-10 19401 45243 71085 90486
120003 1.3630084760844623e-10 10265 44351 65134 85917
120004 5.3309237833505616e-10 9345 39651 69957 79302
120005 2.1493219522622943e-09 6408 39397 67265 79033
This means that I might have used Haskell as far as 120000 (I might also have used C with a different output format), but I didn’t use it any later. In fact the incorrect line matches the output of the Haskell search exactly, so the error probably comes from a bug in the Haskell. My first guess, given the correct form of the optimal configuration for 53829, is that I had an off by one error in an index preventing me from looking at , but after revisiting the source code I don’t think that’s it. As an early version of the guard against pointless searching I have the line
bestCompletions bestGuess a b = if length > 2.0 then [] else [t | t@(_,a,b,c,d) <- candidates''', not $ zero a b c d]
The error here is that the cutoff for “too long” needs to be 2 plus the shortest thing seen so far, not 2. (Not counting the triple-prime in a variable name as an error.) The correct configuration for 53829 is exactly a situation in which the sum of the triple has length just over 2, so the Haskell search couldn’t see it.
I’m going to generate the data from 1 to 120000 again in case there are more errors of this type.
A quick search (and the existence of Rob Eastaway’s talk on the subject) reveals that a fair amount is known about Diffy. I have deliberately not read any of it in detail. I don’t even know why you always reach zero, which is not as obvious as I had assumed: consider , say, which reaches zero in four steps, but increases the sum of the four numbers along the way. The question I looked at was “What is the maximum number of steps this process can take, beginning with numbers from ?” Call this .
Rob begins by asking us to show that . ( by counting the cycles we see along the way rather than the number of steps we take.) There are only starting points so this takes no time at all if we break the spirit of the puzzle and use a computer. But there’s no need to be pointlessly inefficient. The length of a Diffy “game” is unchanged if we
By applying the first three observations we can say that there will be a witness for the value of using numbers from with the first number being , and the fourth number being at least as large as the second.
f' [x,y,z,w] = [abs (x-y), abs (y-z), abs (z-w), abs (w-x)]
g' xs = takeWhile (/=[0,0,0,0]) (iterate f' xs)
h' xs = length (g' xs) + 1
i' n = maximum [(h' [a,b,c,d], [a,b,c,d]) | a <- [1], b <- [a..n], c <- [a..n], d <- [b..n]]
(Not pictured: good choice of variable names.)
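To make the iteration concrete, here is a minimal self-contained sketch that prints every state of one game. The names `step`, `trajectory` and `steps` are illustrative rather than taken from the listing above, and `steps` counts moves, so it is one less than `h'`. It assumes a non-empty list of non-negative integers.

```haskell
-- One round of Diffy on a cyclic list: replace each entry by the absolute
-- difference between it and its successor (the last entry wraps to the first).
step :: [Int] -> [Int]
step xs = zipWith (\x y -> abs (x - y)) xs (tail xs ++ [head xs])

-- Every state visited, from the start up to and including the first all-zero state.
trajectory :: [Int] -> [[Int]]
trajectory xs = before ++ take 1 after
  where
    (before, after) = span (any (/= 0)) (iterate step xs)

-- Number of moves needed to reach the all-zero state.
steps :: [Int] -> Int
steps xs = length (trajectory xs) - 1

main :: IO ()
main = do
  mapM_ print (trajectory [0, 1, 2, 4])
  putStrLn ("steps: " ++ show (steps [0, 1, 2, 4]))
```

Starting from `[0, 1, 2, 4]` the game visits eight states, so `steps` gives 7 and `h'` would give 8.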
So an exhaust is down to using symmetry.
My next thought in these situations is that we’re doing an awful lot of recomputation. If I’ve already scored , why should I reevaluate it when scoring ? Why not store the score somewhere so we can look it up later?
import Data.Array

f n = maximum (elems table) where
  table = array ((0,0,0),(n-1,n-1,n-1)) [((a,b,c), g a b c) | a <- [0..n-1], b <- [0..n-1], c <- [0..n-1]]
  g 0 0 0 = 1 -- represents c c c c for c > 0 the first time you reach it
  g a b c | a > c = g c b a
          | otherwise = (table ! normalise a (abs (b-a)) (abs (c-b)) c) + 1
  normalise a b c d = let m = minimum [a,b,c,d]
                      in normalise2 (a-m) (b-m) (c-m) (d-m)
  normalise2 0 b c d = (b,c,d)
  normalise2 a 0 c d = (c,d,a)
  normalise2 a b 0 d = (d,a,b)
  normalise2 a b c 0 = (a,b,c)
This is what I ended up with. It isn’t pretty. The thing that wasn’t, but should have been, immediately obvious is that the states I encounter along the way won’t be in the nicely normalised form we’ve just decided we want to work with, so the naive lookup table increases our time back up to . On the other hand, normalising things like this is expensive compared to the computation we’re trying to save. It’s a very bad deal, and even if it weren’t I can’t afford memory for very large values of .
So we’re back to computing honestly for each in time . The next observation is that if we want for lots of we can do better: if is to be larger than then the pattern witnessing that had better use both and , taking us to for each new . In fact, we can say a bit more. The and would either have to be adjacent or opposite.
1 -- b     1 -- n
|    |     |    |
d -- n     d -- c
Then we still have some symmetries to play with. In the opposite case, all of the edges are related by symmetries so we can assume that is the smallest difference. In the adjacent case there is less symmetry, but we can still assume that .
direct n = maximum ( [score 1 b n d | b <- [1..n`quot`2], d <- [b..n+1-b]]
                  ++ [score 1 n c d | c <- [1..n], d <- [1..n+1-c]] )
score 0 0 0 0 = 0
score a b c d = 1 + (score (abs (a-b)) (abs (b-c)) (abs (c-d)) (abs (d-a)))
That’s not bad and will get you well up into the thousands without difficulty.
(Everything above is a mostly accurate representation of my progress through this problem, with light editing to fix errors and wholesale removal of dollar signs from my Haskell, since they confuse the LaTeX plugin. What follows is ahistorical, presenting an endpoint without all the dead ends along the way.)
So far we haven’t thought about the actual operation we’re iterating, so let’s do that now. Suppose that we’re in the opposite case with and strictly between and . Then replacing by and by produces a pattern that goes to the same place as the original pattern under two rounds of iteration, so such patterns aren’t interesting in our search; they were considered in previous rounds. Similarly, in the adjacent case where we can decrease and to obtain an equivalent-after-two-iterations pattern with a smaller value of , so we don’t need to consider it. Finally, if we’re in the opposite case but , say, then we could alternatively have viewed ourselves as being in the adjacent case. So we can reduce our search to
direct n = maximum [score 1 a b n | a <- [1..n], b <- [a..n]]
for or, after further assuming that ,
direct n = maximum [score 1 a b n | a <- [1..(n+1)`quot`2], b <- [a..n-a+1]]
for . (Once these computations are taking hours or days the constant factor improvements shouldn’t be undervalued.)
We’ve already seen that storing lots of information about previously computed results is not helpful, but we can store the known values of and its “inverse” , the least such that . Then when testing whether scores at least it might be worth checking whether , which is the absolute minimum requirement to last at least rounds. But our scoring function is so cheap that you don’t have to do very much at all of this sort of thing before it becomes too expensive, and in practice the optimal amount of checking seems to be zero.
Unless we can somehow do the checking without actually doing the checking? If we’re currently trying to check whether then we’d better have the smallest and largest differences differing by at least . That is the point at which I throw my hands up and switch to the interval rather than , which means you have to keep an eye out for off by one errors when comparing earlier results with what comes next. We already have
The largest difference at the beginning is , so we additionally require that either or . Rearranging, either or . This takes us to
newRecord n = maximum [score 0 a b n | b <- [0..n], a <- h b]
  where
    gk = firstTimeWeCanGet ! (recordUpTo ! (n-1)) -- see full listing
    h b = let top = min b (n-b) in
          if n-gk < b-n+gk && n-gk < top
            then [0..n-gk] ++ [b-n+gk..top]
            else [0..top]
with a cheap improvement from doing the first round of the iteration by hand.
newRecord n = maximum [score a (b-a) (n-b) n | b <- [0..n], a <- f b] + 1
There’s at least one more idea worth considering. We haven’t used the fourth symmetry, of multiplying by non-zero constants. The practical use would be to divide out any common factors of , but doing the gcd each time is too expensive. I had greater hopes for checking that at least one of them is odd, which should save a quarter of the work half of the time, for a total saving of about 12%, but it doesn’t seem to help, even if you enforce it by never generating the bad pairs , the same way we are able to do for the size consideration in the current listing.
This will produce the pairs up to in 100 minutes on my 3.6GHz machine. It hasn’t found the next pair yet after a few days of searching. It’s possible that some of the optimisation considerations (especially for what should be the cheap cases like eliminating ) change for large as naive scoring becomes more expensive, but my haphazard trials have had inconsistent results, both algorithmically and in terms of apparently making the compiler stop favouring certain optimisations.
(0,0) (1,1) (2,1) (3,1) (4,1) (5,3) (6,3) (7,4) (8,9) (9,11) (10,13) (11,31) (12,37) (13,44) (14,105) (15,125) (16,149) (17,355) (18,423) (19,504) (20,1201) (21,1431) (22,1705) (23,4063) (24,4841) (25,5768) (26,13745) (27,16377) (28,19513)
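The first few pairs can be cross-checked by brute force. This is a minimal sketch, assuming each pair reads as (number of rounds, least n such that some start from [0..n] lasts that many rounds); that reading, and the names `best` and `records`, are assumptions rather than anything taken from the full listing. It uses none of the symmetry reductions, so it is only sensible for small n.

```haskell
-- Number of rounds needed to reach (0,0,0,0), as in the post's `score`.
score :: Int -> Int -> Int -> Int -> Int
score 0 0 0 0 = 0
score a b c d = 1 + score (abs (a - b)) (abs (b - c)) (abs (c - d)) (abs (d - a))

-- Longest game starting from four numbers in [0..n], with no symmetry reductions.
best :: Int -> Int
best n = maximum [score a b c d | a <- r, b <- r, c <- r, d <- r]
  where
    r = [0 .. n]

-- Pairs (k, n): n is the least bound in [0..maxN] for which some game
-- lasts at least k rounds. Only values of k that are reached are listed.
records :: Int -> [(Int, Int)]
records maxN =
  [(k, head [n | (n, m) <- table, m >= k]) | k <- [0 .. maximum (map snd table)]]
  where
    table = [(n, best n) | n <- [0 .. maxN]]

main :: IO ()
main = mapM_ print (records 13)
```

If the list above is right, this should reproduce its pairs up to (10,13) and then stop, since larger records need larger n.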
You might also be interested in a scan of my undergraduate lecture notes on the same topic.
I’ve previously written about the Namer-Claimer game. I can now prove that the length of the game is with optimal play from each side, matching the greedy lower bound. The upper bound makes use of randomness, but in a very controlled way. Analysing a truly random strategy still seems like it will be very difficult.
The proof brings up a surprising connection to the Ramsey theory of Hilbert cubes.
In Edge decompositions of graphs with high minimum degree, Daniela Kühn, Allan Lo, Deryk Osthus and I proved that the edge sets of sufficiently dense graphs satisfying necessary divisibility conditions could be partitioned into copies of an arbitrary graph . This result has since been generalised to other settings by various authors. In this paper we present a simplified account of the latest version of the argument, specialised to the case where is a triangle.
I was explaining this problem to a colleague and they asked whether this graph was connected (it is) and whether that was still true if we restricted to rational coordinates. It turns out this was addressed by Kiran B. Chilakamarri in 1988, and the answer is that the rational unit distance graph is connected from dimension onwards.
To see that is not connected, consider a general unit vector where is coprime to . Then .
Claim. is divisible by at most once.
Proof. Squares mod are either , or . If is divisible by then one of the is odd, hence squares to mod . But then cannot be divisible by , which is a contradiction.
So the entries of in their reduced form do not contain any ‘s in their denominator, and so the same must hold for all sums of unit vectors. Hence we can’t express, say, as a sum of unit vectors, and is not connected to .
Connectedness in dimension (hence also later) uses Lagrange’s theorem on the sums of four squares. We’ll show that can be expressed as a sum of unit vectors. By Lagrange’s theorem, write . Then
hence
is a sum of unit vectors.
A hands-on proof of the Erdős–Ko–Rado theorem uses a tool called compression. A family is left-compressed if for every , any set obtained from by deleting an element and replacing it by a smaller one is also in . You can show by repeatedly applying a certain compression operator that for every intersecting family there is a left-compressed intersecting family of the same size. Thus it suffices to prove the Erdős–Ko–Rado theorem for left-compressed families, which is easy to do by induction.
There is a strong stability result for large intersecting families. The Hilton–Milner family consists of all sets that contain and at least one element of , together with itself. This is an intersecting family, and in fact is the largest intersecting family not contained in a star. The Hilton–Milner family has size , so any family that gets anything like close to the Erdős–Ko–Rado bound must be a subset of a star.
As part of an alternative proof of the Hilton–Milner theorem, Peter Borg partially answered the following question.
Let be an intersecting family and let . Let . For which is ?
Borg used the fact that this is true for to reprove the Hilton–Milner theorem. In Maximum hitting for sufficiently large I completed the classification of for which this is true for large . The proof used the apparently new observation that, for , every maximal left-compressed intersecting family in corresponds to a unique maximal left-compressed intersecting family of . In particular, the number of maximal left-compressed intersecting families for is independent of . For there are (OEIS) such families respectively. In the rest of this post I’ll explain how I obtained these numbers.
We want to count maximal left-compressed intersecting families of . The maximal part is easy: the only way to get two disjoint sets of size from is to take a set and its complement, so we must simply choose one set from each complementary pair. To make sure the family we generate in this way is left-compressed we must also ensure that whenever we choose a set we must also choose every set with , where means “ can be obtained from by a sequence of compressions”. The compression order has the following properties.
Here’s one concrete algorithm.
The following is a fairly direct translation of this algorithm into Haskell that makes no attempt to store the families generated and just counts the number of possibilities. A source file with the necessary imports and the choose function is attached to the end of this post.
r = 5

simpleOptions = [a | a <- choose r [1..(2*r-1)], not $ a `simpleLeftOf` (simpleComplement a)]

simpleLeftOf xs ys = all id $ zipWith (<=) xs ys

simpleComplement a = [1..(2*r)] \\ a

simpleCount [] = 1
simpleCount (a:as) = simpleCount take + simpleCount leave
  where
    -- take a
    -- all pairs with b < a or b^c < a are forced
    -- second case never happens as b^c has 2r but a doesn't
    take = [b | b <- as, not $ b `simpleLeftOf` a]
    -- leave a, and so take a^c
    -- all pairs with b < a^c or b^c < a^c (equivalently, a < b) are forced
    c = simpleComplement a
    leave = [b | b <- as, not (b `simpleLeftOf` c || a `simpleLeftOf` b)]
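For experimenting with small cases it can help to have a version that runs on its own. The following is a sketch of the same take-or-leave recursion with the set size passed as a parameter instead of fixed globally. The `choose` here is a stand-in written for this example (the post's own version lives in the attached source file), and `leftOf` and `countFamilies` are illustrative names.

```haskell
import Data.List ((\\))

-- All k-element subsets of a sorted list, each as a sorted list.
-- (A stand-in for the post's `choose`, which is not shown here.)
choose :: Int -> [a] -> [[a]]
choose 0 _ = [[]]
choose _ [] = []
choose k (x:xs) = map (x:) (choose (k - 1) xs) ++ choose k xs

-- xs is "left of" ys if it is pointwise at most ys (both sorted, same length).
leftOf :: [Int] -> [Int] -> Bool
leftOf xs ys = and (zipWith (<=) xs ys)

-- Count the ways to pick one set from each complementary pair of r-subsets
-- of [1..2r] so that the chosen family is closed under compression.
countFamilies :: Int -> Integer
countFamilies r = go undecided
  where
    complement a = [1 .. 2 * r] \\ a
    -- Pairs whose choice is not already forced; each is represented by the
    -- member that avoids 2r.
    undecided = [a | a <- choose r [1 .. 2 * r - 1], not (a `leftOf` complement a)]
    go [] = 1
    go (a:as) = go takeA + go leaveA
      where
        -- Take a: everything left of a is now forced.
        takeA = [b | b <- as, not (b `leftOf` a)]
        -- Leave a, so take its complement c: pairs with b left of c, or with
        -- a left of b, are now forced.
        c = complement a
        leaveA = [b | b <- as, not (b `leftOf` c || a `leftOf` b)]

main :: IO ()
main = mapM_ (\r -> putStrLn (show r ++ ": " ++ show (countFamilies r))) [1 .. 4]
```

Tracing the recursion by hand for the three smallest cases gives 1, 2 and 6 families, which makes a convenient sanity check before trying larger values.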
This will compute the number of maximal left-compressed intersecting families for in a fraction of a second. For it would probably find the answer in less than a month. I obtained the value for in a couple of days on a single core by using a better representation of the sets in our family.
The dream is to pack all of the elements of our list into a single machine word and perform each comparison in a small number of instructions. For example, we could encode an element of by writing each element as binary digits then concatenating them in increasing order to obtain a 24 bit word. But comparing two such words as integers compares the corresponding sets lexicographically rather than pointwise. Edward Crane suggested that as the lists are so short and the elements are so small we can afford to be quite a lot more wasteful in our representation: we can write each element of our set in unary! The rest of this section should be considered joint work with him.
The first iteration of the idea is to write each element of as a string of 1’s followed by 0’s, then concatenate these strings to obtain a representation of our set. This representation has the great advantage that we can compare sets pointwise by comparing strings bitwise, and we can do this using very few binary operations: is contained in if and only if .
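A small sketch of the unary idea, under some assumptions: the test used is "the AND of the two codes equals the first code", which is one natural reading of the bitwise condition; the ones are placed at the low end of each block rather than the high end, which makes no difference to the comparison; and `Integer` is used so that word size is not an issue here. All names are illustrative.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)

-- x in unary: x one-bits at the low end of a block.
unary :: Int -> Integer
unary x = (1 `shiftL` x) - 1

-- Concatenate the unary codes of the elements, one block of `width` bits each.
-- Assumes every element lies in [0..width].
encode :: Int -> [Int] -> Integer
encode width = foldl (\acc x -> (acc `shiftL` width) .|. unary x) 0

-- Pointwise comparison of two equal-length sorted lists with a single AND:
-- xs <= ys pointwise exactly when every one-bit of xs is also set in ys.
leftOfBits :: Int -> [Int] -> [Int] -> Bool
leftOfBits width xs ys = (ex .&. ey) == ex
  where
    ex = encode width xs
    ey = encode width ys

-- Reference implementation on lists, for comparison.
leftOfList :: [Int] -> [Int] -> Bool
leftOfList xs ys = and (zipWith (<=) xs ys)

-- All 3-element subsets of [1..6], as sorted lists.
triples :: [[Int]]
triples = [[a, b, c] | a <- [1 .. 6], b <- [a + 1 .. 6], c <- [b + 1 .. 6]]

main :: IO ()
main = print (and [leftOfBits 6 a b == leftOfList a b | a <- triples, b <- triples])
```

The check in `main` compares the bitwise test against the list-based one on every pair of 3-subsets of [1..6].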
Unfortunately this representation uses 72 bits in total, so won’t fit into a 64-bit machine word. Observing that we never use and encoding by 1’s followed by 0’s saves only 6 bits. But we can do even better by encoding each element of the set differently. The first element is always at least 1, the second is always at least 2 and so on. Similarly, the first element is at most 7, the second at most 8 and so on. Working through the details we arrive at the following representation.
Identify each element of by an “up-and-right” path from the bottom left to the top right corner of a grid: at the th step move right if is in your set and up if it isn’t. Then if and only if the path corresponding to never goes below the path corresponding to . So we can compare sets by comparing the regions below the corresponding paths. Recording these regions can be done using 36 bits, which happily sits inside a machine word. This representation also has the helpful property that taking the complement of a set corresponds to reflecting a path about the up-and-right diagonal, so the representation of the complement of a set can be obtained by swapping certain pairs of bits followed by a bitwise NOT.
The value for was obtained using this new representation and the old algorithm, with one minor tweak. It’s a bad idea to start with a lexicographically ordered list of sets, as the early decisions will not be meaningful and will not lead to much of a reduction in the length of the lists. Optimal selection of which pair to decide at each stage is probably a complicated question. As a compromise I randomised the order of the list at the start of the process, then took the first remaining pair at each stage.
The Haskell source is here. There are a few more performance tricks to do with the exact bit representation of the sets, which I’m happy to discuss if anything is unclear.
Maryam spoke about this paper at this week’s combinatorics seminar.
The problem is as follows. Let be the number of -colourings of a subset of with no monochromatic sum . What is the maximum of over all ?
One possibility is that we take to be sum-free, so that . The maximum size of a sum-free set of is around , achieved by the set of odd numbers and the interval , so .
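For small cases the definitions can be checked directly. The sketch below assumes that a monochromatic sum means x + y = z with all three elements in the same colour class and with x = y allowed; that convention, and the names `isSumFree` and `goodColourings`, are assumptions made for the example. The count is by brute force over all colourings, so it is only usable for very small sets.

```haskell
import Control.Monad (replicateM)

-- A set is sum-free if it contains no x, y, z with x + y = z (x = y allowed).
isSumFree :: [Int] -> Bool
isSumFree xs = null [() | x <- xs, y <- xs, (x + y) `elem` xs]

-- Number of colourings of `xs` with `r` colours in which every colour class
-- is sum-free, found by trying all r^(length xs) colourings.
goodColourings :: Int -> [Int] -> Int
goodColourings r xs =
  length
    [ ()
    | colours <- replicateM (length xs) [1 .. r]
    , all (\c -> isSumFree [x | (x, c') <- zip xs colours, c' == c]) [1 .. r]
    ]

main :: IO ()
main = do
  let n = 8
      odds = filter odd [1 .. n]
      top = [n `div` 2 + 1 .. n]
  -- Both standard examples are sum-free.
  print (isSumFree odds, isSumFree top)
  -- For a sum-free set every colouring is good.
  print (goodColourings 2 odds, goodColourings 2 top)
```

Since every subset of a sum-free set is sum-free, both counts in the last line equal 2 to the power of the set size, which is the first lower bound described above.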
Another possibility is to choose sum-free sets and take all colourings of such that the elements of colour are contained in . There are
such colourings, where is the number of elements in exactly of the . For example, we might take half of the to be and half to be . Then the odd numbers greater than are in every set, and the evens greater than and the odds less than are in half of the sets, so the number of colourings is around
For this matches the previous lower bound; for it is larger.
It’s easy to see that this construction cannot improve the bound for : it only provides good colourings, but as elements contributing to are in , which must be sum-free.
What about ? Now we get good colourings. We also have that
But since we have
Moreover, if is not tiny then we are some distance off this upper bound, so the only good constructions in this family come from having all the substantially agree.
How can we get matching upper bounds? If there weren’t very many maximal sum-free sets then we could say that every good colouring arises from a construction like this, and there aren’t too many such constructions to consider. This is too optimistic, but the argument can be patched up using containers.
The container method is a relatively recent addition to the combinatorial toolset. For this problem the key fact is that there is a set of subsets of such that
We now consider running the above construction with each an element of . Since the containers are not themselves sum-free, this will produce some bad colourings. But because every sum-free set is contained in some element of , every good colouring of a subset of will arise in this way. And since there are at most choices for the sets the number of colourings we produce is at most a factor greater than the biggest single example arising from the construction.
This is the big idea of the paper: it reduces counting colourings to the problem of optimising this one construction. For the authors are able to solve this new problem, and so the original.
If either problem has a finite optimum then so does the other, and the optima agree.
I do understand concrete examples. Suppose we want to pack the maximum number of vertex-disjoint copies of a graph into a graph . In the fractional relaxation, we want to assign each copy of a weight such that the weight of all the copies of at each vertex is at most , and the total weight is as large as possible. Formally, we want to
maximise subject to and ,
which dualises to
minimise subject to and .
That is, we want to weight vertices as cheaply as possible so that every copy of contains (fractional) vertex.
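Written out with explicit index sets, the pair of programmes might look as follows. The notation is an assumption made for this illustration: $H$ ranges over copies of the small graph in the host graph $G$, $v$ ranges over $V(G)$, and $x$ and $y$ are the two families of weights.

```latex
\begin{align*}
\text{maximise }   & \sum_{H} x_H
  & \text{subject to } & \sum_{H \ni v} x_H \le 1 \text{ for every } v \in V(G)
  & \text{and } & x_H \ge 0 \text{ for every } H; \\
\text{minimise }   & \sum_{v} y_v
  & \text{subject to } & \sum_{v \in H} y_v \ge 1 \text{ for every } H
  & \text{and } & y_v \ge 0 \text{ for every } v \in V(G).
\end{align*}
```

As a quick check, take triangles in $K_4$. There are four triangles and each vertex lies in three of them, so $x_H = 1/3$ for every triangle is feasible with total weight $4/3$. Each triangle contains three vertices, so $y_v = 1/3$ for every vertex is feasible with total weight $4/3$. The two values agree, so both are optimal, whereas the integer packing can only fit one triangle.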
To get from the primal to the dual, all we had to do was change a max to a min, swap the variables indexed by for variables indexed by and flip one inequality. This is so easy that I never get it wrong when I’m writing on paper or a board! But I thought for years that I didn’t understand linear programming duality.
(There are some features of this problem that make things particularly easy: the vectors and in the conventional statement both have all their entries equal to , and the matrix is -valued. This is very often the case for problems coming from combinatorics. It also matters that I chose not to make explicit that the inequalities should hold for every (or , as appropriate).)
Returning to the general statement, I think I’d be happier with
My real objection might be to matrix transposes and a tendency to use notation for matrix multiplication just because it’s there. In this setting a matrix is just a function that takes arguments of two different types ( and or, if you must, and ), and I’d rather label the types explicitly than rely on an arbitrary convention.