by Michael Cain
(submitted by request)
A recent comment thread headed off into a discussion of the attractions of games and puzzles that involve combinatorial search, like Wordle or Sudoku or Freecell. Here's an example of a combinatorial puzzle. My daughter brought this home from math class when she was in eighth grade (long ago).
On the way home from work I stopped at the corner convenience store to pick up four items. The checkout clerk did things on the register and told me "$7.11, please."
"That seems too much. How did you calculate that?" I asked.
"I multiplied the four prices together."
"Aren't you supposed to add the prices?"
"Oh, right." After a moment he said, "Still $7.11."
What were the prices of the four items?
She told me the math teacher was explaining a technique he called guess and check: guess at the answer and check to see if it's correct. She thought it was stupid and clearly expected me to think the same. She was surprised when I said, "Cool! There's a whole bunch of neat math in there!" We talked about problems where you had to choose from a set of possibilities and had to find the right combination to solve the problem. That you often needed to find a clever strategy so you could find the right combination in a reasonable amount of time. We played around with this particular problem some, but didn't guess the right answer before it got tiresome. (No one else in the class guessed the right answer either.)
Some years after that I was working at an applied research lab that did lunch-time technical talks. I was asked to do one that had some math, some entertainment value, and that most of the staff would be able to follow. My recollection of the talk about the 7-11 problem is reproduced below the fold.
Oh, and open thread, because why not?
A first observation: solving combinatorial problems tends to be hard because they get big in a hurry. The 7-11 problem is simple to state, but the number of combinations of prices is 7114, about 255 billion. Sometimes, though, it's possible to take simpler non-combinatorial approaches and get the answer. Suppose a, b, c, and d represent the four prices. A different way to state the problem ignoring the constraint that the answer has to be exact in dollars and cents is
minimize abs(7.11-(a+b+c+d)) + abs(7.11-(a*b*c*d))
The minimum value will be zero when the variable values satisfy the sum and product requirement. Any set of values that doesn't satisfy those requirements will have a positive value. So, haul out your handy nonlinear minimization software and give it a go. "But Mike," I hear you say, "I don't have any nonlinear minimization software handy." Chances are better than you think that you do. The Windows version of Excel has included the nonlinear minimization tool Solver for most of 30 years. Many, many years ago when I was a graduate student I spent a semester working on the software that would eventually become the nonlinear optimization part of Solver. The quality of the code I inherited was so poor it led me to spend a morning and some amount of the department's money working my way through telephone operators and secretaries until I got a particular graduate student at Case Western University on the line so I could tell him, "Set foot in Austin, Texas and you're a dead man!" and hang up. But I digress...
I have my own piece of nonlinear minimization code that I've hauled around for decades. It's relatively simple-minded but can be coerced into doing useful things if the problem is not too big. The graph below shows how it progresses from an initial guess that all the prices are $1.50. It takes a while but converges to a set of prices whose sum and product are $7.11 with an error of less than 10-13. Unfortunately, none of the individual prices are anywhere close to dollars and whole cents. I don't have a copy of Excel on Windows any more, but these days Libre Office has a nonlinear optimization tool that uses a different algorithm than my code (or Excel's). It behaves exactly the same way as my code on this problem: $7.11 with high precision but fails the dollar-and-cents requirement.
Things are actually worse than that. The graph below is an estimate of the cumulative distribution function for the largest price in the solution based on 200 runs from different initial guesses using my software. The values are spread out over a wide range. They are spread out uniformly over a wide range. No clues here about where to look for the dollars-and-cents answer.
Back to combinatorial search we go.
When I was in graduate school I was notorious for solving homework-sized problems – also known as "toy" problems – by sheer brute force with a computer. From my point of view, it was a matter of how best to spend my time. Despite the reputation, I didn't actually use brute force all that often, but there were times when spending 15 minutes to write a program and five minutes to run it was a very attractive alternative to spending two hours trying to find and apply the particular bit of cleverness the professor wanted. Also, there's a certain perverse pleasure in being able to say, "The trickery finds only one answer to the problem, but not the other equally good answers that exist."
Below is the simplest brute force approach to this problem in pseudo-code. "But Mike," I hear you say again, "I was told there would be no code." Probably by someone who also told you there would be no math. Yet here you are, reading a post that's all math. This code generates every one of the 255 billion possible combinations of the four prices and tests to see if they come close to meeting the sum and product requirement. It's necessary to test for close rather than if the conditions are met exactly because computers (generally) only do approximate arithmetic with non-integer values. The goal here is to generate a small set of candidate solutions that can be checked quickly on a calculator. This code finds the correct answer: $1.20, $1.25, $1.50, and $3.16.
for (a=0.01; a<7.12; a+=0.01): for (b=0.01; b<7.12; b+=0.01): for (c=0.01; c<7.12; c+=0.01): for (d=0.1; d<7.12; d+=0.01): sum = a + b + c + d if (abs(sum-7.11) > 1e-6): next prod = a * b * c * d if (abs(prod-7.11) > 1e-6): next print "Candidate solution", a, b, c, d
It wouldn't have been much of a technical talk if it ended there. Back in the day, a challenge that was frequently issued in certain academic circles was "My code can find the answers faster than your code." At one time, I had the world's fastest code for solving a particular type of network optimization problem. I didn't try to claim the title because it was basically someone else's code. The people at the Naval Postgraduate Institute at the time were good at theory but bad at coding. Simply cleaning up a copy of their code increased the speed by 20%. The brute force code above found the answer in four minutes 50 seconds. Traditionally, one describes the infrastructure used. The actual code I used for timing for this post was written in C, compiled with "gcc -O3", running on an AMD Ryzen 5 processor clocked at 2.5 GHz, timed with the standard Linux time command. To suggest how long ago I did this 7-11 talk originally, the Linux machine I had beside my desk then had to run overnight to get the answer. And yes, the code from that talk all those years ago was still in my personal archive.
How can we speed things up? The general approach to solving combinatorial search problems faster is to eliminate as many of the combinations as possible by some sort of logic. For example, one that I won't use in this post is since every price has to be one cent or greater the largest price has to be $7.08 or smaller, not $7.11. That doesn't seem like a whole lot but it would eliminate four billion naive combinations.
One of the first things to notice is looping over the value of d is a waste of time. Once a, b, and c are set, d can be calculated directly from the sum constraint. The loop on c can be cut short once it produces a calculated value for d less than 0.01 (one cent). 0.005 is used to allow for the way the computer handles non-integer values. The number of combinations where the code gets to the multiplication test is reduced to 59 million. This code finds the correct answer in 0.102 seconds.
for (a=0.01; a<7.12; a+=0.01): for (b=0.01; b<7.12; b+=0.01): for (c=0.01; c<7.12; c+=0.01): d = 7.11 - (a + b + c) if (d < 0.005): last prod = a * b * c * d if (abs(prod-7.11) > 1e-6): next print "Candidate solution", a, b, c, d
If you could see the output, you would immediately notice the code actually finds the answer 24 times, once for each of the permutations of the four values. Another simplification is to limit the loops so only the single permutation with prices in order from smallest to largest is found, shown below. Note also that since b greater than or equal to a is now guaranteed then if a is 3.56 or greater a+b is greater than 7.11 and the overall sum requirement can't be satisfied. We can safely cut the upper bounds on the loops in half. At this point we have reduced the number of combinations considered from the original 255 billion to just over 2.5 million. This version of the code finds the solution in 0.012 seconds.
for (a=0.01; a<3.56; a+=0.01): for (b=a; b<3.56; b+=0.01): for (c=b; c<3.56; c+=0.01): d = 7.11 - (a + b + c) if (d < c - 0.005): last prod = a * b * c * d if (abs(prod-7.11) > 1e-6): next print "Candidate solution", a, b, c, d
As a general rule, computers perform operations on integers faster than on floating point (non-integer) values. We can restate the problem in terms of cents. That lacks the catchy 7-11 tagline in the original statement because the sum of the cents is 711 and the product is 711,000,000. The changes in the code necessary to find a solution in cents are shown below. A solution, if found, is no longer a mere candidate. Because the use of integers is exact, it's guaranteed to be a solution. This finds the solution in 0.006 seconds – six milliseconds. This is fast enough to ask different sorts of questions and get answers in a reasonable time. $6.44 is the smallest value between $1.00 and $10.00 that has a "four prices with the same sum and product" solution. $6.75 is the smallest value in that range that has two different solutions. There are multiple values under $10.00 that have five solutions. There are many prices in that range that don't have a solution. But is this as fast as we can go? (Hint: Betteridge's Law applies.)
for (a=1; a<356; a+=1): for (b=a; b<356; b+=1): for (c=b; c<356; c+=1): d = 711 - (a + b + c) if (d < c): last prod = a * b * c * d if (prod != 711000000): next print "Solution", a, b, c, d
In the integer version of the problem, we notice that all four prices satisfy two conditions: (1) they are integers in the range from 1 to 711, inclusive, and (2) they are factors of 711,000,000. Finding the list of those factors isn't much work. This is a much smaller list of numbers than any of the above versions use, so the number of combinations is enormously smaller. For this particular problem, there are 61 factors in the list. The number of combinations considered is reduced from the original 255 billion to less than 17,000. This is on the scale of problems that were solved by hand historically. See, for example, the orbital mechanics calculations done by Johannes Kepler's graduate students minions apprentices. The computer gets the solution in 46 microseconds.
# Calculate list of factors, values strictly increasing n = 0 for (i=1; i<712; i++): result = i * (711000000 / i) if (result == 711000000): factor[n++] = i # Search over combinations of factors for (a=0; a<n; a++): for (b=a; b<n; b++): for (c=b; c<n; c++): d = 711 - (factor[a] + factor[b] + factor[c]) if (d < factor[c]): last prod = factor[a] * factor[b] * factor[c] * d if (prod != 711000000): next print "Solution", factor[a], factor[b], factor[c], d
If we actually had to work the problem out by hand we wouldn't simply step through 17,000 combinations. We would take factorization one step farther and work out the prime factorization of 711,000,000 (and you thought learning prime factorization was a waste of time):
26 × 56 × 32 × 79
Then we might proceed like this:
- 79 can only be a factor in one of the four prices, Call that price a. a must be one of 79, 158, 237, 316, 395, 474, or 632.
- Suppose we guess 316 (22 × 79). Subtract that from 711 and reduce the problem to finding three prices b, c, and d that sum to 395 using the remaining prime factors (24 × 32 × 56).
- Six of the remaining factors are five. If those were distributed two each to b, c, and d, then each would be divisible by 25 and b+c+d would be divisible by 25. But b+c+d is 395, not divisible by 25, so the two-per-price distribution of the fives can't work. Then one of the three prices must have at least three prime factors of five, so is a multiple of 125. Assign that price to b, and there are only three possible values: 125, 250, or 375.
- For each of the three guesses there is a sub-problem with two prices. If we guess 375 for b then we are looking for c and d such that c+d is 20 and the remaining prime factors are 24 × 3 × 53. We can quickly show that there's no way to split those prime factors between two numbers so that those numbers add to 20. There is no solution to the two-price sub-problem, so there is no solution to the original problem where a is 316 and b is 375.
I've never worked through the whole thing (and am not going to do it here, much to everyone's relief, no doubt) but it can probably be done in an hour or two. The kind of logic done with the prime factors is something lots of people do pretty well – particularly people who work on math puzzles – but it's more difficult to code up for the computer compared to simply listing off several thousand combinations. There are always trade-offs, of course. If the problems are big enough, and have to be solved often enough, and fast enough, then the added programming complexity may be not just desirable but necessary.
Why does anyone reading this post care, aside from the possible entertainment value? Because combinatorial search and other optimization algorithms are everywhere these days.
Consider the example of a UPS truck leaving a facility in the morning. The label on every package has been scanned so the destination addresses are known. There are databases of all the roads in the area, updated with information about closures for repair and such. Calculating the best route for that truck to follow is a combinatorial problem. UPS developed a piece of software that can solve that problem quickly enough that every truck has a best route loaded into the truck's navigation device before it leaves. UPS says that it saves at least tens of millions of dollars on fuel and overtime every year by following the best routes.
The training process for deep neural-network machine-learning systems such as that underlying local favorite ChatGPT attempts to solve a complicated nonlinear optimization problem. Neural networks have been around for many years. They've become important today because computers are fast enough and algorithms are good enough to allow optimizing very large neural-network systems.
SpaceX has made launch to orbit and returning the booster for reuse look routine. Computers on the booster repeatedly solve a nonlinear optimization problem that flies to a specific precise location while staying within its fuel and maneuvering constraints. Interestingly, SpaceX does not use a general purpose solver because it wouldn't be fast enough. Their software development chain includes a system written at Stanford that takes a formal description of the specific type of problem to be solved and generates code for a custom solver that runs much faster than a general purpose solver will.
Autonomous cars, if/when we get them, will employ all of these tools. Trained neural networks to recognize the objects around the car. Combinatorial optimization guided by neural network systems to decide on local changes in trajectory. At a higher level, combinatorial optimization like the one at UPS to determine both the original route and a new route if there are necessary deviations.
Everyone ought to be interested in understanding at least a bit about the increasingly ubiquitous math and software that may kill you if it gets a sufficiently wrong answer.
wj: sometimes I forget that I can't post links under my formal handle, and have to use GftNC to do so. In those cases, I usually just repost. So what you have restored may be duplicates. In future, if a post is too long to do again, I'll just post a short plea here to have it rescued. So then, whatever else you find in the spam folder from me can be safely discarded.
Posted by: Girl from the North Country | January 30, 2023 at 05:27 AM
Eventually, we'll get all this sorted out and running smoothly. Eventually. Hopefully. :-)
Posted by: wj | January 30, 2023 at 08:26 AM
Sure, wj.
Right after the last bug is eradicated from Windoze.
Posted by: Snarki, child of Loki | January 30, 2023 at 01:26 PM
Typepad's version of Movable Type is not yet abandonware, but seems to be headed in that direction. No one who has messed with WordPress seriously thinks it will ever get truly sorted out. I spent part of yesterday reading people whining about Disqus.
Posted by: Michael Cain | January 30, 2023 at 01:34 PM
Right after the last bug is eradicated from Windoze.
Presumably you are not counting the possibility that they will just issue a blanket reclassification of any remaining bugs as "features"....
Posted by: wj | January 30, 2023 at 02:07 PM
Sometimes, the enthusiasm of some folks to shoot themselves in the foot is simply awesome.
Start with a tweet from Kari Lake of Arizona. It contains, as you can see, images of several voters signatures.
Arizona Secretary of State Adrian Fonte referred her to tha Arizona Attorney General.
She's a true discipline ot TFG.Posted by: wj | January 31, 2023 at 01:22 AM
I don't know the details about AZ's mail ballot system. In Colorado, and assuming the signatures in the tweet are in some fashion pairs of signatures that should match, they would get an envelope pulled out. Each election, about 2% of envelopes here fail. Many are "cured" by the election officials, either through phone calls, submission of additional paperwork, or even physical visits.
There is a court case in Colorado now to remove signatures completely from our mail ballot system. The lead plaintiff has ALS which makes it impossible for him to have a reproducible signature or travel to a polling place.
Posted by: Michael Cain | January 31, 2023 at 11:38 AM
My sense, from a distance, is that the Arizona law is intended to help prevent voter fraud. By not letting would-be fraudsters have an example signature to copy. (Naturally, preventing voter fraud is not high on Lake's list of priorities.)
The signature law in California doesn't make a whole lot of sense. Mail ballots have to have a signature on the envelope, and those get compared to the file signature before the ballot is counted. But while anyone voting in person does have to sign in at the polls, there is no process to check that signature against anything.** And, if someone were to check later, no way to identify and remove that particular ballot. So, why do we bother?
** And, by law, poll workers may not ask voters for ID. Technically, if the voter offers one when asked for their name, we aren't supposed to look -- although in practice we do look whenever we aren't certain how the name is spelled, so we can find it in the voter roll.
Posted by: wj | January 31, 2023 at 12:31 PM
And all of this because a nationaL ID card is considered blasphemy against the holy spirit ( https://en.wikipedia.org/wiki/Eternal_sin ) in the US...
(and I am talking about one that does not store vast amounts of personal data but one whose whole function is to determine ID whereever needed.)
Posted by: Hartmut | January 31, 2023 at 01:51 PM
a national ID card is considered blasphemy against the holy spirit
Including by people who don't seem to have a problem with the state departments which issue driver's licenses also issuing IDs for those who are not drivers. Which they do because everybody needs an ID at some point, so they are fulfilling a real need.
Consistency doesn't seem to be a core competency with these folks.
Posted by: wj | January 31, 2023 at 02:04 PM
I hear that some consider it even un-American to have a passport and that some (GOP) candidates got actually attacked for this sin by their rivals.
Sounds outright Chinese to me (there's nothing outside China worth any attention for a proper Chinese person)
Posted by: Hartmut | January 31, 2023 at 03:52 PM
Sounds outright Chinese to me (there's nothing outside China worth any attention for a proper Chinese person)
More, I think, along the lines of "How dare anyone outside the US expect us to carry identity documents?!?!? And how dare the US government ask for proof that we are 'Real Americans' (TM) when we come home?" (Unsurprisingly, they are unaware of the difference between a passport and a visa.)
Posted by: wj | January 31, 2023 at 08:54 PM