## Why didn’t Euler conjecture the prime number theorem?

It shouldn’t have been too hard, based on what he knew. Let us take a look.

Euler knew that the sum of reciprocal primes diverged, and he even knew the growth rate of $\sum_{p\le x} \frac{1}{p}$, where the sum is taken only over primes $p$. His method of discovering this was rather elegant, even if his lack of respect for divergent series may look horrifying to a modern mathematician. It was already known earlier that the harmonic series grows like $\log x$; that is,

$\sum_{1\le n\le x} \frac{1}{n}\approx \log x$,

or more precisely,

$\lim_{x\to\infty} \left(\sum_{1\le n\le x} \frac{1}{n}-\log x\right)=\gamma\approx 0.577$.

Euler had earlier worked out the famous Euler product for the zeta function:

$\sum_{n=1}^\infty \frac{1}{n^s} = \prod_p \left(1-\frac{1}{p^s}\right)^{-1}$

for $\Re(s)>1$. (This formula has a delightful interpretation as an analytic formulation of the fundamental theorem of arithmetic.) Since neither side of the above formula makes sense when $s=1$, Euler did the most logical thing possible and set $s=1$ anyway, to obtain

$1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\cdots = \left(1-\frac{1}{2}\right)^{-1}\left(1-\frac{1}{3}\right)^{-1}\left(1-\frac{1}{5}\right)^{-1}\left(1-\frac{1}{7}\right)^{-1}\cdots$.

Taking logarithms, he then obtained

$\log\left(1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\cdots\right)=-\log\left(1-\frac{1}{2}\right)-\log\left(1-\frac{1}{3}\right)-\log\left(1-\frac{1}{5}\right)-\log\left(1-\frac{1}{7}\right)-\cdots$,

and expanding out all the logarithmic terms on the right as power series and regrouping, the right side becomes

$\left(\frac{1}{2}+\frac{1}{3}+\frac{1}{5}+\frac{1}{7}+\cdots\right) + \frac{1}{2}\left(\frac{1}{2^2}+\frac{1}{3^2}+\frac{1}{5^2}+\frac{1}{7^2}+\cdots\right) + \frac{1}{3}\left(\frac{1}{2^3}+\frac{1}{3^3}+\frac{1}{5^3}+\frac{1}{7^3}+\cdots\right)+\cdots$.

Since all the terms except the first add up to something finite, they will not make a significant contribution in the limit, so Euler ignored them. (He could also have truncated after the first term when expanding the logarithms as power series, which would have had the same effect.) Hence, in the limit,

$\frac{1}{2}+\frac{1}{3}+\frac{1}{5}+\frac{1}{7}+\cdots = \log\left(1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\cdots\right) = \log\log\infty$.

Naturally, that should be interpreted as

$\sum_{p\le x} \frac{1}{p}\approx \log\log x$.

One can get rid of all the divergent series and weird expressions like $\log\log\infty$ by taking limits and being careful with error terms like a modern mathematician would do. But that wasn’t a necessary part of the culture in the 18th century.

At this point, Euler stopped, but I don’t understand why. It is now possible to make an estimate for the number of primes up to $x$, using a quick and easy computation. Since I’m a rebellious mathematician, I did the following computation on the front of an envelope.

Let us try to estimate the number of primes between $x$ and $kx$, for some $k>1$, and let us write $\pi(x)$ for the number of primes up to $x$. Then

$\sum_{x\le p\le kx} \frac{1}{p}\approx \log\log kx - \log\log x\approx \frac{\pi(kx)-\pi(x)}{x}$,

where the explanation for the last term is that, if $k$ is not much bigger than 1, then each summand on the left is roughly $\frac{1}{x}$. The goal is now to determine $\pi(kx)-\pi(x)$. We have

$\pi(kx)-\pi(x)\approx x(\log\log kx-\log\log x)=x\log\left(1+\frac{\log k}{\log x}\right)$.

Expanding the logarithm using a power series and truncating after the first term, we obtain

$\pi(kx)-\pi(x)\approx \frac{x\log k}{\log x}$.

Now, choosing $k=1+\frac{1}{x}$ and again taking the first term of the power series for the logarithm (that seems to be the name of the game today), we obtain

$\pi(x+1)-\pi(x)\approx \frac{1}{\log x}$.

Now, summing over $x$ gives

$\sum_{x=1}^{n-1} (\pi(x+1)-\pi(x))=\pi(n)\approx \sum_{x=2}^{n-1}\frac{1}{\log x}\approx\int_2^n \frac{dt}{\log t}$

(where I wrote 2 instead of 1 as the lower bound for the sum and integral to make them converge). And that’s one version of the prime number theorem. Nothing here would have caused Euler to break a sweat. But yet Gauss seems to have been the first one to conjecture the prime number theorem, and even he apparently did so on the basis of actually working out tables of primes and counting.

Posted in Uncategorized | 3 Comments

## The Wallis Product

I had seen the Wallis product, but I did not know a derivation of it until a few weeks ago, when I discovered this gem when reading the wonderful book Sources in the Development of Mathematics by Ranjan Roy. I’m storing this example to use next time I teach calculus, as it requires nothing more than integration by parts.

Like in the case of many other such things in mathematics, I don’t know what the Wallis product actually is; I can only discover it by going through a derivation. So, I won’t give away the answer until we get there. But the goal is to write down an infinite product for $\pi$.

In order to do this, we evaluate the integral $I_n = \int_0^\pi \sin^n x\, dx$. The first few are easy:

$I_0 = \int_0^\pi dx=\pi$

$I_1 = \int_0^\pi \sin x\, dx = 2$.

I can keep working out more of these, but now it’s time to tackle the general case, using integration by parts. We assume $n\ge 2$, and we compute $\int_0^\pi u\, dv$, where $u=\sin^{n-1} x$ and $v=\sin x\, dx$, so that

$I_n = \int_0^\pi \sin^n x\, dx = \left[-\sin^{n-1} x\cos x\right]_0^\pi +\int_0^\pi (n-1)\sin^{n-2} x\cos^2 x\, dx$ $= \int_0^\pi (n-1)\sin^{n-2}x(1-\sin^2 x)\, dx = (n-1)(I_{n-2}-I_n)$.

Solving for $I_n$, we get $I_n = \frac{n-1}{n}I_{n-2}$. Now it’s easy to compute more of the $I_n$‘s:

$I_2 = \pi\times\frac{1}{2}$

$I_3 = 2\times\frac{2}{3}$

$I_4 = \pi\times\frac{1}{2}\times\frac{3}{4}$

$I_5 = 2\times\frac{2}{3}\times\frac{4}{5}$,

and in general

$I_{2n} = \pi\times\frac{1}{2}\times\frac{3}{4}\times\cdots\times\frac{2n-1}{2n}$

$I_{2n+1} = 2\times\frac{2}{3}\times\frac{4}{5}\times\cdots\times\frac{2n}{2n+1}$.

Now, note that for any $n$, $I_{n-1}>I_n>I_{n+1}$, since $\sin^{n-1} x\ge \sin^n x\ge\sin^{n+1} x$ whenever $0\le x\le\pi$. Furthermore, since $\frac{I_{n+1}}{I_{n-1}}=\frac{n}{n+1}$, this means that eventually $I_{2n-1}$ and $I_{2n}$ get very close together. So

$\pi\times\frac{1}{2}\times\frac{3}{4}\times\cdots\times\frac{2n-1}{2n}\approx 2\times\frac{2}{3}\times\frac{4}{5}\times\cdots\times\frac{2n}{2n+1}$,

or

$\frac{\pi}{4}\approx \frac{2}{3}\times\frac{4}{3}\times\frac{4}{5}\times\frac{6}{5}\times\frac{6}{7}\times\cdots\times\frac{2n}{2n+1}$.

(If we wanted, we could be more precise, and give two inequalities for $\frac{\pi}{4}$, but it doesn’t add much value, in my opinion, to do that here.)

Taking the limit, we get

$\frac{\pi}{4}= \frac{2}{3}\times\frac{4}{3}\times\frac{4}{5}\times\frac{6}{5}\times\frac{6}{7}\cdots$.

This is the Wallis product.

## One interesting thing: Counting skew ternary trees

On Monday, I attended Don Knuth’s annual Christmas Tree Lecture, where he talks about the most interesting thing he has learned about trees in the past year. This year, he talked about ternary trees and planar graphs. Among other things, he showed us a beautiful theorem with a correspondingly beautiful proof, which I shall reproduce here.

A (nonempty) ternary tree is a tree consisting of a root node, and (possibly) several other nodes, so that each node has at most three children (denoted left, middle, and right). We draw ternary trees like this:

In this ternary tree, A is the root node, and it has a left child (B) and a right child (C), but no middle child. B has a left child (D) and a middle child (E), but no right child. C has a left child (F) and a middle child (G), but no right child. D has only a right child (H). Some nodes (E, F, G, and H) have no children.

If we have a ternary tree, we can assign ranks to the vertices, as follows: the root gets rank 0, and if a node has rank $n$, then a left child (if there is one) will have rank $n-1$, a middle child will have rank $n$, and a right child will have rank $n+1$. Hence, in the above tree, A and F have rank 0; B, E, and H have rank $-1$; D has rank $-2$; C and G have rank 1.

We call a ternary tree a skew ternary tree if all the ranks are nonnegative. The question we’ll answer is, what proportion of ternary trees with $n$ nodes are skew?

Whenever we have such a question and, there is only one correct way to start: calculate a few terms and look up the sequence on the Online Encyclopedia of Integer Sequences. Fortunately for me, after going to the talk, I was able to remember enough of the terms of the total number of ternary trees to find it without having to do any calculation: it is sequence A001764, and it begins 1, 1, 3, 12, 55, 273, 1428, 7752. It tells me that there are $\frac{1}{2n+1}\binom{3n}{n}$ of them. This isn’t too surprising, since the famous Catalan numbers $\frac{1}{n+1}\binom{2n}{n}$ enumerate binary trees (see Richard Stanley’s Enumerative Combinatorics Volume 2, Exercise 6.19(c)).

Unfortunately, the skew trees are not represented on OEIS, but since I remember the theorem, I can calculate them easily: the sequence begins 1, 1, 2, 6, 22, 91, 408, 1938. The theorem is as follows:

Theorem. The ratio of the number of skew ternary trees with $n$ nodes to the number of total ternary trees with $n$ nodes is $\frac{2}{n+1}$.

I think this is a rather remarkable theorem, because I don’t see any a priori reason to expect any nice relationship between these two numbers. On the other hand, perhaps it’s not so different from the classical Catalan situation: there are $\binom{2n}{n}$ ways to get from $(0,0)$ to $(n,n)$ by taking unit steps either up or to the right, and $\binom{1}{n+1}\binom{2n}{n}$ ways of doing this without every going below the diagonal line $y=x$.

Before moving on to the proof, I want to pose a question to think about while going through the proof: can a variation of this argument be used to write down a formula for the Catalan numbers? I don’t see it, but I’d love to hear it if you do!

In order to prove the theorem, we start by filling in all the potential children, so that every node has three children or phantom-children. In addition, all nodes except the root have parent nodes, so in order to make the root act more like the rest of them, let’s add a phantom-parent for the root. Let us now count the phantom nodes: each node has exactly four neighbors or phantom-neighbors, so there are a total of $4n$ edge-germs emanating from all the genuine nodes, put together. There a total of $n-1$ edges between genuine nodes, and each one counts as two edge-germs, so there are a total of $4n-(2n-2)=2n+2$ phantom edge-germs, or $2n+2$ phantom nodes.

Now, we start from the root and trace the outline of the entire tree. Since we are civilised mathematicians, of course we go counterclockwise. As we go, we write down a list of numbers, adding a new one every time we hit a genuine node. It doesn’t matter what number we start with, but Knuth chose $-2$, so I’ll follow his lead. Now, as we trace, we hit genuine nodes at various times. When we do, we either increase or decrease our running total by one. If we’ve just gone around a phantom node, then we increase the total by one, whereas if we have just come from a (different) genuine node, we decrease it by one.

Let’s see this in action with the example above. After we add the phantom nodes, we get the following diagram:

We start just before the phantom parent-node to node A with $-2$. Then we move around the phantom parent node. This increases the total by 1, to $-1$. Continuing along the outline, we next reach node B, which decreases our count to $-2$. Then we go on to node D, which decreases our count to $-3$, and so on. Here is a table with the first few values.

 A A B D D D H H H H -2 -1 -2 -3 -2 -1 -2 -1 0 1

We can also represent this table in pictorial form, just by keeping track of where we are in the running total after each step. Here is the picture for the above graph.

It’s a bit easier to keep track of what’s going on if we overlay this picture on five horizontal lines (also known as a musical staff), so we can see immediately where we are:

Now it’s easy to see that we start at $-2$ and end at $+2$ in this case. But in fact, that’s always true: we pass $2n+2$ phantom nodes and $2n-2$ edges between genuine nodes. The former decrease the count by one, and the latter increase it by 1, so we have a net $+4$ by tracing the entire tree. So, we always start at $-2$ and end at $+2$.

The next observation is that the root of the tree shouldn’t be thought of as being fixed in place. Instead, we can pick a new node to be the root and let everything dangle from there. But we should be a little bit more precise on this point: it’s not quite the root node that we want to choose, but the parent phantom node of the root. Once we have done that, if we keep all the other edges in the right order, we obtain another ternary tree. We can do this with each phantom node, and so we obtain a total of $2n+2$ different ternary trees in this way.

The next thing to work out is how the list of running totals changes when we choose a new parent phantom node. But this is not so hard: we see the same increases and decreases starting from some point, but then it must wrap around from the beginning at some point. For example, if we pick the new parent phantom node to be the right phantom child of F, we obtain the following chart (marked with the red rectangle):

So far, we haven’t said anything about what distinguishes skew trees from arbitrary ternary trees, but there is a very nice interpretation of them in terms of the counts: they are exactly the trees that never return to $-2$. With that observation in mind, the proof of the theorem is now quite easy: since each trace around the outline of the tree increases the count by 4, if we trace the outline of the tree many times, there will be several places where we encounter certain counts for the last time. These will certainly be before phantom nodes, rather than genuine nodes, since genuine nodes decrease the count. Those phantom nodes are exactly the phantom nodes that lead to skew ternary trees if we hang the tree from them. Since the count increases by 4 when we trace the outline once, there will be four such phantom nodes in each trace, out of a total of $2n+2$. Hence, the proportion of skew trees among any one such family is $\frac{4}{2n+2}=\frac{2}{n+1}$, and so the proportion of all ternary trees which are skew is also $\frac{2}{n+1}$. $\blacksquare$

So, that’s the proof. Can this method be used to come up with a formula for the Catalan numbers as well? It seems as though this should be possible: in a path from $(0,0)$ to $(n,n)$ with right and up steps, there are a total of $2n$ edges and $2n+1$ vertices. We can add in the phantom neighbors to obtain $4(2n+1)-2(2n)=4n+4$ phantom nodes. The goal would be to show that it is possible to hang the picture up from each one of the phantom nodes, so that each one leads to a suitable lattice lath, and of those, exactly 4 of them don’t go below the diagonal. But I don’t know how to do this, so I’d be interested to hear suggestions.

## Exchange sacrifices galore

Last weekend, I played the BayAreaChess Thanksgiving tournament. My play was pretty bad, but my results (2 wins, 1 draw, 2 losses against all players rated a bit higher than I) were okay, and I hit a new rating high of 2050 after the tournament. This game, the most exciting but definitely not the best-played, features a total of three exchange sacrifices, followed by a sharp endgame culminating in a pawn race. All in all, it was an amusing if inaccurate game.

Simon Rubinstein-Salzedo (2037) vs. Julian Lin (2106)

1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 a6 6. Be3 e5 7. Nb3 Be6 8. f3
Be7 9. Qd2 O-O 10. O-O-O Qc7 11. Kb1

Unfortunately, I don’t know my lines in the Najdorf as well as I should. The main move is 11. g4, but I was under the impression that white is supposed to wait until black plays Nbd7 to play g4. (At least, that was my rough heuristic.) But my move is okay as well.

11…Nbd7 12. Qf2

But I still didn’t play g4 immediately, because I wanted to stop black from playing Nb6 first.

12…Rfc8 13. g4 Qd8

Black is now planning to follow up with Rxc3. So, at this point I had to decide how to evaluate the position after the exchange sacrifice. I thought that if I could isolate some of black’s pieces on the kingside and out of trouble, then black would fail to get full compensation. So, I was willing to encourage the sacrifice.

14. g5 Nh5 15. Rg1

15…Rxc3!

Exchange sacrifice #1. I was pretty optimistic here, because the knight on h5 isn’t actively helping in the attack for the time being, and the e7 bishop also isn’t currently terribly useful.

16. bxc3 Qc7 17. Qd2

An interesting idea is to clarify matters on the kingside and give up the c-pawn with 17. Qh4. My idea was to give up the dark-squared bishop when the knight goes to b6 (since allowing it to get to a4 is unpleasant) and not worry too much when the other knight goes to f4.

17…Nb6 18. Bxb6

Unfortunately, the e3 bishop was doing double duty: stopping one knight from getting to b6, and the other from getting to f4. But I thought that allowing the knight to get to a4 would lead to very bad things for me.

18…Qxb6 19. Bd3 a5 20. Ka1 a4 21. Nc1 Qc5

This was a crafty move, and it took me a little while to understand what was going on: black threatens to play 22…Qa3.

22. Ne2

The idea is to meet 22…Qa3 with 23. c4 b5 24. Qc3

22…Bd8!

Another nice move, bringing the bishop into the game.

23. Rb1?

After 23. c4, black is a bit better, but white is in better shape than in the game.

23…Ba5

24. Rb4

Exchange sacrifice #2. I was originally planning to play 24. Rb5, but after 24…Qa3, white is simply lost. For example, 25. Rb2 Nf4, and c3 falls. So, this exchange sacrifice was forced.

24…Bxb4 25. cxb4 Qf2 26. Qe1 Qxf3

That was the best I could do: go down a pawn, but my structure is somewhat improved, and at least for the time being, black’s attack has been slowed. Objectively, though, I think my position is lost. So, the goal is to do the best I can setting up a fortress and not making any new weaknesses.

27. h4 Qe3 28. Qc1 Qb6

I was mildly surprised by this decision. Okay, black has some attacking chances still, but I expected that my opponent would prefer an endgame with a clear extra pawn. And, to be honest, I was hoping that he would trade queens, given that I have worked quite hard on endgames recently and thought I might hold, even if I didn’t really deserve to.

29. a3 Rc8 30. Rf1 g6 31. Qd2 Bg4!

Good idea: my bishop is relatively useless, being blocked in by my own pawns. (Capablanca’s rule: when you have a bishop, you should put your pawns on the opposite colored squares.)

32. Rd1 Bxe2 33. Bxe2 Nf4?

This is what I expected, and it’s the most logical move, but 33…Ng3 simply wins another pawn. After this inaccuracy, I can put up some serious resistance.

34. Bg4! Rd8 35. c4

I had been counting on the strength of this pawn structure for some time. It’s no longer easy for black to break through, since d5 never works, b5 probably doesn’t work, a4 might be weak, and white might be able to play c5 in certain circumstances. At this point, I thought I had reasonable chances to hold.

35…Qc6 36. Qc2 Ra8 37. Rc1 b6

This attempt to get the rook back to the c-file was logical enough, but it’s not clear what the point is: I just keep my queen and rook on the c-file and shuffle my king, and how is black going to do anything?

38. Kb1 Kg7 39. Ka1 Ra7 40. Kb2 Ra8

So, he figured out that that idea wasn’t going to work. But what is happening next?

41. Ka1 Rh8

Yeah, that seems like a better plan. For some time, I had been thinking about playing h5 gxh5 Bf5, but now is my last chance to do it.

42. h5 gxh5 43. Bf5 h6 44. gxh6+ Kxh6? 45. Qh2?

A better move is 45. Rg2, trapping the king on the h-file. The idea is that, while there are several things black wants to do, it’s hard to do all of them: if the queen goes to a8 to prepare Rg8 and a rook trade, then I have Qd2, hitting the d-pawn. If the queen stays on c6, then I can play Qh2 and Qg3 (sacrificing the c-pawn), setting up threats on the g-file.

45…Rg8 46. Qh4 Rg5 47. Qh2 Rg2 48. Qh4 Rg5 49. Qh2

I’m not going to complain about a draw here. But he won’t have any of that.

49…Rxf5?

Exchange sacrifice #3. This is not the best way to continue though: 49…Rg2 50. Qh4 Kg7 51. Rc3 Ne2 is completely winning for black. His move might be winning too, but it gives me more chances. One point to notice is that I’m not seriously threatening anything: he could just take some time to put his king in a safer spot before continuing, and I wouldn’t be able to do anything in the meantime.

50. exf5 Qf3 51. Qa2 Qe3 52. Rd1 Qc3+?

It’s better to win the f-pawn first with 52…Kg5 and then push the h-pawn. At the very least, it buys time for a pawn race, which, as we shall see, is very important later in the game.

53. Kb1 Nd3?

It’s probably still a draw with best play, but now white has most of the winning chances.

54. Qd2+ Qxd2 55. Rxd2 e4 56. Re2 Kg5 57. Rxe4 Kxf5 58. Rd4 Ne5

I didn’t know what was going on here during the game, but I knew what I had to do: activate my queenside pawns before black’s kingside pawns get too dangerous. First plan of action: defend my c-pawn with my king so that I can win the d- and b-pawns, and try to do so quickly enough that I can still stop the black pawns on the kingside. But will it work?

59. Kc2 Kg5 60. Kc3

So far, we have stayed in drawing territory. But now black has a decision to make: which pawn to push? Is it better to have two relatively advanced pawns, or one very advanced pawn?

60…f5?

It’s sometimes hard to figure out what to do when you have two passed pawns, but here’s a pretty good heuristic: bet everything on the one that’s the most advanced. This heuristic would lead to the correct move in the case: 60…h4, which draws. For example 61. Rxd6 h3 62. Rd1 Kf4 63. Kd4 h2 64. Kd5 Ng4 65. Rh1 Ne3+ 66. Kc6 Kg3 67. c5 bxc5 68. b5 Nc4 69. Kxc5 Ne5, and black can stop the b-pawn with the knight, so the game should end in a draw.

61. Rxd6 h4 62. Kd4?

In turn, I’m not sure what I missed, but 62. Rxb6 h3 63. Rd6 h2 64. Rd1 is pretty clearly winning for white. I guess the pressure of a long game was taking a toll on both of us by this point.

62…Ng4?

Better is 62…Nf3+!, because there are too many forks lurking.

63. c5!?

This is probably winning, and it’s very exciting to go into a pawn race and get some new queens on the board, but 63. Rxb6 h3 64. Re6 is a much easier and less stressful way to win.

63…h3 64. cxb6

Not 64. c6?? h2 65. c7 h1Q 66. c8Q Qd1+ 67. Kc4 Qc1+, winning the queen.

64…h2 65. b7 h1=Q 66. b8=Q

I evaluated this as probably winning, but I’m not sure, and anyway with limited time remaining, I didn’t know what would happen. After the expected 66…Qc1, it’s still a tough game. Fortunately for me, though, he quickly went astray.

66…Qe4+

Maybe this isn’t a horrible move objectively, but it puts the queen in a vulnerable spot, and that’s not a smart thing to do in a complicated position after 5 hours with little time remaining.

67. Kc5 Ne5??

But this loses instantly.

68. Qd8+ Kg4 69. Rd4 1-0

My opponent deserves a lot of credit for extracting a lot from an exchange sacrifice that I didn’t initially believe in (Rxc3). I knew that in the Dragon, Rxc3 always works, but I was more skeptical about its strength in the Najdorf. I guess now I know better to allow it in the future. I also need to know my lines in the Najdorf better if I’m going to continue to play open Sicilians: it’s simply irresponsible to enter into such sharp lines without being well-prepared. (In fact, all three of my white games in this tournament featured the Najdorf. It was only in the third game that I responded relatively well.) Probably I would see the most benefit in my chess ability if I were to work on my openings and learn some systems properly. But how do I do that in the most effective way possible? I could memorize a bunch of lines blindly à la Moonwalking with Einstein using memory palaces or something similar, but I have fundamental objections to using such techniques, since they cause people to remember things for all the same reasons. Anyway, I’m constantly telling my students not to memorize things. Alternatively, I could play through a bunch of games and hope that the lines stick, but usually they don’t, as there is so much to learn. So, how do you learn openings?

## One interesting thing: Average polynomial time algorithms

The most celebrated problem in theoretical computer science is undoubtedly the P vs. NP problem. Let me explain what it is asking. (If you already know this, feel free to skip the next few paragraphs.) Suppose we have a general type of problem that we wish to solve. (Think of something like determining if a number is prime, or factoring it.) There are many instances of such a problem. (For example, there are lots of numbers out there, and we could ask for each number whether or not it is prime.) Now, suppose we have a computer program that solves this type of problem: it runs for a while, and eventually tells us the answer. The amount of time it takes to run depends on the complexity of the specific problem we give it: for example, we should expect it to take longer to determine whether a very large number is prime than it would take for a small number, although there are some exceptions. (It would be very fast to say that a large number is composite if it is even, for instance.)

For the most part, then, the amount of time it takes to run the program depends on how complicated the particular instance of the problem is. We’ll measure that in terms of the number of characters needed to enter it. So, we can ask how long it takes to determine if an $n$-digit number is prime, for example. (Answer: not very long, surprisingly.)

Normally, we aren’t particularly interested in exactly how long it takes to run in terms of number of seconds, because that changes from one computer to another. Instead, we’re interested in how the length of time grows as the length of the input grows. We say that a program runs in polynomial time if there is some polynomial $P(n)$ so that the amount of time to solve any instance of the problem with an input of length $n$ is at most $P(n)$. Observe that this notion does not depend on the speed of the computer; for instance, if we switch to a computer that runs half as fast, that’s equivalent to multiplying $P$ by 2. We call a problem a P problem if it has an algorithm that solves it in polynomial time.

Now, let us consider the notion of NP, which stands for nondeterministic polynomial time. What this means is that, if I magically tell you the answer to a problem (but I might be lying to you), you can check whether I’m telling you the truth in polynomial time. A classic example of this is factoring. If I ask you to factor 51445627, you are unlikely to be able to do that quickly (although there are some very clever algorithms for doing it). But if I tell you that the prime factors are 5689 and 9043, then you can check this quickly, in polynomial time in the length of the original number. Note that all P problems are also NP problems: if you can solve the problem on your own in polynomial time, you can still solve it in polynomial time if I tell you the answer, for example by ignoring my answer.

The P vs. NP problem asks if P = NP. That is, can every problem that can be checked in polynomial time also be solved in polynomial time? Most people believe the answer to be no, but we have made little progress toward a proof.

Note that the property of being in P depends on all instances of the problem. However, computer scientists are a bit different from mathematicians: they don’t usually need to prove things; they just want to get right answers. (Probably not all of them, but that’s my view of it from a mathematician’s world.) So, might it be that there problems that can’t always be solved in polynomial time, but usually can?

This is the subject of Russell Impagliazzo’s very nice survey paper “A Personal View of Average-Case Complexity.” He rather amusingly frames the problem in terms of the famous story about a very young Gauss being asked to sum the numbers from 1 to 100, and the teacher’s subsequent attempts to humiliate Gauss in revenge for his quick solution. (Note: I don’t do that to my students. In fact, I like it when they are clever.)

In it, he includes a definition for an average polynomial time algorithm, which essentially means that if we were to choose a random instance of the problem, we would expect to be able to solve it in polynomial time. But before we can make that more precise, it is necessary to explain what we mean by a random instance of the problem. In order to do that, it makes sense at first to restrict to a given input size $n$. In order to select a random instance of a problem of size $n$, we need to put a probability distribution $\mu_n$ on the instances of the problem of length $n$. Then, intuitively, to say that an algorithm is an average polynomial time algorithm is to say that there is a polynomial $P(n)$ so that the algorithm solves nearly all instances of the problem of length $n$, chosen according to $\mu_n$, in time at most $P(n)$, and the “exceptional” cases get increasingly rare as $n$ gets larger. (The real definition is more precise, but this captures the essence of it.)

Now, one might ask whether there are any problems for which there exist average polynomial time algorithms but not polynomial time algorithms. Since we don’t know whether P = NP or not, it’s hard to be sure, but one problem that is very strongly believed not to be a P problem is the Hamiltonian path problem. Let us explain what it says.

A path in a graph is a sequence of edges so that each edge (other than the first) starts where the previous one ended. (So that we can imagine walking on it if we were on the graph.) A path is called a Hamiltonian path if it goes through each vertex once, without any repetitions.

The Hamiltonian path problem asks if, given a graph, we can determine whether there is a Hamiltonian path. This problem is definitely an NP problem: if someone tells us that there is a Hamiltonian path and tells us what it is, then we can quickly check whether it actually is. (However, if someone tells us that there is no Hamiltonian path, it is not clear how we would verify that quickly, and indeed we expect not to be able to do so.)

It is widely believed that the Hamiltonian path problem is not a P problem; in fact, if it were, then we would also have P = NP, since it is possible to convert any NP problem quickly into a Hamiltonian path problem.

So, we believe that we can’t solve the Hamiltonian path problem quickly in general. But what about for most cases?

In order for that question to make sense, we need to put a probability distribution on graphs with $n$ vertices, so that we can sample them. One way of doing that is with the Erdős–Rényi model that I discussed last week: we pick a random graph by saying that any two of its vertices will be connected with probability $p$, independently of all the other pairs.

So, let’s fix a $p$ and let $n$ vary. Then, it turns out that there is an algorithm that can determine whether a graph has a Hamiltonian path in almost polynomial time. The algorithm is given in detail in Yuri Gurevich and Saharon Shelah’s paper “Expected Computation Time for Hamiltonian Path Problem,” but I’ll sketch the basic idea, which is very simple. (What they do is somewhat more subtle, but here’s my personalized version of it, which probably doesn’t actually work very well.) It runs in three steps:

1. Choose a potential starting point and a potential ending point. (There are only a polynomial number of pairs to choose, so we can run through all such pairs in polynomial time.) Try to make a Hamiltonian path by making a path from the starting point that’s as long as possible, just by trying to add one vertex at a time. Once we get into trouble, do the same thing starting with the ending point. Try to make minor modifications to splice these two pieces together.
2. If that doesn’t work, try to show there is no Hamiltonian path. To do this, identify troublesome vertices: the ones that have relatively small degree. See if there is a contradiction involving these problem vertices.
3. If that doesn’t work, simply check every possible path and see if it gives a Hamiltonian path.

Steps 1 and 2 can be done in polynomial time; it is only step 3 that can’t. But it turns out that we won’t reach step 3 very often, because most of the time the algorithm will have finished after step 1 or 2.

Okay, so it’s an example, although I don’t find it particularly satisfying: as $n$ gets large, almost all graphs chosen in this way have Hamiltonian paths, since they have so many edges! I’d be very likely to find one in Step 1, which just tries things at random. Thus, I could make an algorithm that says there is always a Hamiltonian path, and it will run very quickly and only very rarely be wrong.

So, I am curious to know whether there are more interesting examples of problems with average polynomial time algorithms but that are not expected to have worse-case polynomial time algorithms.

## One interesting thing: Erdős–Rényi random graphs

I’ve been thinking a lot about random graphs this week, so it is appropriate for my post this week to be about them. One of the most important models of random graphs is the Erdős–Rényi model. Here is the construction: we fix some parameter $p$ with $0, and we choose some positive integer $n$. Then, we draw $n$ vertices. Between each pair of vertices, we put an edge there with probability $p$, independently of all the other pairs. I’ve been thinking about problems relating to evolutionary dynamics on such graphs, but I won’t describe that today. (Maybe I will at some point, when I understand them better.)

Instead, I just want to talk about the graphs themselves, since there are a lot of questions one can ask about the structure of them. What do they look like? In particular, are they likely to be connected?

The answer is that it depends on how large $p$ is relative to $n$. There is a phase change at $p=\log(n)/n$. In fact, we have the following theorem (which is not quite the best possible):

Theorem (Erdős–Rényi). For any $\varepsilon>0$, we have the following:

1. If $p>(1+\varepsilon)\frac{\log(n)}{n}$, then the probability that the graph is connected goes to 1, as $n$ goes to infinity.
2. If $p<(1-\varepsilon)\frac{\log(n)}{n}$, then the probability that the graph is connected goes to 0, as $n$ goes to infinity.

I’m not going to give a proof here, but a proof can be found on this MathOverflow page.

But what happens when the graph is disconnected? To what extent does it fail to be connected? There is a very nice answer to this question as well:

Theorem. Suppose that $c=np$ is a constant. Then:

1. If $c<1$, then with high probability all the connected components are small: they have size $O(\log(n))$.
2. If $c>1$, then with high probability there is exactly one large connected component, of size $\beta(c) n+o(n)$, for some number $\beta(c)>0$, and the rest of the connected components have size $O(\log(n))$.

The large component when $c>1$ is called, suitably, the Giant Component. A proof can be find here.

There is a lot more than one can say about the Erdős–Rényi model for random graphs, but this will be enough for now.

Posted in Uncategorized | 1 Comment

## One Interesting Thing: The preponderance of 2-groups among finite groups

This is a preliminary discussion of my knowledge about 2-groups. There are still many questions I have, and I would appreciate any insights.

When we first learn about group theory, we learn some theorems about how to decompose finite groups, most notably the Sylow theorems, which tell us (roughly) how to decompose groups based on their maximal $p$-power subgroups. So, we might expect that for random finite groups, we can obtain a lot of information about them by analyzing what happens at each prime factor of the order.

And indeed, groups that appear “in nature” tend to have several prime factors, and their Sylow subgroups are typically interesting objects in their own right.  However, if we count groups not by their occurrences in nature but simply by their orders, then we see something very different: most finite groups are 2-groups: their order is a power of 2, so there is no information to be obtained by looking at other primes.

I was surprised at first to learn that the vast majority of groups are 2-groups: for example, there are 49,910,529,484 group of order up to 2000, and of those, 49,487,365,422, or more than 99%, have order $1024=2^{10}$. And the proportion of groups with 2-power order will only increase to 1 as we go further.

I don’t really understand why this is true (I have only barely skimmed the relevant papers, such as Eick and O’Brien’s “Enumerating $p$-groups” and Besche, Eick, and O’Brien’s “The groups of order at most 2000″), but here is my vague intuition. The Jordan-Hölder theorem tells us that we can assemble groups out of simple groups. Now, it is likely possible to glue two large simple groups together in quite a few ways, whereas there are fewer ways of assembling two small simple groups together. However, more relevant than the size of the pieces is the number of pieces: a Jordan-Hölder series for a group of order 1024 will contain ten factors, each isomorphic to $\mathbb{Z}/2\mathbb{Z}$. As we all learned when we were very young and trying to assemble jigsaw puzzles, there are lots of ways of putting together many small pieces, but there are not so many ways of putting together a few large pieces. And this is just as true of groups as it is of jigsaw puzzles.

But there’s a problem with this analysis: there are way more groups with order 1024 than there are with $1536 = 2^9\times 3$, even though they contain the same number of factors in their composition series. In fact, there are 408,641,062 of order 1536, which is only roughly 40 times as many as there are groups of order $512=2^9$ (as opposed to nearly 5000 times for 1024). I would like to understand this discrepancy, but I’m not there yet.

One obstacle to understanding this point for me is that there isn’t such a discrepancy for groups of order $p^nq$ versus $p^{n+1}$ when $n$ is small: for example, when $p=2$ and $n=3$, there are 14 groups of order 16, 15 of order 24, 14 of order 40, 13 of order 56, and so forth. Similarly, when $n=4$, there are 51 groups of order 32, 52 of order 48, 52 of order 80, and 43 of order 112. Since I don’t know any exciting examples of groups of order $2^n$ for $n$ large, it’s hard for me to gain intuition about what’s going on here.

There are lots of things to look into in the future, such as reading the papers by Besche and Eick. Perhaps I’ll report on them in the future.