The 3n+1 Problem (A Probabilistic Approach)

2 Conversations

Abstract

The Poisson probability distribution is used to model the number of cycles in the generalized Collatz problem, that is, the pn+q problem (here, p is set to 3). First, "interrelated" cycles are defined and used as a criterion in counting the cycles for a given q value. Initially, archived data in the mathematical literature (giving the known 3n+q cycles) is analyzed. For large samples, the Poisson probability distribution gives a poor fit of the data (there are too many cycles for the large x values). "Associated" cycles are defined and used as an additional criterion in counting cycles; this improves the data fit substantially. Some theory and empirical results are given in an attempt to explain the origin of this distribution of cycle counts. Also, degrees of freedom in probability distributions involving the difference between the number of odd and even elements in a cycle are shown to be a possible explanation for the apparent existence of a 3n+q cycle for every q value.


Let n be a natural number. If n is odd, let the next number in a sequence be 3n+1, otherwise let the next number be n/2. The Collatz conjecture states that starting with any initial n value, the sequence always ends in the cycle {4, 2, 1}. A standard technique in mathematics is to study generalizations of apparently intractable problems. Let d be an odd natural number not divisible by 3. In the following, the 3n+d sequence is considered. As is standard in the mathematical literature, the element after an odd element n is defined to be (3n+d)/2. Let K be the number of odd elements in a 3n+d cycle and L the number of even elements. For a given d value, cycles having the same length (K+L) and the same number of odd elements will be said to be interrelated. When counting the number of cycles for a given d value, only one of the interrelated cycles will be included. In this case, a Poisson probability distribution (P(x)=(λx/x!)e, x=0, 1, 2, ...) where λ=1 can be used to model the number of 3n+d cycles for a given d value. (For this parameter value, the x=1 count should equal the x=0 count, the x=2 count should equal half the x=1 count, the x=3 count should equal a third of the x=2 count, etc.) Belaga and Mignotte1 have tabulated the known primitive 3n+d cycles for d less than 20000 (a cycle is said to be primitive if its elements are not a common multiple of the elements of another cycle). They state that they believe these are the only such cycles that exist. For d less than 100, the Poisson probability distribution models the number of cycles quite well; there are 14 d values where there is only 1 cycle, there are 12 d values where there are exactly 2 cycles, there are 6 d values where there are exactly 3 cycles, and there is 1 d value where there are exactly 4 cycles (again, taking interrelated cycles into account). (They include d=-1 [not a natural number] and d=1 in their table. Here, the cycle counts for these two d values are lumped together.) Note that a cycle count of 1 corresponds to x=0, a cycle count of 2 corresponds to x=1, etc. In the remainder of this article, larger samples are considered and n is allowed to be an integer (positive or negative). (Consequently, the author2 re-computed the cycles up to 10000. There are new cycles that don't appear in Belaga and Mignotte's table.) There doesn't appear to be any mention of Poisson probability distributions in Lagarias'3,4 annotated bibliographies of the 3x+1 problem.



A parity vector gives the order of even and odd elements in a 3n+d sequence, a "1" for an odd element and a "0" for an even element. Let k0, k1, k2, ..., kK-1 denote the positions of the 1's in a parity vector containing K 1's and L 0's. Let Z denote 3K-12k0+3K-22k1+3K-32k2+...+302kK-1. A cycle exists if and only if dZ/(2K+L-3K) is an integer (see Bohm and Sontacchi's5 article). (Note that every parity vector containing at least one 1 corresponds to a cycle for some d value [duplicated parity sub-vectors correspond to duplicated sub-cycles].) For example, for the d=17 cycle of {2, 1, 10, 5, 16, 8, 4}, k0=1 and k1=3, Z=3121+3023=14, 119=27-32, and (17 ·14)/119 is an integer. There don't appear to be any factors of d left over after the division by 2K+L-3K. This has been confirmed experimentally (for the 7428 distinct (d, L, K) values of the 3n+d cycles for d<10000);

(1) A 3n+d cycle exists only if d divides 2K+L-3K.

If this proposition is accepted, Z need not be considered when searching for cycles; only the factors of 2K+L-3K need be considered.
When d=|2K+L-3K|, an arbitrarily constructed parity vector with a length of K+L and containing K 1's corresponds to a cycle (or possibly duplicated sub-cycles if L and K are not relatively prime), but the cycle is not guaranteed to be primitive. When reduced, such a cycle corresponds to a cycle for a d value that is a proper divisor of 2K+L-3K. In this sense, there is no problem of finding cycles; they're all well-defined and determined by the parity vectors. Even the number of interrelated cycles is determined by the combinatorics of generating parity vectors that are distinct under rotation. For example, for L and K greater than or equal to 1 and less than or equal to 10, the numbers of distinct parity vectors [where K increases from left to right and L increases from top to bottom] are;

1,1,1,1,1,1,1,1,1,1
1,2,2,3,3,4,4,5,5,6
1,2,4,5,7,10,12,15,19,22
1,3,5,10,14,22,30,43,55,73
1,3,7,14,26,42,66,99,143,201
1,4,10,22,42,80,132,217,335,504
1,4,12,30,66,132,246,429,715,1144
1,5,15,43,99,217,429,810,1430,2438
1,5,19,55,143,335,715,1430,2704,4862
1,6,22,73,201,504,1144,2438,4862,9252


There is, however, the problem of determining which d values the primitive cycles map to. For example, for K+L=1 and K=1, the parity vector is (1) (corresponding to the cycle {-1} for d=1), for K+L=2 and K=1, the parity vector is (0, 1) (corresponding to the cycle {2, 1} for d=1), and for K+L=3 and K=2, the parity vector is (1, 1, 0) (corresponding to the cycle {-5, -7, -10} for d=1). For K+L=11 and K=7, there are 30 distinct parity vectors, the first few of which are (0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0), (1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0), (1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0), ... (corresponding to cycles for d=139). The cycle corresponding to the first parity vector is not primitive (when d=139) and corresponds to the remaining known d=1 cycle of {-34, -17, -25, -37, -55, -82, -41, -61, -91, -136, -68}. Another example factorization will be given. For (K+L, K)=(8, 4), the distinct parity vectors are (1, 1, 0, 0, 1, 1, 0, 0), (0, 1, 0, 1, 0, 1, 0, 1), (1, 0, 1, 1, 0, 1, 0, 0), (0, 0, 1, 1, 1, 1, 0, 0), (0, 1, 1, 0, 1, 1, 0, 0), (1, 0, 0, 1, 1, 1, 0, 0), (0, 1, 0, 1, 1, 1, 0, 0), (1, 0, 1, 0, 1, 1, 0, 0), (1, 1, 0, 1, 0, 1, 0, 0), and (0, 1, 1, 1, 0, 1, 0, 0) (corresponding to cycles for d=175). The first parity vector corresponds to two duplicated primitive cycles for d=7, the second parity vector corresponds to four duplicated primitive cycles for d=1, the third parity vector corresponds to a primitive cycle for d=25, the fourth and fifth parity vectors correspond to primitive cycles for d=35, and the remaining parity vectors correspond to primitive cycles for d=175. Although 5 also divides 175, there are no cycles for this d value when (K+L, K)=(8, 4) (the parity vectors have been used up by the cycles for larger d values [and d=1]). Apparently, every d value is covered in this mapping. A more appropriate question to ask than why cycles don't exist for a given d value is why they do exist and what the expected number of cycles is. For the 1667 d values less than or equal to 4999, the number of d values having 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11 cycles (counting only one of interrelated cycles) are 535, 607, 310, 115, 57, 22, 9, 5, 4, 2, and 1 respectively. This distribution has a mean of 1.235. For the 3333 d values less than or equal to 9997, the number of d values having 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15 cycles (counting only one of interrelated cycles) are 1117, 1193, 591, 228, 104, 48, 25, 11, 5, 5, 3, 1, 0, 1, and 1 respectively. This distribution has a mean of 1.229. This resembles a Poisson probability distribution where λ=1, but the counts for the larger x values are too big.

Associated Cycles

This raises the question of whether there are other ways for cycles to be interrelated so that the data would more closely fit a Poisson probability distribution. There are frequently (L, K) values of the cycles for a given d value that are multiples of other (L, K) values. For example, for d=269, the (L, K) values of the cycles are (4, 5), (8, 10), (12, 15), and (16, 20). Also, there are frequently (L, K) values of the cycles for a given d value that are in arithmetic progression (and aren't multiples of other (L, K) values). For example, for d=1843, the sorted (L, K) values (by increasing L values) of the cycles are (6, 11), (12, 22), (18, 9), (24, 20), (30, 31), (36, 42), (42, 53), (48, 64), (54, 51), and (90, 117) ((18, 9), (24, 20), (30, 31), (36, 42), (42, 53), and (48, 64) are in arithmetic progression with an increment of (6, 11)). Let (L1, K1) denote the (L, K) value having the smallest L value. Cycles for a given d value with (L, K) values that are in arithmetic progression with an increment of (L1, K1) (and aren't multiples of (L1, K1)) will be said to be associated with each other. In the case of d=1843, the cycles with (L, K) values of (24, 20), (30, 31), (36, 42), (42, 53), and (48, 64) will not be counted when computing the frequency distribution of cycle counts. For d=4879, the sorted (L, K) values of the cycles are (11, 14), (22, 28), (25, 10), (33, 42), (36, 24), (44, 56), (58, 52), (66, 84), (69, 66), (77, 98), and (80, 80). Note that the (L, K) values that are in arithmetic progression need not be contiguous; (25, 10) and (36, 24) are in arithmetic progression and (58, 52), (69, 66), and (80, 80) are in arithmetic progression (with an increment of (11, 14)). In this case, the cycles with (L, K) values of (36, 24), (69, 66), and (80, 80) will not be counted. Let (L2, K2) denote the first (L, K) value in the sorted (L, K) values for which the arithmetic progression(s) begins. Of the 3333 d values less than or equal to 9997, associated cycles occur for 328 d values and of these cycles, L1>K1 and L2<K2 or L1<K1 and L2>K2 in all but 20 instances. For 17 of these cycles, there are only two (L, K) values that are in arithmetic progression (with an increment of (L1, K1)). For example, for d=1679, the sorted (L, K) values are (20, 28), (37, 32), (97, 116), and (117, 144) ((97, 116) and (117, 144) are in arithmetic progression with an increment of (20, 28)). For the remaining 3 cycles, there are two (L, K) values each that are in arithmetic progression. When d=2305, the sorted (L, K) values are (30, 17), (40, 38), (50, 59), (70, 55), and (80, 76) (the pairs of (L, K) values that are in arithmetic progression are [(40, 38), (70, 55)] and [(50, 59), (80, 76)] [note that (50, 59) occurs between (40, 38) and (70, 55) and 2·(40, 38)=(80, 76)]). When d=5113, the sorted (L, K) values are (39, 27), (42, 40), (45, 53), (48, 66), (81, 67), (84, 80), and (180, 212) (the pairs of (L, K) values that are in arithmetic progression are [(42, 40), (81, 67)] and [(45, 53), (84, 80)] [note that (45, 53) occurs between (42, 40) and (81, 67) and 2·(42, 40)=(84, 80)]). For d=6989, the sorted (L, K) values are (32, 50), (52, 55), (72, 60), (84, 105), (104, 110), (124, 115), and (188, 215) (the pairs of (L, K) values that are in arithmetic progression are [(52, 55), (84, 105)] and [(72, 60), (104, 110)] [note that (72, 60) occurs between (52, 55) and (84, 105) and 2·(52, 55)=(104, 110)]). For d=2305, 30>17 ((30, 17) equals (L1, K1)) and 50<59 ((50, 59) is the (L, K) value starting the second arithmetic progression), for d=5113, 39>27 ((39, 27) equals L1, K1)) and 45<53 ((45, 53) is the (L, K) value starting the second arithmetic progession), and for d=6989, 32<50 ((32, 50) equals (L1, K1)) and 72>60 ((72, 60) is the (L, K) value starting the second arithmetic progression). Even including the 20 cycles where L1>K1 and L2>K2 or L1<K1 and L2<K2, the L and K values in some (L, K) value in the arithmetic progression(s) are approximately equal or L1 is approximately equal to K1. For example, for d=601, the sorted (L, K) values are (1, 6), (13, 3), (14, 9), (15, 15), (17, 27), (29, 24), (32, 42), and (33, 48) ((15, 15) is in the arithmetic progression). For d=2331, the sorted (L, K) values are (30, 31), (48, 32), (54, 91), 60, 62), and (84, 122) (L1 is approximately equal to K1). These properties of associated cycles are the justification for not considering the cycles with (L, K) values that are in arithmetic progression to be random. For the 1667 d values less than or equal to 4999, the numbers of d values having 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 cycles (counting only one of interrelated or associated cycles) are 535, 664, 324, 89, 39, 13, 1, 1, 1, and 0 respectively. This distribution has a mean of 1.092. For the 3333 d values less than or equal to 9997, the numbers of d values having 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12 cycles (counting only one of interrelated or associated cycles) are 1117, 1319, 594, 185, 76, 31, 6, 2, 1, 0, 2, and 0 respectively. This distribution has a mean of 1.080. The associated cycles then account for most of the deviation of the distribution of cycle counts away from a Poisson probability distribution where λ=1. A few of the counts for large x values are still too big, but this is due to cycles where there are many (L, K) values that are multiples of other (L, K) values. For example, for d=7153, the (L, K) values are (7, 12), (14, 24), (21, 36), (28, 48), (35, 60), (42, 72), (56, 96), (63, 108), (292, 257), (341, 341), and (362, 377). This raises the question of what the distribution of cycle counts would be if for a given d value, cycles with (L, K) values that were multiples of other (L, K) values were also not counted. For the 3333 d values less than or equal to 9997, the numbers of d values having 1, 2, 3, 4, 5, 6, and 7 cycles would be 1724, 1218, 318, 62, 10, 1, and 0 respectively. This distribution has a mean of 0.626. For a Poisson probability distribution with this parameter, the expected numbers of d values having 1, 2, 3, 4, 5, 6, and 7 cycles would be 1783, 1115, 349, 73, 11, 1, and 0 respectively.

The Minimum |L-K| Value of the 3n+d Cycles for a Given d Value

For the d values less than or equal to 9997, there is only 1 cycle (counting only one of interrelated cycles) for 1117 d values and the mean and standard deviation of the L-K values are 1.2713 and 17.7289 respectively. For the d values less than or equal to 9997, there are exactly 2 cycles (counting only one of interrelated cycles) for 1193 d values and the mean and standard deviation of the L-K values of the smaller |L-K| values are 0.0821 and 9.1160 respectively. For the d values less than or equal to 9997, there are exactly 3 cycles (counting only one of interrelated cycles) for 591 d values and the mean and standard deviation of the L-K values of the smallest |L-K| values are 0.5093 and 6.4213 respectively. For the d values less than or equal to 9997, there are exactly 4 cycles (counting only one of interrelated cycles) for 228 d values and the mean and standard deviation of the L-K values of the smallest |L-K| values are -0.3553 and 4.5755 respectively. In the latter case, the numbers of d values for which L-K equals -11, -10, -9, ..., 13 are 1, 4, 4, 6, 4, 12, 8, 25, 10, 7, 12, 54, 13, 11, 13, 12, 8, 14, 1, 2, 1, 0, 1, 4, and 1 respectively. Other than being discrete-valued, this appears to be a normal probability distribution; the 95% confidence interval of the mean is (-0.9524, 0.2418) and the 95% confidence interval of the standard deviation is (4.1905, 5.0389). The corresponding standard deviations when there are 5, 6, 7, and 8 cycles (counting only one of interrelated cycles) are 3.4738, 2.9436, 2.9343, and 2.4008 respectively (when there are j cycles, the standard deviation is about 1/j times the standard deviation when there is 1 cycle). The decrease in standard deviation of these distributions as the number of cycles increases appears to account for the existence of a cycle for every d value (the probability that there are cycles where |L-K| is small is still relatively high). For example, for the 3333 d values less than or equal to 9997, there are 472 d values where L=K for at least one cycle. For the d values less than or equal to 9997, there are 1090 d values where |L-K|≤3 for at least one cycle.

The Number of Prime Factors of 2K+L-3K

In the following, duplicated prime factors of 2K+L-3K are also counted. For example, 210-38=-72·113 is considered to have 3 prime factors. For 1≤L, K≤10, the numbers of values of 2K+L-3K having 1, 2, 3, 4, and 5 prime factors are 37, 36, 22, 3, and 0 respectively. A Poisson probability distribution where λ=0.91 can be used to model this data. For 1≤L, K≤12, the numbers of values of 2K+L-3K having 1, 2, 3, 4, 5, 6, and 7 prime factors are 48, 54, 28, 8, 3, 1, and 0 respectively. A Poisson probability distribution where λ=1.06 can be used to model this data. For 1≤L, K≤14, the numbers of values of 2K+L-3K having 1, 2, 3, 4, 5, 6, and 7 prime factors are 56, 75, 41, 17, 4, 1, and 0 respectively. A Poisson probability distribution where λ=1.18 can be used to model this data. For 1≤L, K≤16, the numbers of values of 2K+L-3K having 1, 2, 3, 4, 5, 6, and 7 prime factors are 61, 100, 54, 28, 7, 4, and 0 respectively. A Poisson probability distribution where λ=1.34 can be used to model this data. These Poisson probability distributions model the number of prime factors of 2K+L-3K quite well. The number of prime factors of 2K+L-3K affects the number of d values covered by the parity vectors corresponding to (K+L, K).

References



(1) Belaga, Edward G. and Mignotte, Maurice (April 7, 2006). The Collatz Problem and Its Generalizations: Experimental Data. Table 1. Primitive Cycles of 3n+d-mappings. Retrieved March 15, 2012, from http://hal.archives-ouvertes.fr/hal-00129727



(2) Cox, Darrell (February, 2012). The 3n+1 Problem. Retrieved March 15, 2012, from http://home.graysoncable.com/dkcox ("test0a" in the "Software" section finds 3n+d cycles. The (L, K) values of the 3n+d cycles for d less than 10000 are given in "test23" [whether d divides 2K+L-3K is determined]. The associated cycles can be computed using "test23e". The prime factors of 2K+L-3K can be computed using "test23b".)



(3) Lagarias, Jeffrey C. (January 11, 2011). The 3x+1 problem: An annotated bibliography (1963-1999). Retrieved March 15, 2012, from http://arxiv.org/abs/math.NT/0309224



(4) Lagarias, Jeffrey C. (February 12, 2012). The 3x+1 problem: An Annotated Bibliography, II (2000-2009). Retrieved March 15, 2012, from http://arxiv.org/abs/math.NT/0608208



(5) Bohm, C., and Sontacchi, G., On the existence of cycles of given length in integer sequences like xn+1=xn/2 if xn even, and xn+1=3xn+1 otherwise , Atti Accad. Naz. Lincei, VIII Ser., Rend., Cl. Sci. Fis. Mat. Nat. LXIV (1978), 260-264.


Bookmark on your Personal Space


Entry

A87739212

Infinite Improbability Drive

Infinite Improbability Drive

Read a random Edited Entry


Written and Edited by

Disclaimer

h2g2 is created by h2g2's users, who are members of the public. The views expressed are theirs and unless specifically stated are not those of the Not Panicking Ltd. Unlike Edited Entries, Entries have not been checked by an Editor. If you consider any Entry to be in breach of the site's House Rules, please register a complaint. For any other comments, please visit the Feedback page.

Write an Entry

"The Hitchhiker's Guide to the Galaxy is a wholly remarkable book. It has been compiled and recompiled many times and under many different editorships. It contains contributions from countless numbers of travellers and researchers."

Write an entry
Read more