Factoring Integers

Posted at

(Warning: Extreme nerdiness follows. You are not expected to understand this if you are a family member whose last name is not "Sambol". Feel free to move along.)

So instead of doing anything useful, I spent all of my free time today designing and implementing and measuring and optimizing and refining an algorithm for determining the prime factors of an integer. Here's the final result (written in Ruby):

def factor(n)
    def primes_up_to (n)
        r = (2..n).to_a
        primes = []
        while r.first**2 < n
            primes.push(r.first)
            r.reject! { |i| i % primes.last == 0 }
        end
        primes + r
    end
    factors = {}
    for p in primes_up_to((n**0.5).ceil)
        while n % p == 0
            factors[p] ? factors[p] += 1 : factors[p] = 1
            n /= p
        end
    end
    factors[n] = 1 if n != 1
    factors
end

I got the urge to do this after incidentally running across a largish number in the middle of a random sysadmin task, and idly wondering whether or not it was a power of 2. The actual factorization algorithm was fairly easy to come up with and not all that interesting: just test whether every eligible prime number divides the integer evenly and, if so, count how many times it does so. The interesting part is generating the list of prime numbers (which is the job of the "primes_up_to" subroutine).

I vaguely remembered that some ancient Greek guy had come up with a simple method for finding all prime numbers smaller than a given number, and I preferred to look this up rather than reinvent that particular wheel. Some googling revealed that the "sieve of Eratosthenes" was what I had in mind. It goes a little like this: start with the list of all the integers from 2 to whatever; note that the first member of this list is prime and remove it and all its multiples from the list; keep repeating that last step until the list is empty. The code for doing exactly that is actually a bit different from the function mentioned above:

def primes_up_to (n)
    r = (2..n).to_a
    primes = []
    while not r.empty?
        primes.push(r.first)
        r.reject! { |i| i % primes.last == 0 }
    end
    primes
end

The final version differs from this one primarily in the test for terminating the iteration. By tracing out the way that the list changed during each iteration of the sieve, I discovered that this initial implementation did quite a bit of unnecessary filtering. After a certain point relatively early on, there is nothing but prime numbers left in the list, so there's no point in wasting energy trying to weed out their multiples. A serendipitous consultation of my senior-level algorithms textbook unearthed the (unproved) assertion that, "Trial division by all integers up to B is guaranteed to factor completely any number up to B2." After ruminating on why this must be a true theorem [1], it quickly became obvious that a corollary is that an array of the integers from 2 to B2 will contain only primes if you eliminate multiples of the primes from 2 to B [2]. This meant that I could terminate the while loop in primes_up_to as soon as the first member in the list is larger than the square root of the largest integer in the list. At that point, all the numbers left in the list can be dumped into the list of primes that's been built up; no further testing needed.

My curiosity led me to another succinct and elegant implementation of Eratosthenes' sieve:

def primes_up_to (n)
    (2..(n**0.5).ceil).inject((2..n).to_a) { |r, i|
        r.select { |j| j == i || j % i != 0 }
    }
end

I'm charmed by the functional style exhibited here. I've yet to see any code that uses Ruby's inject properly that doesn't exude a fanciful cleverness. Unfortunately, measurements clearly showed that the running time for this version is much worse than that of my imperative version. Intractably so, as far as I can tell. It wastes time performing a filtering run for each of the integers up to the square root of the maximum integer, regardless of whether they're actually prime.

Perhaps I'll more formally analyze the running time of these two implementations of Eratosthenes' sieve, but that will have to wait for another day.


[1] If B2 has a prime factor greater than B (let's call it A), then A can only form B2 by multiplying with a number smaller than B (let's call it C), else A*C would be too large. Since C is less than B, it's already been tested as a factor, which simultaneously reveals A as a factor (B2/C = A).

[2] If the list contained a composite number at this point (let's call it X), then that composite number would have only factors larger than B. Since the factors are larger than B, then the result of multiplying the factors (i.e. X) would have to be larger than B2. But the list doesn't contain numbers larger than B2, by definition, so X can't be in the list.


Comments

Comment from Phil! at

Note that because you're dividing the number by its factors you don't necessarily need all the primes up to the square root of the original number. The approach I've taken in the past requires a primality-testing function and is basically

def prime-factorize (n):
    d = 2
    while n > 1:
        if n % d = 0:
            add-divisor(d)
        else:
            d = next-prime(d)

def next-prime (n):
    if n < 2:
        2
    else if n = 2:
        3
    else if even?(n):
        next-prime(n - 1)
    else:
        candidate = n + 2
    while true:
        if prime?(candidate):
            return candidate
        else:
            candidate = candidate + 2

This approach mostly depends on having a good prime testing function. (Actually, with a really good function, the 'while n > 1' can be replaced with 'until prime?(n)', but the simplest version of 'prime?' would involve testing divisibility against all primes below sqrt(n), and if you're doing that, you might as well just build the list once and use it in your algorithm.)

My primality-testing can be seen at

http://aperiodic.net/phil/lisp/primes.lisp

(yes, Common Lisp) and it's probably a bit heavyweight for one-offs like your question. I have it lying around for Project Euler things.

Comment from Daniel Pearson at

I gotta pick on your syntax errors: you seem to have forgotten that "?" and "-" aren't legal characters for Python identifiers.

Comment from Steve at

Hurray old dead Greek guys!  Eratosthenes, rah rah rah!  Anaximenes, bis boom bah!

Comment from Daniel Pearson at

"But they're like the people chained up in the cave in the allegory of the people in the cave by the Greek guy."

Comment from Phil! at

It's not Python.  It's pseudocode (based on Lisp, but with Scheme naming conventions) that just happens to look like Python.  So there.  :P

(My original code is in Common Lisp, and has some memoization going on, too, so I figured I'd render it in a way that's more readable to people who might not necessarily know Common Lisp.  I really did think, "Well, I'll make it pseudocode.  Hey, everyone's always saying how Python looks like pseudocode; I'll make pseudocode that looks like Python.")

Comment from Phil! at

And today I saw this link:

  http://article.gmane.org/gmane.comp.lang.haskell.cafe/19470

It's a fairly clever implementation that doesn't actually do any modular division (which is cool, of course, because division, even just to get a remainder, is a relatively expensive operation).  It also takes advantage of some Haskell-specific things like lazy evaluation, but I stole some ideas from it and wrote a version of my own that ends up being quite fast compared to the other implementations I was playing with.

The algorithms I've played with:

1) The simplest implementation of your algorithm in Common Lisp: build a list of odd integers from 3 to n, walk through it, picking off the head element then removing all multiples of that number.  The primes to 1,000,000 took 3.7 seconds.  The primes to 10,000,000 took 86.7 seconds.

2) The same algorithm, with some tricks to avoid consing (memory allocation, mostly because allocating memory just to throw it away can be slow): build a list of odd integers from 3 to n, walk through that list, removing multiples of the current element from later in the list by splicing together the non-multiple elements of the list (rewriting the pointers in the linked list, basically).  The primes to 1,000,000 took 3.4 seconds.  The primes to 10,000,000 took 80.4 seconds.

3) My "list primes up to n" function, which doesn't use the Sieve of Eratosthenes: check all odd integers from 3 to n for primality, collect the ones that are prime.  The primes to 1,000,000 took 0.160 seconds.  The primes to 10,000,000 took 23.6 seconds.  (I precache the primality of all integers from 0 to 2^22, so when n=10^6, all prime testing is just table lookups.  When n=10^7, it has to actually start doing work, so it gets slower.)

4) The algorithm with ideas stolen from the Haskell code: make an array of bits with indices from 0 to n, set all bits on, for each odd integer from 3 to n, if its bit in the array is still on, hit every multiple of that number (by iterative addition) in the array and turn its bit off, finally create a list of 2 plus all the odd numbers that still have their bits on.  The primes to 1,000,000 took 0.076 seconds.  The primes to 10,000,000 took 1.256 seconds.  The primes to 100,000,000 took 13.8 seconds.  The primes to 2^29 (max length of an array in my Common Lisp implementation) took 83.4 seconds.

You can see my code at

  http://aperiodic.net/phil/lisp/eratosthenes.lisp