Next: , Previous: , Up: Division Algorithms   [Index]

#### 15.2.3 Divide and Conquer Division

For divisors larger than `DC_DIV_QR_THRESHOLD`, division is done by dividing. Or to be precise by a recursive divide and conquer algorithm based on work by Moenck and Borodin, Jebelean, and Burnikel and Ziegler (see References).

The algorithm consists essentially of recognising that a 2NxN division can be done with the basecase division algorithm (see Basecase Division), but using N/2 limbs as a base, not just a single limb. This way the multiplications that arise are (N/2)x(N/2) and can take advantage of Karatsuba and higher multiplication algorithms (see Multiplication Algorithms). The two “digits” of the quotient are formed by recursive Nx(N/2) divisions.

If the (N/2)x(N/2) multiplies are done with a basecase multiplication then the work is about the same as a basecase division, but with more function call overheads and with some subtractions separated from the multiplies. These overheads mean that it’s only when N/2 is above `MUL_TOOM22_THRESHOLD` that divide and conquer is of use.

`DC_DIV_QR_THRESHOLD` is based on the divisor size N, so it will be somewhere above twice `MUL_TOOM22_THRESHOLD`, but how much above depends on the CPU. An optimized `mpn_mul_basecase` can lower `DC_DIV_QR_THRESHOLD` a little by offering a ready-made advantage over repeated `mpn_submul_1` calls.

Divide and conquer is asymptotically O(M(N)*log(N)) where M(N) is the time for an NxN multiplication done with FFTs. The actual time is a sum over multiplications of the recursed sizes, as can be seen near the end of section 2.2 of Burnikel and Ziegler. For example, within the Toom-3 range, divide and conquer is 2.63*M(N). With higher algorithms the M(N) term improves and the multiplier tends to log(N). In practice, at moderate to large sizes, a 2NxN division is about 2 to 4 times slower than an NxN multiplication.

Next: , Previous: , Up: Division Algorithms   [Index]