I think that’s a good question. I’ve interviewed and hired a lot of people, and if someone answered the question this way, I think I would enjoy the ‘hard way’ answer.
It would depend on the style and framing of the answer as much as the substance. It’s easy to enjoy as a ‘let’s explore the bounds of this problem’ or a ‘let’s do this in an unnecessarily difficult generalization, just for fun/curiosity’. It would be useful in an interview if the candidate recognizes and states that this might be a bad engineering decision, even if it’s good math.
When I’m hiring, I want to find people that can generalize and explore a problem, see the larger picture in an interesting way. This article does that. And, just as important is finding people who know when not to do that in practice, who can recognize when and why it’s important to call something good enough, and move on to other problems.
The inclusion/ exclusion idea can actually be useful for harder project euler questions (deriving some sort of closed form sum to compute). Worthwhile to think about.
Im also reminded about the "semigroup resonance" way to solve fizzbuzz posted recently on hn [0]. Seems like another interesting method.
Isn't this like... really, really basic? I would have never considered iterating over all the numbers, not even when I first entered Project Euler back in 2011; inclusion-exclusion always was the obvious way. I'm probably biased since I'm a big Project Euler fan and I probably have a much more math-oriented way of thinking than the average software developer, so take my opinion with a grain of salt.
Most serious Project Euler problems require finding ways to reduce the problem complexity into something manageable. For example, problem 268 is a more complex and challenging version of problem 1, which can't be solved by brute force in a reasonable amount of time [0]. Also, whenever you solve a problem, don't forget to check the problem thread, accessible only to solvers, for many mathematical insights (for example, here's a hint: buried somewhere in problem 10's thread, in what I believe to be the highest rated comment found in any problem thread all over Project Euler, you can find a very useful function that can be used to solve a few, much later, problems).
Also, just yesterday Project Euler released the 700th problem (which is an easy one if you know basic modular arithmetic) [1].
Some people find Project Euler when they're new programmers. There are also plenty of programmers who lack a strong math background and do these challenges as a way to grow and learn.
I'm not sure what your standing was back in 2011, but when I first found Project Euler, I was a lowly high school student in a rural town with a horrible math system. I was also just beginning to learn programming. I iterated through every number. :)
which is an easy one if you know basic modular arithmetic
Have you solved it? It looks nontrivial to me.
4503599627370517 is a prime number which means you’re dealing with a cyclic group generated by all nonzero elements up to 4503599627370516. Since the problem chose 1504170715041707 as the generator, any number less than that is potentially an Eulercoin. Of course, only those in decreasing order within the cycle are allowed.
The smallest Eulercoin my naïve C loop could find was 19471 after over 5 minutes of CPU time, with the time to find each one increasing exponentially. This was compiled with clang -O3, running on my 2017 MacBook Pro.
Since we’re dealing with an additive cyclic group, the smallest element will be 0, which is the last Eulercoin. Finding the second-last (the smallest positive element) does not seem to be something you can brute force, as searching the entire cyclic group (4503599627370517 elements including 0) would take my laptop approximately 178 days.
It might be 'basic' but it's the starting point for most people! The fun things with these 'simple' problems is that you can solve them in so many ways. Two of my favourite posts I've recently read where they've solved beginner problems 'the Hard Way' are:
FizzBuzz with a domain specific language [1] and the
N Queens problem without declaring a value type [2]
Yeah, it's reasonably basic. For the problem as stated where there are only 1000 numbers to iterate over, though, I wouldn't have bothered wasting time thinking about it when the naive implementation is fast enough that it doesn't matter.
Why start with something more complicated (assuming you're going to program it rather than just solving it outright on paper) when the simple way works?
The problem is a bit underspecified, perhaps. 'n' is a positive integer, so 1, 2, 3, .... So the sequence is found by multiplying 1504170715041707 by 1, 2, 3, ... sequentially, then finding the remainder when you divide each of those products by 4503599627370517.
very cool write up. to me what it really illustrates is that dealing with arguably simple data and requirements, has different tiers of how to deal with the scaling quantity. At some point, a different language (math) becomes the required dialect, as programming language expressions become outmoded. Neat
Thank you. My original version 8 years ago included a short tour into "The Inclusion-exclusion principle" which is how I eventually derived the pattern to use even/odd sums of bits in the ABC=111 encoding. I cut it out because I felt it was distracting for most of my readers.
I eventually summarized it this way after a few wordy paragraphs that explained the way the equation worked:
> The sum of all the intersections would mean (if we had 4 items, A, B, C, D) adding together A, B and C and D. The second line would mean we subtract the sum of AB, AC, AD, BC, BD, and CD. The third line means we add the sum of ABC, ABD, and BCD, the fourth line means we subtract the value of ABCD. The pattern here is we alternate the operation (adding or subtracting a sum) based on the cardinality (how many items are in a set) of something. Even cardinalities are subtracted, odd cardinalities are added. We sum together all combinations of that cardinality.
> Looking at the code it’s quite obvious 3 and 5 are replaceable with any set of other numbers.
What's not obvious to me (I'm not a mathematician) is that both solutions answer the question correctly when the set of numbers is not pairwise coprime. For example, "Find the sum of all the multiples of 3 or 6 below 1000" is clearly just the same as "Find the sum of all the multiples of 3 below 1000", while the inclusion-exclusion algorithm will probably deliver a different answer. So, the set will need some pre-processing to remove common factors, which will then require adjusting the answers.
Next, for the problem "Find the sum of all the multiples of <M numbers> below <N>", the naive algorithm seems to have runtime of about O(NM), while the "sophisticated" algorithm seems to have runtime O(2^M) at least - so, as we increase M (the size of the test set), the "naive" algorithm will soon be faster, or not?
The inclusion-exclusion algorithm can be made more efficient by observing that you can stop as soon as the lcm's of your subsets exceed N. That should happen fairly early on assuming e.g. N << 2^M.
> you can stop as soon as the lcm's of your subsets exceed N
Actually if N < 2 * lcm, because you can't use the fact that the distribution of multiples is the same on (1, lcm) and (k * lcm+1, lcm(k+1)) while you still have to count on (N - N mod lcm + 1, N).
What's not obvious to me (I'm not a mathematician) is that both solutions answer the question correctly when the set of numbers is not pairwise coprime. For example, "Find the sum of all the multiples of 3 or 6 below 1000" is clearly just the same as "Find the sum of all the multiples of 3 below 1000", while the inclusion-exclusion algorithm will probably deliver a different answer.
You’re right. The solution for this is to replace “product of n and m” bij “least common multiple of n and m”. So, you would get (using 4 and 6 as an example that’s slightly better than 3 and 6):
Number of multiples of either 4 or 6
= Number of multiples of 4
+ Number of multiples of 6
- Number of multiples of lcm(4,6) = 12
and yes, if M gets larger enough, the “naive” algorithm can get faster. It will help if you bail out once you have found _a_ divisor, and, if your numbers are ‘large enough’ (1), to do division testing in some specific order (2).
(1) what is ‘large’ will be system dependent. In general, once you need bigint’s, but if your CPU doesn’t have a division instruction, it can come earlier.
(2) in general, smallest to largest, but if one of your test divisors is a power of 2, move those up front. Also, if you’re forced to use bigint’s, divisors of “2^register size +/- 1 and their factors may be easier (just as testing divisibility by 9 or 11 or 9’s factor 3 is easier in decimal written integers)
> The solution for this is to replace “product of n and m” bij “least common multiple of n and m”
Nice, and it works for my example, too:
Number of multiples of either 3 or 6
= Number of multiples of 3
+ Number of multiples of 6
- Number of multiples of lcm(3,6) = 6
= Number of multiples of 3
You could do it like in article, so for every nonempty element in powerset (combinations for each length) of those numbers calculate their lcm and add or subtract amount of multiplies of lcm according to length of element.
So for A, B, C you have (M-number of multipliers) M(lcm(A)) + M(lcm(B)) + M(lcm(C)) - M(lcm(B,C)) - M(lcm(A,C)) - M(lcm(A,B)) + M(lcm(A,B,C))
>...where I discover the hidden complexity of a simple programming problem.
I thought it was very ironic that soon after that sentence, the author claims the arithmeticSum method is O(1) when it is actually O(log(n) log(log(n))).
Many people seem to assume that multiplication is a constant time operation. There is actually immense "hidden complexity" in doing multiplication of arbitrarily large integers efficiently. David Harvey proved last year that multiplication of two n bit integers can be done in O(n log n). It is still an open conjecture that this is the best possible.
You’re right, and it’s a good point, but I think you’re too hard on the author and other people.
> the author claims the arithmeticSum method is O(1)
He really claimed that Gauss’ formula is O(1), where it’s reasonable to assume multiplication without a specific implementation is constant. After that he gave an implementation in JavaScript that is O(1). It’s only a larger complexity if you use numeric methods on computers that support arbitrarily large numbers.
> Many people seem to assume that multiplication is a constant time operation.
Multiplication is constant time for the built-in data types, for any 64-bit ints or doubles. It’s okay to call it constant until you use huge number methods.
I disagree. Why would it be reasonable to assume "multiplication without a specific implementation is constant"? By definition, Big-O complexity describes what happens as a certain parameter gets arbitrarily large. If we say "ok but if we restrict that parameter to common sizes, it's actually O(1)", then everything is O(1). There is some constant C where Bubble sort will sort any array that fits into your RAM within C seconds, so is it okay to call Bubble sort constant time until you use huge array methods that process arrays on your hard drive?
By definition, big-O is for predicting the run-time growth of a specific implementation. It doesn’t directly apply to formulas. And it’s not an abstract concept that applies to arbitrary inputs that the program you’re analyzing can’t process by design. You cannot assume arbitrarily large inputs.
The reason 64 bit mults are constant is because we have a hardware implementation of multiplication that has a constant predictable run time. When you use the built-in mults, you get to use O(1) for the purposes of your complexity analysis in return for a hard limit on the size of numbers you can multiply.
Your bubble sort example is contrived, but the answer is that it is okay to call a sort of 10 elements constant if that’s one component of a system and it doesn’t grow as the size of your input grows. The point of big-O is to understand the run time growth of your program, and if the sort inside doesn’t change as your input changes, then that piece is constant.
>By definition, big-O is for predicting the run-time growth of a specific implementation. It doesn’t directly apply to formulas. And it’s not an abstract concept that applies to arbitrary inputs that the program you’re analyzing can’t process by design. You cannot assume arbitrarily large inputs.
Time complexity is usually for specific algorithms, though it can also be studied for a general problem itself (e.g. any comparison sort is at least O(n log n), regardless of algorithm or implementation). It is precisely a concept that applies to arbitrarily large inputs, that condition is at the core of its very definition.
If I designed a hardware board that runs bubble sort on any array that fits into its 16GB of memory and gave documentation printing out its (large) constant predictable run time, that still wouldn't make it correct to say Bubble sort is a constant time operation.
>Your bubble sort example is contrived, but the answer is that it is okay to call a sort of 10 elements constant if that’s one component of a system and it doesn’t grow as the size of your input grows...if the sort inside doesn’t change as your input changes, then that piece is constant.
We aren't talking about running on 10 elements and it staying at 10 elements as the size of the input grows (that would indeed be O(1)). For the original example in the link, we are talking about a piece (arithmeticSum) whose input is n, which tautologically does grow as the size of the input grows.
> though it can also be studied for a general problem itself (e.g. any comparison sort is at least O(n log n)
Note that we call sort O(n log n) under the assumption that addition and comparison is O(1), just like the author did with multiplication. If you’re trying to make the point that arbitrarily large inputs have non-constant complexity, you should be consistent. No operations on arbitrarily large numbers are constant on the number of digits, but that is not a good model for predicting actual runtimes of actual programs that use doubles. When I use ints or longs or doubles, it is not just appropriate to use O(1) for the basic arithmetic operations, it is incorrect to assume larger complexity when that larger complexity does not apply to your program.
> Note that we call sort O(n log n) under the assumption that addition and comparison is O(1), just like the author did with multiplication.
If you’re trying to make the point that arbitrarily large inputs have non-constant complexity, you should be consistent.
I have been consistent. I agreed above that is buuble sort was one part of a program that you only call on elements of size up to 10 and the input
of the program, n, is something else then the bubble sort piece is O(1). But if n refers to the size of an array input into a bubble sort, then it is not O(1). Big-O considers what happens when the size of the _input_ grows. For a comparison sort we consider what happens where the _number_ of elements goes to infinity, but the elements themselves are assumed to bounded (e.g. 32 bit ints). This ensures comparison is O(1) not matter which two elements of the array you chosen from an arbitrarily large array. I don't see why you think the addition involved in a comparison sort wouldn't be O(1) as the addition addition that is required to increment pointers by 1.
> No operations on arbitrarily large numbers are constant on the number of digits, but that is not a good model for predicting actual runtimes of actual programs that use doubles. When I use ints or longs or doubles, it is not just appropriate to use O(1) for the basic arithmetic operations, it is incorrect to assume larger complexity when that larger complexity does not apply to your program.
You're describing the common situation when analysing a program is that the input of the program is some parameter (E.g. the size of an array) and all the integer arithmetic that arises during that program is on ints or doubles, so the program executes correctly _even as_ the input grows.
The key difference for this project Euler example is that there the input n is actually an integer that we do the main arithmetic on. The point of the program is to sum integers up to n. As I've explained before, if you then say "but practically we limit n to ints so it's O(1)" then _any_ function I write whose only input is an int is O(1) and the notion is meaningless.
You’re right, when a sort size depends on n, then it’s not considered O(1), I agree.
> I don’t see why you think the addition involved in a comparison sort wouldn’t be O(1)
Because you can’t have an infinite number of elements using 32-bit indices, you have to use indices that can represent up to n, and thus you must do arithmetic and comparsion operations on those indices that are not O(1). Considering that this was your original point, I’m totally confused why you’re now arguing against this point? It still seems inconsistent to me. Are you assuming that the size of the data elements to be sorted are 32 bits and that the only operations are on the data and not the indices? Bubble sort has to do both data comparisons and index comparisons, as well as index arithmetic. Adding 1 to a large n-bit number is worst case O(n).
> The key difference for this project Euler example is that there the input n is actually an integer that we do the main artithmetic on.
No, bubble sort has to do arithmetic on size n numbers too, as well as comparisons, the same as with Gauss’ formula.
> then _any_ function I write whose only input is an int is O(1) and the notion is meaningless.
No, this is both a straw-man and incorrect. A pure function on an int is only O(1) if it’s run-time doesn’t depend on the input. When the implementation is a constant number of 32-bit int operations on the input, and it doesn’t depend on what the value of the input is, then the function is O(1). Addition of two 32 bit ints on modern processors is O(1). Calculating whether an input int is prime, without having a lookup table in memory, for example, is not O(1), even though it’s a function of one 32-bit int.
True! That statement does follows the math formula, and by itself the statement is true regardless of complexity. It will work the same way for small numbers as large ones.
The 10e100 comment does precede the implementation, and this particular O(1) implementation of course might not support inputs in the range of 10e100 exactly.
This argument comes up all the time, and the answer is that big-O notation is very general, and only makes sense relative to a particular computation model. For example, you may describe a sorting algorithm's complexity by counting the number of comparisons, abstracting away details like memory caching or arbitrarily-sized elements. So it's perfectly reasonable for the original author to call arithmeticSum constant time, assuming that the numbers are bounded, and it's also perfectly reasonable for other researchers to say that multiplication is O(n log n), without that assumption.
[Deleted] | 6 years ago
appleiigs | 6 years ago
If i was shown the fizz buzz question in a job interview, and i answered using the “hard way”, how would they respond?
Would they be impressed by the math? Or would they complain about the number of lines of code, readability for code review, etc?
dahart | 6 years ago
I think that’s a good question. I’ve interviewed and hired a lot of people, and if someone answered the question this way, I think I would enjoy the ‘hard way’ answer.
It would depend on the style and framing of the answer as much as the substance. It’s easy to enjoy as a ‘let’s explore the bounds of this problem’ or a ‘let’s do this in an unnecessarily difficult generalization, just for fun/curiosity’. It would be useful in an interview if the candidate recognizes and states that this might be a bad engineering decision, even if it’s good math.
When I’m hiring, I want to find people that can generalize and explore a problem, see the larger picture in an interesting way. This article does that. And, just as important is finding people who know when not to do that in practice, who can recognize when and why it’s important to call something good enough, and move on to other problems.
ubercow13 | 6 years ago
What's the 'hard way'?
appleiigs | 6 years ago
The title of the post is the “Euler 001 the Hard Way”. The way i read it, the hard way would be “Solving all combinations with performance”.
foxes | 6 years ago
The inclusion/ exclusion idea can actually be useful for harder project euler questions (deriving some sort of closed form sum to compute). Worthwhile to think about.
Im also reminded about the "semigroup resonance" way to solve fizzbuzz posted recently on hn [0]. Seems like another interesting method.
[0] https://blog.ploeh.dk/2019/12/30/semigroup-resonance-fizzbuz...
JakeStone | 6 years ago
Something similar from 4+ years ago, but in C#
https://gist.github.com/RichardVasquez/6780214
gordaco | 6 years ago
Isn't this like... really, really basic? I would have never considered iterating over all the numbers, not even when I first entered Project Euler back in 2011; inclusion-exclusion always was the obvious way. I'm probably biased since I'm a big Project Euler fan and I probably have a much more math-oriented way of thinking than the average software developer, so take my opinion with a grain of salt.
Most serious Project Euler problems require finding ways to reduce the problem complexity into something manageable. For example, problem 268 is a more complex and challenging version of problem 1, which can't be solved by brute force in a reasonable amount of time [0]. Also, whenever you solve a problem, don't forget to check the problem thread, accessible only to solvers, for many mathematical insights (for example, here's a hint: buried somewhere in problem 10's thread, in what I believe to be the highest rated comment found in any problem thread all over Project Euler, you can find a very useful function that can be used to solve a few, much later, problems).
Also, just yesterday Project Euler released the 700th problem (which is an easy one if you know basic modular arithmetic) [1].
[0] https://projecteuler.net/problem=268
[1] https://projecteuler.net/problem=700
fiblye | 6 years ago
Some people find Project Euler when they're new programmers. There are also plenty of programmers who lack a strong math background and do these challenges as a way to grow and learn.
I'm not sure what your standing was back in 2011, but when I first found Project Euler, I was a lowly high school student in a rural town with a horrible math system. I was also just beginning to learn programming. I iterated through every number. :)
chongli | 6 years ago
which is an easy one if you know basic modular arithmetic
Have you solved it? It looks nontrivial to me.
4503599627370517 is a prime number which means you’re dealing with a cyclic group generated by all nonzero elements up to 4503599627370516. Since the problem chose 1504170715041707 as the generator, any number less than that is potentially an Eulercoin. Of course, only those in decreasing order within the cycle are allowed.
The smallest Eulercoin my naïve C loop could find was 19471 after over 5 minutes of CPU time, with the time to find each one increasing exponentially. This was compiled with clang -O3, running on my 2017 MacBook Pro.
Since we’re dealing with an additive cyclic group, the smallest element will be 0, which is the last Eulercoin. Finding the second-last (the smallest positive element) does not seem to be something you can brute force, as searching the entire cyclic group (4503599627370517 elements including 0) would take my laptop approximately 178 days.
macintux | 6 years ago
Please don’t discourage people who missed that insight. This would have been a much better comment using the 10000 attitude.
https://www.xkcd.com/1053/
[Deleted] | 6 years ago
exdsq | 6 years ago
It might be 'basic' but it's the starting point for most people! The fun things with these 'simple' problems is that you can solve them in so many ways. Two of my favourite posts I've recently read where they've solved beginner problems 'the Hard Way' are:
FizzBuzz with a domain specific language [1] and the N Queens problem without declaring a value type [2]
[1] https://themonadreader.files.wordpress.com/2014/04/fizzbuzz....
[2] https://aphyr.com/posts/342-typing-the-technical-interview
sorokod | 6 years ago
"These queens can stab in eight directions. You always assumed that was a metaphor"
Excellent.
its_dario | 6 years ago
> Isn't this like... really, really basic?
Yeah, it's reasonably basic. For the problem as stated where there are only 1000 numbers to iterate over, though, I wouldn't have bothered wasting time thinking about it when the naive implementation is fast enough that it doesn't matter.
Why start with something more complicated (assuming you're going to program it rather than just solving it outright on paper) when the simple way works?
fctorial | 6 years ago
From the problem 700:
Could you explain what 'n mod' means? Google doesn't seem to know what that is.stevelosh | 6 years ago
`n` is the position in the sequence:
philiplu | 6 years ago
The problem is a bit underspecified, perhaps. 'n' is a positive integer, so 1, 2, 3, .... So the sequence is found by multiplying 1504170715041707 by 1, 2, 3, ... sequentially, then finding the remainder when you divide each of those products by 4503599627370517.
pizzaknife | 6 years ago
very cool write up. to me what it really illustrates is that dealing with arguably simple data and requirements, has different tiers of how to deal with the scaling quantity. At some point, a different language (math) becomes the required dialect, as programming language expressions become outmoded. Neat
[OP] bdg | 6 years ago
Thank you. My original version 8 years ago included a short tour into "The Inclusion-exclusion principle" which is how I eventually derived the pattern to use even/odd sums of bits in the ABC=111 encoding. I cut it out because I felt it was distracting for most of my readers.
I eventually summarized it this way after a few wordy paragraphs that explained the way the equation worked:
> The sum of all the intersections would mean (if we had 4 items, A, B, C, D) adding together A, B and C and D. The second line would mean we subtract the sum of AB, AC, AD, BC, BD, and CD. The third line means we add the sum of ABC, ABD, and BCD, the fourth line means we subtract the value of ABCD. The pattern here is we alternate the operation (adding or subtracting a sum) based on the cardinality (how many items are in a set) of something. Even cardinalities are subtracted, odd cardinalities are added. We sum together all combinations of that cardinality.
FabHK | 6 years ago
Two quick remarks:
> Looking at the code it’s quite obvious 3 and 5 are replaceable with any set of other numbers.
What's not obvious to me (I'm not a mathematician) is that both solutions answer the question correctly when the set of numbers is not pairwise coprime. For example, "Find the sum of all the multiples of 3 or 6 below 1000" is clearly just the same as "Find the sum of all the multiples of 3 below 1000", while the inclusion-exclusion algorithm will probably deliver a different answer. So, the set will need some pre-processing to remove common factors, which will then require adjusting the answers.
Next, for the problem "Find the sum of all the multiples of <M numbers> below <N>", the naive algorithm seems to have runtime of about O(NM), while the "sophisticated" algorithm seems to have runtime O(2^M) at least - so, as we increase M (the size of the test set), the "naive" algorithm will soon be faster, or not?
kevinventullo | 6 years ago
The inclusion-exclusion algorithm can be made more efficient by observing that you can stop as soon as the lcm's of your subsets exceed N. That should happen fairly early on assuming e.g. N << 2^M.
chupasaurus | 6 years ago
> you can stop as soon as the lcm's of your subsets exceed N
Actually if N < 2 * lcm, because you can't use the fact that the distribution of multiples is the same on (1, lcm) and (k * lcm+1, lcm(k+1)) while you still have to count on (N - N mod lcm + 1, N).
Someone | 6 years ago
What's not obvious to me (I'm not a mathematician) is that both solutions answer the question correctly when the set of numbers is not pairwise coprime. For example, "Find the sum of all the multiples of 3 or 6 below 1000" is clearly just the same as "Find the sum of all the multiples of 3 below 1000", while the inclusion-exclusion algorithm will probably deliver a different answer.
You’re right. The solution for this is to replace “product of n and m” bij “least common multiple of n and m”. So, you would get (using 4 and 6 as an example that’s slightly better than 3 and 6):
and yes, if M gets larger enough, the “naive” algorithm can get faster. It will help if you bail out once you have found _a_ divisor, and, if your numbers are ‘large enough’ (1), to do division testing in some specific order (2).(1) what is ‘large’ will be system dependent. In general, once you need bigint’s, but if your CPU doesn’t have a division instruction, it can come earlier.
(2) in general, smallest to largest, but if one of your test divisors is a power of 2, move those up front. Also, if you’re forced to use bigint’s, divisors of “2^register size +/- 1 and their factors may be easier (just as testing divisibility by 9 or 11 or 9’s factor 3 is easier in decimal written integers)
FabHK | 6 years ago
> The solution for this is to replace “product of n and m” bij “least common multiple of n and m”
Nice, and it works for my example, too:
Cool. How to extend to more than 2 numbers?[Deleted] | 6 years ago
jarekkruk | 6 years ago
You could do it like in article, so for every nonempty element in powerset (combinations for each length) of those numbers calculate their lcm and add or subtract amount of multiplies of lcm according to length of element.
So for A, B, C you have (M-number of multipliers) M(lcm(A)) + M(lcm(B)) + M(lcm(C)) - M(lcm(B,C)) - M(lcm(A,C)) - M(lcm(A,B)) + M(lcm(A,B,C))
Ragib_Zaman | 6 years ago
>...where I discover the hidden complexity of a simple programming problem.
I thought it was very ironic that soon after that sentence, the author claims the arithmeticSum method is O(1) when it is actually O(log(n) log(log(n))).
Many people seem to assume that multiplication is a constant time operation. There is actually immense "hidden complexity" in doing multiplication of arbitrarily large integers efficiently. David Harvey proved last year that multiplication of two n bit integers can be done in O(n log n). It is still an open conjecture that this is the best possible.
dahart | 6 years ago
You’re right, and it’s a good point, but I think you’re too hard on the author and other people.
> the author claims the arithmeticSum method is O(1)
He really claimed that Gauss’ formula is O(1), where it’s reasonable to assume multiplication without a specific implementation is constant. After that he gave an implementation in JavaScript that is O(1). It’s only a larger complexity if you use numeric methods on computers that support arbitrarily large numbers.
> Many people seem to assume that multiplication is a constant time operation.
Multiplication is constant time for the built-in data types, for any 64-bit ints or doubles. It’s okay to call it constant until you use huge number methods.
Ragib_Zaman | 6 years ago
I disagree. Why would it be reasonable to assume "multiplication without a specific implementation is constant"? By definition, Big-O complexity describes what happens as a certain parameter gets arbitrarily large. If we say "ok but if we restrict that parameter to common sizes, it's actually O(1)", then everything is O(1). There is some constant C where Bubble sort will sort any array that fits into your RAM within C seconds, so is it okay to call Bubble sort constant time until you use huge array methods that process arrays on your hard drive?
dahart | 6 years ago
By definition, big-O is for predicting the run-time growth of a specific implementation. It doesn’t directly apply to formulas. And it’s not an abstract concept that applies to arbitrary inputs that the program you’re analyzing can’t process by design. You cannot assume arbitrarily large inputs.
The reason 64 bit mults are constant is because we have a hardware implementation of multiplication that has a constant predictable run time. When you use the built-in mults, you get to use O(1) for the purposes of your complexity analysis in return for a hard limit on the size of numbers you can multiply.
Your bubble sort example is contrived, but the answer is that it is okay to call a sort of 10 elements constant if that’s one component of a system and it doesn’t grow as the size of your input grows. The point of big-O is to understand the run time growth of your program, and if the sort inside doesn’t change as your input changes, then that piece is constant.
Ragib_Zaman | 6 years ago
>By definition, big-O is for predicting the run-time growth of a specific implementation. It doesn’t directly apply to formulas. And it’s not an abstract concept that applies to arbitrary inputs that the program you’re analyzing can’t process by design. You cannot assume arbitrarily large inputs.
Time complexity is usually for specific algorithms, though it can also be studied for a general problem itself (e.g. any comparison sort is at least O(n log n), regardless of algorithm or implementation). It is precisely a concept that applies to arbitrarily large inputs, that condition is at the core of its very definition.
If I designed a hardware board that runs bubble sort on any array that fits into its 16GB of memory and gave documentation printing out its (large) constant predictable run time, that still wouldn't make it correct to say Bubble sort is a constant time operation.
>Your bubble sort example is contrived, but the answer is that it is okay to call a sort of 10 elements constant if that’s one component of a system and it doesn’t grow as the size of your input grows...if the sort inside doesn’t change as your input changes, then that piece is constant.
We aren't talking about running on 10 elements and it staying at 10 elements as the size of the input grows (that would indeed be O(1)). For the original example in the link, we are talking about a piece (arithmeticSum) whose input is n, which tautologically does grow as the size of the input grows.
dahart | 6 years ago
> though it can also be studied for a general problem itself (e.g. any comparison sort is at least O(n log n)
Note that we call sort O(n log n) under the assumption that addition and comparison is O(1), just like the author did with multiplication. If you’re trying to make the point that arbitrarily large inputs have non-constant complexity, you should be consistent. No operations on arbitrarily large numbers are constant on the number of digits, but that is not a good model for predicting actual runtimes of actual programs that use doubles. When I use ints or longs or doubles, it is not just appropriate to use O(1) for the basic arithmetic operations, it is incorrect to assume larger complexity when that larger complexity does not apply to your program.
Ragib_Zaman | 6 years ago
> Note that we call sort O(n log n) under the assumption that addition and comparison is O(1), just like the author did with multiplication. If you’re trying to make the point that arbitrarily large inputs have non-constant complexity, you should be consistent.
I have been consistent. I agreed above that is buuble sort was one part of a program that you only call on elements of size up to 10 and the input of the program, n, is something else then the bubble sort piece is O(1). But if n refers to the size of an array input into a bubble sort, then it is not O(1). Big-O considers what happens when the size of the _input_ grows. For a comparison sort we consider what happens where the _number_ of elements goes to infinity, but the elements themselves are assumed to bounded (e.g. 32 bit ints). This ensures comparison is O(1) not matter which two elements of the array you chosen from an arbitrarily large array. I don't see why you think the addition involved in a comparison sort wouldn't be O(1) as the addition addition that is required to increment pointers by 1.
> No operations on arbitrarily large numbers are constant on the number of digits, but that is not a good model for predicting actual runtimes of actual programs that use doubles. When I use ints or longs or doubles, it is not just appropriate to use O(1) for the basic arithmetic operations, it is incorrect to assume larger complexity when that larger complexity does not apply to your program.
You're describing the common situation when analysing a program is that the input of the program is some parameter (E.g. the size of an array) and all the integer arithmetic that arises during that program is on ints or doubles, so the program executes correctly _even as_ the input grows.
The key difference for this project Euler example is that there the input n is actually an integer that we do the main arithmetic on. The point of the program is to sum integers up to n. As I've explained before, if you then say "but practically we limit n to ints so it's O(1)" then _any_ function I write whose only input is an int is O(1) and the notion is meaningless.
dahart | 6 years ago
You’re right, when a sort size depends on n, then it’s not considered O(1), I agree.
> I don’t see why you think the addition involved in a comparison sort wouldn’t be O(1)
Because you can’t have an infinite number of elements using 32-bit indices, you have to use indices that can represent up to n, and thus you must do arithmetic and comparsion operations on those indices that are not O(1). Considering that this was your original point, I’m totally confused why you’re now arguing against this point? It still seems inconsistent to me. Are you assuming that the size of the data elements to be sorted are 32 bits and that the only operations are on the data and not the indices? Bubble sort has to do both data comparisons and index comparisons, as well as index arithmetic. Adding 1 to a large n-bit number is worst case O(n).
> The key difference for this project Euler example is that there the input n is actually an integer that we do the main artithmetic on.
No, bubble sort has to do arithmetic on size n numbers too, as well as comparisons, the same as with Gauss’ formula.
> then _any_ function I write whose only input is an int is O(1) and the notion is meaningless.
No, this is both a straw-man and incorrect. A pure function on an int is only O(1) if it’s run-time doesn’t depend on the input. When the implementation is a constant number of 32-bit int operations on the input, and it doesn’t depend on what the value of the input is, then the function is O(1). Addition of two 32 bit ints on modern processors is O(1). Calculating whether an input int is prime, without having a lookup table in memory, for example, is not O(1), even though it’s a function of one 32-bit int.
rwbhn | 6 years ago
But the author follows that with "It works the same way for 100 as it does 10e100."
dahart | 6 years ago
True! That statement does follows the math formula, and by itself the statement is true regardless of complexity. It will work the same way for small numbers as large ones.
The 10e100 comment does precede the implementation, and this particular O(1) implementation of course might not support inputs in the range of 10e100 exactly.
evanpw | 6 years ago
This argument comes up all the time, and the answer is that big-O notation is very general, and only makes sense relative to a particular computation model. For example, you may describe a sorting algorithm's complexity by counting the number of comparisons, abstracting away details like memory caching or arbitrarily-sized elements. So it's perfectly reasonable for the original author to call arithmeticSum constant time, assuming that the numbers are bounded, and it's also perfectly reasonable for other researchers to say that multiplication is O(n log n), without that assumption.
jpxw | 6 years ago
`sumMultiplesOfTwoAndThree` should probably be `sumMultiplesOfThreeAndFive` right?
master_yoda_1 | 6 years ago
To Author: good for your weekend musing. But please don't torture interviewer with it (with time limit of 45 minutes).