You could use sum, but that will eat up a lot of RAM because of the laziness.
EDIT: For the fun of it, I decided to do the same in a slightly more esoteric language, so here's a Prolog version (given that your stack is big enough)
rangesum(0,0).
rangesum(N,X) :- M is N - 1, rangesum(M,Y), X is Y + N.
?- rangesum(1000000000, X), write(X).
Pass GHC the -O2 option to turn on optimizations. You need the strictness analyzer to run so that it can determine that sum is strict; otherwise you get a space leak.
GHC's strictness analyzer will recognize that sum is strict for Integer and optimize the laziness away. sum only causes a space leak when optimizations are turned off.
"The key in this case is using C99's long long data type. It provides the biggest primitive storage C can manage (128-bits on 32-bit machines and 256-bits on a 64-bit machine) and it runs really, really fast."
Isn't long long "typically" 64-bit? (I know the C standard doesn't mandate exact sizes, only minimums.)
What platform does this long long type really give you the full 128 or 256 bits on?
And do 64-bit CPUs indeed support 256-bit integer types? If so, what can I do to play with them? C does not provide them for me on Linux!
If you want a guaranteed 64-bit type, put in your code:
#include <stdint.h>
then, use uint64_t for unsigned and int64_t for signed. If you want 128 bits, in gcc you can use __uint128_t (it has two extra underscores at the beginning because that size is nonstandard), but I don't think there is support for 256-bit integers.
long long is merely required to be at least 64-bit and in practice is 64-bit on every platform. AFAIK no general-purpose processors support 256-bit ints. AVX2 has 256-bit vector operations, but that is not at all the same thing.
This issue happens on 32-bit builds of PHP and Node.js: the language switches to a floating-point representation when the result of some operation exceeds INT_MAX.
In 64-bit PHP builds, the computation is done right.
32-bit or 64-bit won't matter for Node.js. The Number type in JS is specifically defined as using the 64-bit floating point format as defined by IEEE 754, except that all NaNs are coerced to a single value. In terms of the abstraction, there is never a cast when the value overflows; it should just always be considered a double. Under the hood, there may be differences in how the number is actually being treated.
Presumably because it allows you to do anything involving double precision or 32-bit integer arithmetic, and performance was not originally a major consideration. It's pretty rare to need more than 53 bits of precision (and was even rarer for JS's original intent), so it makes sense that the numeric type is kept simple. Edit: and to clarify, the advantage is that this makes basic implementation extremely simple. Only if you want to optimize your engine's performance do you have to worry about shuffling types around.
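To make the 53-bit limit concrete, here is a quick illustration in Python rather than JavaScript (CPython's float is the same IEEE 754 double, so the behaviour carries over): above 2^53 the gap between representable doubles is larger than 1, so adjacent integers collapse onto the same value. That is also why a naive loop summing 1 to a billion drifts: the running total passes 2^53 after roughly 134 million iterations.

# Doubles have a 53-bit significand; 2**53 + 1 is the first integer they cannot represent.
big = 2 ** 53
print(float(big) == float(big + 1))   # True: both round to the same double
print(int(float(big + 1)))            # 9007199254740992, the + 1 is lost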
These days the solution for having more precision is to use an external library. I think that's generally fine, although performance is a concern. Financial applications aside, working with arbitrary precision is a good hint that you might be doing something processor intensive. It's certainly a case where I'd like the library to be compiled to target asm.js, and maybe optionally NaCl, once those have widespread adoption. Ideally, ECMAScript would also have a native implementation, but that won't eliminate the need for a library for shimming for years to come.
Performance was always enough of a consideration that even BE's original implementation had both int32 and double types internally, though the difference was unobservable from outside the black box (everything appears as a double).
As a side note, I'll bet you could have actually observed the difference via timing at the time, assuming you knew what hardware you were working on. On an early Pentium, a floating point add would have taken up to 3 times as long as an integer add (depending on implementation), so by comparing in a loop, you might be able to tell if a given value was being treated as an integer or a double.
Floating point numbers ("floats") work like scientific notation (e.g. "12 * 10^-3"). They have a mantissa and an exponent. Double precision (64-bit) floats have a 52-bit mantissa (53 significant bits counting the implicit leading one), a sign bit, and 11 exponent bits, which are also used to encode special values like NaN ("Not a Number"), e.g. the result of "0/0".
> Why [only floats in JavaScript] ?
The language was supposed to have a low barrier of entry, and having a single number type was thought to be less complex to explain, even though floats have some weird corner cases regarding rounding when you don't understand how they are implemented. Beyond the loss of precision in large numbers, some rationals that have a finite representation in base 10 have an infinite representation in base 2.
For example, 0.2 in base 10 is 0.00110011... in base 2. It is thus rounded to fit in 52 bits, and rounding means rounding error.
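You can look at the rounded binary value directly. This is Python rather than JavaScript, but CPython floats are the same IEEE 754 doubles, so it shows exactly what the Node session below is working with:

from decimal import Decimal

print((0.2).hex())    # 0x1.999999999999ap-3 : the repeating pattern, cut off and rounded
print(Decimal(0.2))   # 0.200000000000000011102230246251565404236316680908203125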
Here's a Node.js session that demonstrates the behaviour:
> e = 0.2
0.2
> e = e + 0.2
0.4
> e = e + 0.2
0.6000000000000001
> e == 0.6
false
> e = e + 0.2
0.8
> e = e + 0.2
1
> e == 1
true
As for why? I have no clue; it's specified in the specification, and that's all. There are ToInteger(), ToInt32(), ToUint32(), and ToUint16() operations defined as well, but I think those are for the host to implement.
EDIT: oops, there isn't a ToInt64() operation defined.
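Those conversions are simple enough to sketch, though. Here is ToInt32 in Python, as an illustration of the spec's algorithm (it is what the bitwise operators go through, e.g. x | 0); NaN/Infinity handling omitted:

def to_int32(x):
    # ES5 ToInt32: truncate toward zero, reduce modulo 2**32,
    # then map the result into the signed 32-bit range.
    n = int(x) % 2 ** 32
    return n - 2 ** 32 if n >= 2 ** 31 else n

print(to_int32(2 ** 31))   # -2147483648: wraps around
print(to_int32(-1))        # -1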
I think I might be doing it wrong, because I didn't do any looping. I'm a bit too lazy and impatient for that; who wants to spend their afternoon adding all those numbers up, even with a computer?
[joe24pack@staropramen ~]$ python
Python 2.6.6 (r266:84292, May 1 2012, 13:52:17)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def gauss(x):
...     return x*(x+1)/2
...
>>> gauss(10)
55
>>> gauss(1000000000)
500000000500000000
>>>
Judging by the time it took my MacBook Air to calculate it in Ruby, it would be pretty interesting to have someone generate benchmarks for different languages :)
1E9 is 1,000,000,000
⍳1E9 generates a vector containing integers from 1 to 1,000,000,000 (inclusive)
+/ is the sum of the vector
Any good APL interpreter will not actually generate a vector with a billion numbers but rather recognize the above expression and optimize the resulting operation for speed and minimal resource utilization.
EDIT:
Technically the "/" is the "reduction" operator acting along the last axis. In the case of a one-dimensional array it acts along the only available axis. If, instead, it were acting on a matrix, it would reduce along the last axis, collapsing the columns to give one result per row. Here's a longer annotated example with output from the interpreter:
Generate a vector from 1 to 10:
⍳10
1 2 3 4 5 6 7 8 9 10
Sum:
+/⍳10
55
Generate a vector of ten one's and zero's, repeating until the end:
10⍴1 0
1 0 1 0 1 0 1 0 1 0
Now use that vector to compress (filter) the original 1-to-10 vector, grabbing every other
element starting with the first. The effective result is that you end up with
all the odd numbers between 1 and 10:
(10⍴1 0)/⍳10
1 3 5 7 9
Same thing, now grabbing the even numbers by flipping the 1 0 sequence to 0 1:
(10⍴0 1)/⍳10
2 4 6 8 10
Sum of all odd integers between 1 and 10:
+/(10⍴1 0)/⍳10
25
Sum of all even integers between 1 and 10:
+/(10⍴0 1)/⍳10
30
One could create a single vector with the odd integers between 1 and 10 followed by
the even integers between 1 and ten by simply concatenating the generating
expressions (APL executes from right to left):
((10⍴1 0)/⍳10),(10⍴0 1)/⍳10
1 3 5 7 9 2 4 6 8 10
And then you can reshape ("⍴") the result into a matrix:
2 5⍴((10⍴1 0)/⍳10),(10⍴0 1)/⍳10
1 3 5 7 9
2 4 6 8 10
Finally, use the reduction operator again, now applied to a matrix, to sum across each
row. The effect is to output a two-element vector with the sum of the odd integers
between 1 and 10 as the first element and the sum of all the even integers between
1 and 10 as the second:
+/2 5⍴((10⍴1 0)/⍳10),(10⍴0 1)/⍳10
25 30
If you want to try this, type the lines immediately following my comments above right into the interpreter. The rho ("⍴") or reshape operator is entered by typing ALT+r.
Hope this helps make sense of it. Of course, there are other ways to accomplish the same thing.
Seeing that some are adding timings to the posted solutions:
The APL solution I posted above (+/⍳1E9) takes 56 microseconds to execute on my system (checked by solving it 100,000 times in a loop).
The interpreter is obviously not doing a huge memory and clock-cycle sucking expansion of a one billion element vector.
This highlights another advantage of a symbolic language: idioms or patterns within the code can be recognized and replaced with equivalent highly-efficient, highly-tuned operations. The above example is obviously solved by having the interpreter swap it out for the well-understood mathematical solution.
Because of this the programmer can focus on the problem space rather than having to dork around with figuring out optimizations. Granted, this is a simple one for anyone with a decent math background, but it can get far more complex. Have a look at the famous Finn APL idiom library:
An interpreter designer might very well decide to detect a good number of these idioms and execute highly tuned code instead of the memory and CPU-cycle hogging expansions that might result from running the actual code as written.
In many ways I equate this to what happens when using a language like Verilog to design FPGA circuits. You are designing hardware, not software. FPGA compilers have inference engines that recognize certain structures to mean specific circuit constructs. It's a contract. We agree that when I write this I mean to ask that you instantiate that and everyone is happy.
Python 2.7 (Mid 2012 Macbook Pro, 2.5 GHz i5, 8GB, not using SSD)
sum + xrange (consumes ~20MB virtual memory):
$ time python2.7 -c "print sum(xrange(1,1000000001))"
500000000500000000
python2.7 -c "print sum(xrange(1,1000000001))" 11.06s user 0.02s system 99% cpu 11.089 total
reduce + xrange (consumes ~20MB virtual memory):
$ time python2.7 -c "print reduce(lambda a, b: a + b, xrange(1,1000000001))"
500000000500000000
python2.7 -c "print reduce(lambda a, b: a + b, xrange(1,1000000001))" 128.74s user 0.13s system 94% cpu 2:16.51 total
My machine was swapping like crazy for more than an hour when I tried using range(). I suspect it hadn't even finished allocating the list when I killed the process, after it had consumed >30GB of virtual memory.
$ time python2.7 -c "print sum(range(1,1000000001))"
3 milliseconds, man, the C optimizers blow my mind, they basically just cheat and stick the answer in there. :-)
$ cat sum.c
#include <stdio.h>
int main(void)
{
    unsigned long long sum = 0, i;
    for (i = 0; i <= 1000000000; i++) // one billion
        sum += i;
    printf("%llu\n", sum); // 500000000500000000
    return 0;
}
This is Perl, of course, but it exploits the fact that the sum 1..$max can be computed as a series of pairs that each add up to ($max + 1), plus an additional term (the 'middle' integer) when the top number is odd.
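The Perl itself isn't quoted here, but the pairing argument is easy to sketch; here is the same idea in Python (my own illustration, not the original code):

def pair_sum(top):
    # Pair 1 with top, 2 with top - 1, and so on: each pair adds up to
    # top + 1, and there are top // 2 such pairs.
    total = (top // 2) * (top + 1)
    # When top is odd, the middle integer (top + 1) // 2 is left unpaired.
    if top % 2:
        total += (top + 1) // 2
    return total

print(pair_sum(1000000000))   # 500000000500000000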
That's an old(ish) version. They've probably done a bit of optimization since. For example, with the (speed 3) optimization, and letting the compiler know that sum is a fixnum, I can get it down to <2 billion cycles; see my answer here: http://stackoverflow.com/a/18065714/2423072
Consing sounds like it's allocating bignums. My guess is that you're using a 32-bit build of SBCL. In that case, fixnums only go up to something like 2^30, and arithmetic with larger numbers will allocate memory. Can you check?
This casts only $sum to int. If $i is a float, the result is a float.
With 32 bits, $sum will be a float bigger than PHP_INT_MAX at some point, and the cast will truncate it, which is unlikely to give the right answer.
With 64 bits, which I assume is the platform you tested on, $sum is never bigger than PHP_INT_MAX, so it's never converted to a float, and the (int) cast does nothing (and the computation is done right).
Correction: properly learning about data types and binary representation is critical to "get" programming.
I've found that web developers often undervalue (or even scoff at) web developers with CS degrees, but any first-year CS/EECE student at a decent university (or even someone who has taken a couple of Coursera CS classes) would have instantly recognized this to be a floating point related issue. Being able to whip up some scaffolds in Rails is great and immensely useful, but I'm of the opinion that a great Rails programmer will need a proper understanding of the C & Assembly that's behind all the magic.
It's both, really. Knowing the correct data type won't get you anywhere if you use a language like PHP and assume that it will always correctly guess the required data type for a given problem.
I find that when it comes to programmers who reach a certain level of proficiency, it doesn't matter whether they learned these things through a CS degree, or through hard knocks.
I started out with VBScript and ASP 1.0. I didn't get a CS degree, but I ended up learning about floating point errors rather quickly. This led me to learn more about data types. Similar result, different path. I ended up paying for my error, much like someone would pay for an education.
There are certainly bad programmers without CS degrees who make these mistakes, but I know plenty of programmers who hold a CS degree and will consistently misuse float unless reminded to use something more appropriate.
Interesting, sure, but incredibly hard to create with an uncoordinated Internet community. It'd take 1 good programmer/writer, ideally.
Without a clear, quantitative standard as to which Python* implementation is best, you'll find: a one-liner that claims to be 'pythonic', a 150-liner that's slightly more efficient, and a 30-liner that looks like it was written in C. The more popular the language, the wider the variety of responses. The more helpful the language (that is, 'newbies' can contribute solutions to otherwise difficult problems), the more solutions. The more divisive the language, the more bickering you'll have.
The Programming Language Shootout doesn't suffer this fate because it is 0% subjective. Discussion is only interesting with subjectivity, but technical discussion is presumed to be mostly objective. StackOverflow treads the line carefully.
1 writer could do it. 2, if they don't disagree on anything.
* I don't mean to pick on Python. Python is great. Insert language of choice.
I don't think it would be that interesting - and I don't think we need to rediscover, over and over again, the fact that some languages use IEEE 754 as the default number type.
I agree that stackoverflow.com is sometimes a little too trigger-happy when it comes to closing questions, but we really don't need a community-wiki top answer with 1845 upvotes that is basically a Rosetta Code page, spawning dozens of copycat questions with slight variations on the theme.
It's not right. You should add one to the bound, because "range" uses half-open intervals. Also, it only works in Python 3; in Python 2 it creates a list of a billion integers, so you should use "xrange" instead.
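Concretely, the corrected version would look something like this (a sketch; the snippet being corrected isn't shown here):

# Python 3: range is lazy, and its upper bound is exclusive,
# hence the + 1 so that 1000000000 itself is included.
print(sum(range(1, 1000000000 + 1)))   # 500000000500000000

# Python 2 equivalent: xrange avoids building a billion-element list.
# print sum(xrange(1, 1000000000 + 1))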
I don't. I'm far from an expert on high-performance Clojure. (I'm really glad that there is such a thing, and that people focus on it, however.) Joy of Clojure and Programming Clojure get into optimizations a little bit, but I think that field is still fairly new.
Sometimes with seqs one can end up with a "holding onto the head" problem; if you're doing stream processing but holding on to the head of a seq, you can end up with the whole thing in memory, which would kill you. That's not what's happening there, though; a default-configured JVM can't hold anything close to a billion longs in memory.
One of the neat things is that, because the REPL actually compiles code (there's no interpreter), you get the same performance with the time macro as you would in compiled code. What that means is that performance testing can be done quickly, right at the REPL.
To explain what I did and why: I figured that the tight loop would be optimized to Java-like performance. With the more elegant formulation, I didn't know what was going on in terms of types (how is +, a variable-arity function with many type signatures, being handled?). If the loop had performed poorly, I'd probably have put type hints on the arguments and replaced + with unchecked-add; but it performed well, so I left it as it was.