More 96-core benchmarks

Yesterday I spent some time with a 96-core ARMv8 server. On day two I figured out a couple more things about that server.

First and foremost, the convoluted Docker installation I chronicled yesterday turned out to be much easier today. A simple

apt-get install docker.io

did the right thing to bring Docker 1.12.1 into the system. Don’t do

apt-get install docker

on Ubuntu; you’ll get “docker - System tray for KDE3/GNOME2 docklet applications” instead. The original release announcement for Docker on Ubuntu explains why.
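
To confirm the daemon is healthy after the install, the usual sanity checks are enough (sudo here assumes your user isn’t in the docker group):

sudo docker version
sudo docker info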

To characterize performance on real workloads, I compiled some packages from source, doing my best to force parallelism in each build. GNU Make’s “-j” flag sets how many jobs run at once, and on a machine like this it makes a big difference. It’s worth reading the makefile before you build, because some projects have special considerations for parallel builds.
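
For a project with a well-behaved makefile, the whole trick is an invocation along these lines, where nproc expands to 96 on this box:

make -j"$(nproc)"

Some projects expect the job count to go to their own build wrapper instead, which is exactly the kind of special consideration I mean.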

Node.js master built in 4.5 minutes, about a 38x speedup.
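
For anyone who wants to reproduce the Node number, the build is a plain configure-and-make affair; this is only a sketch, and the job count is the part that matters:

git clone https://github.com/nodejs/node.git
cd node
./configure
make -j96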

Go for linux/arm64 built in under 9 minutes instead of an hour.
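
The Go build is a little different: it wants an existing Go toolchain for bootstrap, and the go tool parallelizes package compilation on its own, so there is no -j to pass. Roughly, and assuming the Ubuntu golang-1.6 package is installed at /usr/lib/go-1.6:

git clone https://go.googlesource.com/go
cd go/src
GOROOT_BOOTSTRAP=/usr/lib/go-1.6 ./make.bash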

With the help of the hosting provider, we also looked at the server’s power consumption. Reported power when idle was 212 watts; under a heavy load generated by the stress test harness it was about 80 watts more, at 292 watts. Roughly speaking that’s 2 watts per core idle and 3 watts per core under load. Stress is from Amos Waterland; it runs nicely on POSIX systems.
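
If you want to reproduce the load, stress makes it a one-liner; the worker count and duration here are just the obvious choices, not a careful methodology:

stress --cpu 96 --timeout 300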

There is a Firefox build for the system that installs with apt-get install firefox, and I forwarded X11 from the data center to the cafe to see how it worked. It loads fine, but visiting pages like umich.edu results in a failure. I didn’t find an easy bug repository to report it to, but the failure was repeatable. If I do this again, I’ll track down the right place to file a bug.
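
The X11 forwarding part is plain ssh; the user and hostname here are placeholders:

ssh -X ubuntu@arm-server.example.com
firefox http://umich.edu &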

The “build Docker from scratch” project resulted in a failure in TestOverlay128LayerRead, a test of the overlay2 storage driver. It turns out that’s a much more interesting bug than yesterday’s Docker bug, since we can pinpoint exactly how many layers can be overlaid before it fails: 61 works, 62 does not.
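
A rough way to poke at the same limit without running Docker’s test suite is to generate an image with a configurable number of layers and see where the build falls over. This is only a sketch: the base image is whatever arm64 image you actually have available, and whether a plain docker build trips the same code path as the graphdriver test is itself an open question.

n=62
printf 'FROM ubuntu:16.04\n' > Dockerfile
for i in $(seq 1 $n); do printf 'RUN echo layer-%d > /layer-%d\n' $i $i >> Dockerfile; done
sudo docker build -t layer-probe .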

One of the things I tested was Facebook’s new “yarn” package manager for Node.js. It seemed to work just fine, but I had hoped it would pick up some bonus speed from all the cores, and I didn’t see that. Node is notably not a parallel language.
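
If you want to make the same comparison, the crude version is timing a clean install both ways in a project with a decent-sized package.json; cache state matters a lot, so a single run doesn’t prove much:

rm -rf node_modules
time npm install
rm -rf node_modules
time yarn install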

All in all, a good day of trying to understand what the high-performance world looks like when you have lots and lots of little cores. This is obviously not a server for every workload, but for the ones that match, it should be great.