Computer scientists at the Energy Department’s Sandia National Laboratories have simultaneously booted one million Linux kernels, all of which ran as virtual machines on the labs’ Thunderbird supercomputer. The researchers, Ron Minnich and Don Rudish, hope to use their million virtual-machine network to better understand how botnets operate.
To the best of the researchers’ knowledge, one million virtual machines is the largest number ever spun up on a single system. Previously, they had been able to boot only 20,000 virtual instances at once.
The Department of Energy’s Office of Science and the National Nuclear Security Administration’s (NNSA) Advanced Simulation and Computing (ASC) program both funded the two-year project. Dell and IBM contributed technical expertise to the experiments.
Thunderbird is a 4,480-node Dell-based computer cluster. Each node ran 250 Linux kernels. The host OS on each node is a stripped-down version of the Linux kernel, compiled by the researchers themselves. It contains only the kernel core and a start-up script that boots the virtual machines. “The root file system lives out in Random Access Memory,” Rudish said.
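As a quick check on those figures, 4,480 nodes at 250 kernels each comes to slightly more than one million virtual machines. The short Python sketch below does that arithmetic; the variable and function names are illustrative and are not drawn from the Sandia project.

```python
# Figures as reported in the article; the names are illustrative only.
NODES = 4480            # Thunderbird cluster nodes
KERNELS_PER_NODE = 250  # Linux kernels (virtual machines) per node


def total_virtual_machines(nodes: int, per_node: int) -> int:
    """Return the total number of virtual machines across the cluster."""
    return nodes * per_node


total = total_virtual_machines(NODES, KERNELS_PER_NODE)
print(f"{total:,} virtual machines")  # 1,120,000 virtual machines
```

The total exceeds the headline figure of one million, assuming every node ran its full complement of 250 kernels.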
For virtualization, the system uses Lguest, a hypervisor built into the Linux kernel that was developed by the research arm of IBM. Although Lguest is still in the development stage, Sandia chose it because it is “very fast and very lightweight,” Minnich said. On Thunderbird, each virtual machine starts up in a fraction of a second, Rudish said. “The bottleneck is reading in the configuration file, which is a million lines long.”
The management software is OneSis, which was originally developed by Sandia. “OneSis is pretty key to making this thing work at all. It is good at managing thousands of thousands of nodes in a very easy way,” Minnich said. All the virtual machines are networked through both virtual Linux-based routers and Sandia’s own backbone routers.
The researchers spun up the million nodes in late June, a job that consumed all of Thunderbird’s resources. Starting in October, the lab will use the virtual network of machines to study how botnets operate. “This is essentially the preparatory work,” Rudish said.
Typically used by spammers, botnets are made up of thousands or even millions of Internet-connected personal computers. The owners of such machines are typically unaware that their computers have been infected with secret programs that do the bidding of the botnet operator. Botnet operators tend to deploy their creations for spamming, distributed denial-of-service attacks and other nefarious activities.
Botnets are difficult to study in the wild because the infected computers are geographically dispersed. By building a virtual network that approximates the size of a large botnet, the researchers hope to understand how botnets operate and what effects they have.
“If you want to take a look at what is really threatening the Internet, we have to talk about the scale of the network we are working with. One million gets us pretty close to understanding these botnets,” Rudish said.
The researchers say the next step is to add software to the virtual instances that approximates a real-world environment, such as e-mail or Web servers, along with the botnet client applications themselves.
Beyond the study of botnets, the researchers maintain that their work will help in understanding how to manage large systems in general.
“Anything that scales to a million, it is impossible to watch any single thing. So you need to have this be a highly-automated self-maintaining system,” Minnich said. By 2018, new supercomputers coming online will have 100 million CPUs or more. “The lessons we’re learning for this project we’re pretty sure will feed into the supercomputers we’re building in 2018,” he said.