Get ready for the cloud!

A new paradigm in scientific computing

@jjmerelo

What is the cloud?

Cloud is where you run your programs

Virtualized resources on tap

Scaling out of the box

Infrastructure as code

Distributed, multi-vendor computing

Reproducible configurations

Reproducible science

A new application development and deployment paradigm

... designed around scaling

Why use the cloud in scientific computing?

✓ It's new!

✓ No sunk costs!

✓ It scales!

➡ It changes the algorithmic paradigm

♻ Let Nature be your guide

Cloud is about reproducible infrastructure

Let's containerize

Describe infrastructure: package.json

{
  "name": "hiffeitor",
  "scripts": {
    "test": "mocha",
    "start": "./callback-ea-HIFF.js"
  },
  "dependencies": {
    "nodeo": "^0.2.1",
    "winston": "^2.2.0",
    "winston-logstash": "^0.2.11",
    "winston-papertrail": "^1.0.2"
  },
  "devDependencies": {
    "flightplan": "^0.6.14"
  }
}

Introducing docker

Lightweight virtualization

Portable infrastructure

Using docker

docker pull jjmerelo/cloudy-ga

Containerizing through Dockerfile

FROM phusion/baseimage
MAINTAINER JJ Merelo "jjmerelo@gmail.com"
RUN echo "Building a docker environment for NodEO"
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get upgrade -y
RUN apt-get install apt-utils -y 
# [... more stuff ... ]
ADD https://github.com/JJ/cloudy-ga/raw/master/app/hiff.json app
WORKDIR /app
RUN npm i
RUN chmod +x callback-ea-HIFF.js
CMD npm start

Bring your own container

sudo docker build --no-cache -t jjmerelo/cloudy-ga:0.0.1 .

... and run it

sudo docker run -t \
     -e "PAPERTRAIL_PORT=7777" \
     -e "PAPERTRAIL_HOST=logs77.papertrailapp.com" \
     jjmerelo/cloudy-ga:0.0.1
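
The `-e` flags above only set environment variables inside the container; the application still has to read them. A minimal sketch of how that might look follows. The helper name `papertrailConfig` and the fall-back behaviour are assumptions for illustration, not code from the cloudy-ga repository.

```javascript
// Hypothetical helper (not from cloudy-ga): turn the variables passed with
// `docker run -e` into a settings object for the logger.
function papertrailConfig(env) {
  const host = env.PAPERTRAIL_HOST;
  const port = Number.parseInt(env.PAPERTRAIL_PORT, 10);
  if (!host || Number.isNaN(port)) {
    // Settings missing or malformed: the caller can fall back to local logging.
    return null;
  }
  return { host: host, port: port };
}

const config = papertrailConfig(process.env);
console.log(config
  ? 'Logging to ' + config.host + ':' + config.port
  : 'No Papertrail settings, logging locally');
```

Keeping host and port out of the image means the same container can be pointed at a different log destination on every run.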

Logging matters

Papertrail

It's not programming as usual

Reactive programming

Algorithm + stream = application in the cloud

Decoupled processing and data structures
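
One way to picture "algorithm + stream" is with Node's built-in `EventEmitter`. This is only a sketch: the event names, the `countOnes` toy fitness and the `evaluated` array are assumptions chosen for illustration, not part of NodEO.

```javascript
const EventEmitter = require('events');

// Toy fitness (an assumption): count the ones in a bit string.
function countOnes(chromosome) {
  return chromosome.split('').filter(function (bit) { return bit === '1'; }).length;
}

// The stream: individuals arrive as events, wherever they come from.
const stream = new EventEmitter();

// The data structure, kept apart from the processing code.
const evaluated = [];

// The algorithm: react to each incoming individual and publish the result.
stream.on('individual', function (chromosome) {
  stream.emit('evaluated', { chromosome: chromosome, fitness: countOnes(chromosome) });
});

// A second, independent reaction stores whatever has been evaluated.
stream.on('evaluated', function (result) {
  evaluated.push(result);
});

['0101', '1111', '0000'].forEach(function (chromosome) {
  stream.emit('individual', chromosome);
});
console.log(evaluated);
```

Neither handler knows who produced the event or who consumes the result, which is the decoupling the slide refers to.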

Running in the cloud

Infrastructure as a service

Create instance

Starting Azure instance

Set up with Ansible

- hosts: "{{target}}"
  tasks:
    - name: install prerequisites
      # shell, not command: "&&" needs a shell to be interpreted
      shell: apt-get update -y && apt-get upgrade -y
    - name: install packages
      apt: pkg={{ item }}
      with_items:
        - git
        - npm
    - name: Create profile
      copy: content="export PAPERTRAIL_PORT={{PAPERTRAIL_PORT}}"
            dest=/home/cloudy/.profile

Run the playbook

Running Ansible

Ready to run ✓

Running in Azure

But there's something missing here

Deploying to the cloud

Let's use FlightPlan

var plan = require('flightplan');

// Target: where to deploy
plan.target('azure', {
  host: 'cloudy-ga.cloudapp.net',
  username: 'azureuser',
  agent: process.env.SSH_AUTH_SOCK
});
// Local: runs on your own machine
plan.local(function(local) {
    local.echo('Plan local: push changes');
    local.exec('git push');
});

... And after setup

plan.remote(function(remote) {
    remote.log('Pull');
    remote.with('cd cloudy-ga', function() {
        remote.exec('git pull');
        remote.exec('cd app;npm install .');
    });
    remote.with('cd /home/azureuser/cloudy-ga/app', function() {
        remote.exec('npm start');
    });
});

IaaS providers have free tiers

But it is generally pay-as-you-go

Great if you do small amounts of computation

Make do without a server

Platform as a service

There's freemium PaaS

Heroku, OpenShift, IBM's Bluemix and Google App Engine

OpenShift screen capture

Pool-based evolutionary algorithms: not so canonical any more

Detaching population from operations

Reactive programming.

pool schema

Three good things about pool-based EAs

1. Self-organizing clients

2. Fully asynchronous

3. Persistent population
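
The three properties above can be sketched with an in-memory pool. This is an illustration under stated assumptions: the `Pool` class, `clientStep` and the `countOnes` fitness are hypothetical names, and a deployed pool would sit behind the server rather than in a local `Map`.

```javascript
// Hypothetical pool: persistent state that outlives any single client.
class Pool {
  constructor() { this.members = new Map(); }           // chromosome -> fitness
  put(chromosome, fitness) { this.members.set(chromosome, fitness); }
  random() {
    const keys = Array.from(this.members.keys());
    return keys[Math.floor(Math.random() * keys.length)];
  }
  best() {
    let best = null;
    for (const [chromosome, fitness] of this.members) {
      if (best === null || fitness > best.fitness) best = { chromosome, fitness };
    }
    return best;
  }
}

// Toy fitness (an assumption): count the ones in a bit string.
function countOnes(chromosome) {
  return chromosome.split('').filter(function (bit) { return bit === '1'; }).length;
}

// One client step: take two members, recombine, mutate one bit, put the child back.
// Clients never wait for each other; any number of them can call this at any time.
// Assumes all chromosomes in the pool have the same length.
function clientStep(pool) {
  const a = pool.random();
  const b = pool.random();
  const cut = Math.floor(Math.random() * a.length);
  let child = a.slice(0, cut) + b.slice(cut);
  const flip = Math.floor(Math.random() * child.length);
  child = child.slice(0, flip) + (child[flip] === '1' ? '0' : '1') + child.slice(flip + 1);
  pool.put(child, countOnes(child));
  return child;
}

const pool = new Pool();
['00000000', '00001111', '11110000'].forEach(function (c) { pool.put(c, countOnes(c)); });
for (let i = 0; i < 200; i++) clientStep(pool);
console.log(pool.best());
```

Because operators only read from and write to the pool, a client can join or leave mid-run without the rest noticing.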

Island models can be used too
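
As a rough sketch of what migration between islands could look like, the function below sends each island's best individual to the next island. The ring topology, the `migrate` name and the `{ chromosome, fitness }` shape are all assumptions made for this example.

```javascript
// Each island is a non-empty array of { chromosome, fitness } objects.
function migrate(islands) {
  // Choose every emigrant first, so nobody hops twice in the same round.
  const emigrants = islands.map(function (island) {
    return island.reduce(function (best, individual) {
      return individual.fitness > best.fitness ? individual : best;
    });
  });
  // Ring topology (an assumption): island i sends its best to island i + 1.
  emigrants.forEach(function (migrant, i) {
    islands[(i + 1) % islands.length].push(migrant);
  });
  return islands;
}

const islands = migrate([
  [{ chromosome: '0001', fitness: 1 }, { chromosome: '1111', fitness: 4 }],
  [{ chromosome: '0011', fitness: 2 }]
]);
console.log(islands);
```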

Deploy server to PaaS ✓

Deploy clients to IaaS ✓

➡ And do science!

Papertrail log

Logs glue everything together
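
If every client and the server write structured lines to the same log, results can be recovered from the log alone. The sketch below assumes a hypothetical format of one JSON object per line with `source`, `event` and `fitness` fields; the actual output of winston and Papertrail will differ.

```javascript
// Hypothetical log format: one JSON object per line.
function logLine(source, event, data) {
  return JSON.stringify(Object.assign({ source: source, event: event }, data));
}

// Scan lines from every source and return the best evaluation seen, or null.
function bestFromLogs(lines) {
  return lines
    .map(function (line) { return JSON.parse(line); })
    .filter(function (entry) { return entry.event === 'evaluated'; })
    .reduce(function (best, entry) {
      return best === null || entry.fitness > best.fitness ? entry : best;
    }, null);
}

const lines = [
  logLine('client-1', 'evaluated', { fitness: 3 }),
  logLine('server', 'started', {}),
  logLine('client-2', 'evaluated', { fitness: 7 })
];
console.log(bestFromLogs(lines));
```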

Come together now!

Vagrant for orchestration

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/xenial64"
  config.vm.provision "shell", inline: <<-SHELL
     apt-get update
     apt-get upgrade -y
  SHELL
  config.vm.provision "main", type: "ansible" do |ansible|
    ansible.extra_vars = { target: "all" }
    ansible.playbook = "playbook.yml"
  end
  # and the rest...
end

All together

✓ Get servers ➡ PaaS, Loggers

✓ Create/provision boxes ➡ Vagrant + Ansible

✓ Deploy/run ➡ FlightPlan

Take this home

  1. Cloud is the new (grid|cluster)
  2. There is an (almost) free lunch
  3. Reactive programming
  4. We should ❤ logs

Questions?

Tweet out (or follow) @jjmerelo

Credits