Migrating Off Heroku (2022-09-02)

Starting in 2009, I used Heroku to give public access to demo sites for some libraries and applications I've created. Heroku worked well for me, not just because of the ease of use, but also because it was completely free. In over 12 years of use, I never paid Heroku a dime. As the saying goes, "all good things must come to an end", and on August 25, 2022, Heroku announced that they would stop offering their free tier in about three months, so I knew I had to migrate to another platform. I decided to run the demo applications on my own virtual machine (the same one that handles my inbound email). This post describes the reasons behind that choice and how I completed the migration.

Background

I started using Heroku in late 2009, and at one point I had around 8 separate demo applications running there, all on the free tier. Heroku made updating the applications simple: each repository had a heroku git remote, and running git push heroku master updated the corresponding demo application on Heroku. Each application had a related database, and at the time there was a 5MB limit on database size, but no restriction on the number of rows. This worked well for my demo applications, and it was the easiest time I had using Heroku.

In 2012, Heroku changed the policies for their free PostgreSQL database tier, moving from a 5MB size limit to a 10,000 row limit. This affected one of my applications, which had significantly less than 5MB of data, but more than 10,000 rows. To work around the limit, I had to switch that application from storing some data as normal rows to storing it in an array of composite types. I ended up using a similar approach in other applications over the years. This was kind of annoying, but it worked fine.
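
To illustrate the idea, here is a minimal Sequel sketch of that kind of workaround, with hypothetical type, table, and column names: instead of storing one database row per entry, each parent row keeps its entries in an array of a composite type, so the entries don't count against the row limit.

require 'sequel'

# Assumes DATABASE_URL points at a PostgreSQL database
DB = Sequel.connect(ENV.fetch('DATABASE_URL'))

# Hypothetical composite type holding what would otherwise be a separate row
DB.run "CREATE TYPE entry AS (name text, amount integer)"

DB.create_table(:containers) do
  primary_key :id
  column :entries, "entry[]" # all entries for this container, in a single row
end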

In 2016, Heroku changed the policies for accounts to only allow a specific number of free hours per account per month. If you wanted even one application to be able to run on a free dyno for the entire month, you had to give them a credit card. Previously, I hadn't needed to provide a credit card, so I was never worried about being charged. Begrudgingly, I did give them a credit card, and to Heroku's credit, I was never billed on it.

As I mentioned, I previously had around 8 separate demo applications running on Heroku, but starting in 2016, I could only run a single free application. Thankfully, by that time, my demo applications all used Roda as the web framework. Unlike with Ruby on Rails, you can easily run multiple Roda applications in the same process. So I decided to run all of the demo applications in the same Ruby process, using a simple Rack-based router to dispatch to the correct application based on the request host name. This required me to make sure each demo application was properly namespaced, which took a little work, but not too much.
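
As a rough sketch (with made-up names, not the actual demo code), a namespaced Roda application looks something like this, which is what allows several such applications to be loaded into a single process without their constants colliding:

require 'roda'

module ExampleDemo
  # Wrapping the app in its own module keeps its classes and constants from
  # conflicting with the other demo apps loaded into the same process.
  class App < Roda
    route do |r|
      r.root do
        'Example demo'
      end
    end
  end
end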

Since 2016, I have added a couple more demo applications, so for the last few years I have had 10 Roda applications running in the same Ruby process, with a total slug size of 50MB, of which about 14MB is pictures for one of my applications.

Here is what the important parts of the config.ru file looked like for running all of those applications in the same process. First, I had to map the Heroku database URL environment variables to application-specific environment variables:

{
  'autoforme'=>'red',
  'falcomcds'=>'ivory',
  'forme'=>'aqua',
  'giftsmas'=>'olive',
  'kaeruera'=>'gray',
  'quinto'=>'violet',
  'rodauth'=>'pink',
  'spam'=>'teal',
  'lila_shell'=>'jade',
}.each do |k, v|
  ENV["#{k.upcase}_DATABASE_URL"] = ENV["HEROKU_POSTGRESQL_#{v.upcase}_URL"]
end
ENV["CSPVR_DATABASE_URL"] = ENV.delete("DATABASE_URL")

Then I had to require all of the applications. The applications were included as git submodules in the repository pushed to Heroku:

require_relative 'autoforme/demo-site/autoforme_demo'
require_relative 'falcomcdcatalog/falcomcdcatalog'
require_relative 'forme/demo-site/forme_demo'
require_relative 'giftsmas/giftsmas'
require_relative 'kaeruera/kaeruera_app'
require_relative 'quinto/lib/quinto/app'
require_relative 'rodauth/demo-site/rodauth_demo'
require_relative 'spam/spam'
require_relative 'lila_shell/lila_shell'
require_relative 'cspvr/app'

Then I had a simple Rack app that would dispatch incoming requests to the appropriate Roda app, based on the request host name:

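# Map request host names to Rack apps; the default proc returns a 404 for unrecognized hosts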
apps = Hash.new(proc{|_| [404, {'Content-Length'=>'12'}, ["Invalid Host"]]}).update(
  'autoforme-demo.jeremyevans.net'=>AutoFormeDemo::App.freeze.app,
  'falcomcdcatalog.jeremyevans.net'=>Falcom::App.freeze.app,
  'forme-demo.jeremyevans.net'=>FormeDemo::App.freeze.app,
  'giftsmas-demo.jeremyevans.net'=>Giftsmas::App.freeze.app,
  'kaeruera-demo.jeremyevans.net'=>KaeruEra::App.freeze.app,
  'quinto-demo.jeremyevans.net'=>Quinto::App.freeze.app,
  'rodauth-demo.jeremyevans.net'=>RodauthDemo::App.freeze.app,
  'spam-demo.jeremyevans.net'=>Spam::App.freeze.app,
  'lilashell-demo.jeremyevans.net'=>LilaShell::App.freeze.app,
  'cspvr-demo.jeremyevans.net'=>Cspvr::App.freeze.app,
)

run(proc{|env| apps[env['HTTP_HOST']].call(env)})

Considering Options

When Heroku announced they were shutting down their free tier, I read about a bunch of similar, smaller companies with various free options. Some of them even offer easy ways to port applications from Heroku to their platform. However, I think that ultimately most of those companies will either go out of business or go the same route as Heroku. The one option I did look into was Google Cloud, but while it could possibly host the compute for free, it doesn't offer a free PostgreSQL database.

I probably would have switched to one of the other companies offering a free service if I didn't already have a barely-used virtual machine for receiving inbound email. Since I did have that virtual machine, I decided to move the applications there, which would not result in any additional cost.

Migrating the Applications

These applications aren't high priority, don't change often, and are all fairly similar in terms of architecture. I don't need Heroku's fancy features, such as automatically building containers on each repository push. I don't need to autoscale the number of webservers or the amount of processing power available to the database. I'm comfortable manually updating the applications on the virtual machine. That's probably not the situation most people are in, so I consider myself lucky.

The virtual machine runs OpenBSD, so I started off by installing Git, PostgreSQL, and some of the gems the applications use. The ruby31-* OpenBSD packages cover all of the gems with C extensions that the applications need, and installing them also pulls in Ruby 3.1 as a dependency.

# pkg_add git postgresql-{server,contrib} ruby31-{bcrypt,pledge,puma,sassc,sequel_pg,sqlite3,subset_sum}

The Ruby 3.1 installation printed information about setting up symbolic links. Since Ruby 3.1 will be the default Ruby version for this virtual machine, I ran those commands:

# ln -sf /usr/local/bin/ruby31 /usr/local/bin/ruby
# ln -sf /usr/local/bin/bundle31 /usr/local/bin/bundle
# ln -sf /usr/local/bin/bundler31 /usr/local/bin/bundler
# ln -sf /usr/local/bin/erb31 /usr/local/bin/erb
# ln -sf /usr/local/bin/gem31 /usr/local/bin/gem
# ln -sf /usr/local/bin/irb31 /usr/local/bin/irb
# ln -sf /usr/local/bin/racc31 /usr/local/bin/racc
# ln -sf /usr/local/bin/rake31 /usr/local/bin/rake
# ln -sf /usr/local/bin/rbs31 /usr/local/bin/rbs
# ln -sf /usr/local/bin/rdbg31 /usr/local/bin/rdbg
# ln -sf /usr/local/bin/rdoc31 /usr/local/bin/rdoc
# ln -sf /usr/local/bin/ri31 /usr/local/bin/ri
# ln -sf /usr/local/bin/typeprof31 /usr/local/bin/typeprof

Then I installed all of the pure Ruby gems (those without C extensions) that are used by the applications:

# gem install -N enum_csv erubi jwt mail rack-unreloader refrigerator roda roda-message_bus rotp rqrcode sequel thamble tilt

The -N flag (which skips generating documentation) is important here, because the virtual machine only has 512MB of RAM, and generating documentation for the mail gem can exceed that.

I used pkg_add to install OpenBSD packages for the gems requiring C extensions, and installed the other gems using gem install, mainly to make upgrading to newer versions of OpenBSD easier. I'll be upgrading this virtual machine to the newest version of OpenBSD every 6 months.

Then I needed to create a directory for the application. This directory will be owned by my own account (jeremy), but I'm going to run the application as a different user:

# mkdir /var/www/app
# chown jeremy /var/www/app

Then I changed to the directory I created, and checked out each application into a subdirectory of that directory. This is basically the same as what I was doing on Heroku, except without git submodules. I also used chmod to limit access to each application's .git directory to my own user, so it would not be accessible by the user running the application:

$ cd /var/www/app
$ for x in autoforme cspvr falcomcdcatalog forme giftsmas kaeruera lila_shell quinto rodauth spam; do
    git clone https://github.com/jeremyevans/$x
    chmod 700 $x/.git
  done

Then I set up a PostgreSQL database cluster, using the instructions at /usr/local/share/doc/pkg-readmes/postgresql-server:

# su - _postgresql
$ mkdir /var/postgresql/data
$ initdb -D /var/postgresql/data -U postgres -A scram-sha-256 -E UTF8 -W

The initdb command asks for a password, so I generated one with openssl rand -base64 12 and pasted it in when asked. I also created a ~/.pgpass file with appropriate 0600 access permissions, so I don't need to paste the password in every time.
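
For reference, each line in a .pgpass file uses the host:port:database:user:password format. Here is a hypothetical Ruby sketch of creating such a file with the right permissions (the password shown is only a placeholder, not the generated one):

# Hypothetical sketch; 'localhost' also matches local socket connections
path = File.expand_path('~/.pgpass')
File.write(path, "localhost:5432:*:postgres:PLACEHOLDER-PASSWORD\n")
File.chmod(0600, path)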

After that completed, I started the database server using rcctl:

# rcctl start postgresql

I then created PostgreSQL users and databases for each of the applications, with each application having a database user that owns the related database, but is not a superuser. Because the PostgreSQL version in use is 14, I also locked down each database (this should not be necessary in PostgreSQL 15+):

$ for x in autoforme cspvr falcomcdcatalog forme giftsmas kaeruera lila_shell quinto rodauth spam; do
    createuser -U postgres $x
    createdb -U postgres -O $x $x
    psql -U postgres -c "GRANT ALL ON DATABASE $x TO $x;"
    psql -U postgres -c "REVOKE ALL ON DATABASE $x FROM public;"
    psql -U postgres -c "GRANT ALL ON SCHEMA public TO $x;" $x
    psql -U postgres -c "REVOKE ALL ON SCHEMA public FROM public" $x
  done

The Rodauth demo application needs the PostgreSQL citext extension, so I added it to the database that application uses:

$ psql -U postgres -c "CREATE EXTENSION citext" rodauth

The recommended way to run Rodauth is to use multiple database users, so that the database user the application runs as does not have access to the table containing the password hashes. However, as Heroku didn't support that approach, and I am importing the database from Heroku, I won't be using it here.

I didn't set a password for each PostgreSQL account when creating the accounts, but since using passwords is good practice, I wrote a Ruby program for that named create-env.rb. This program sets a password for each account and records it, so I can set the appropriate environment variable for each app. It also generates session secrets for the applications that use sessions. In addition, it adds a few other environment variables needed by the applications (taken from the Heroku configuration). The program writes all of that information to an .env.rb file:

Dir.chdir '/var/www/app'

require 'securerandom'

raise "already ran" if File.file?('.env.rb')
File.binwrite('.env.rb', '')
File.open(".env.rb", "w") do |f|
  %w'autoforme cspvr falcomcdcatalog forme giftsmas kaeruera lila_shell quinto rodauth spam'.each do |user|
    password = SecureRandom.base64(48).gsub(/\W/, '')
    DB.run "ALTER USER #{user} PASSWORD #{DB.literal(password)}"
    f.puts "ENV['#{user.upcase}_DATABASE_URL'] ||= 'postgres://127.0.0.1/?user=#{user}&password=#{password}'"
  end

  f.puts ""

  %w'cspvr giftsmas kaeruera lila_shell quinto rodauth spam'.each do |user|
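    # ENV values can't contain NUL bytes, so replace any NULs with random
    # non-NUL bytes, then base64 encode so the secret can be embedded in .env.rb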
    session_secret = [SecureRandom.random_bytes(64).gsub("\x00"){((rand*255).to_i+1).chr}].pack('m')
    f.puts "ENV['#{user.upcase}_SESSION_SECRET'] ||= #{session_secret.inspect}.unpack1('m')"
  end

  f.puts ""

  f.puts "ENV['KAERUERA_DEMO_MODE'] ||= '1'"
  f.puts "ENV['KAERUERA_INTERNAL_ERROR_USER'] ||= 'demo'"
  f.puts "ENV['SPAM_DEMO'] ||= 'demo'"
  f.puts "ENV['RACK_ENV'] ||= 'production'"
end

I ran that Ruby file with the sequel command line tool (using the 31 suffix, since I installed the version for Ruby 3.1), which sets up the DB constant as a database connection before running the script:

$ sequel31 'postgres:///?user=postgres' create-env.rb

After running create-env.rb, I was ready to import the data from Heroku. I used another script for that, named import-from-heroku.rb. It takes the name of the application to import into, as well as the connection URL for the Heroku database (which I got from the Heroku config vars):

Dir.chdir '/var/www/app'
require './.env.rb'

app, heroku_url = ARGV

local_url = ENV.fetch("#{app.upcase}_DATABASE_URL")

system "pg_dump #{heroku_url.inspect} | psql #{local_url.inspect}"

I then ran the script for each application to import the data from the related Heroku database:

$ ruby import-from-heroku.rb autoforme postgres:///...
$ ruby import-from-heroku.rb cspvr postgres:///...
...

I then slightly modified the config.ru file I had been using on Heroku. Here's what the config.ru file for the virtual machine looked like:

Dir.chdir('/var/www/app')
require './.env.rb'

$:.unshift('./forme/lib')
$:.unshift('./autoforme/lib')
$:.unshift('./rodauth/lib')

require_relative 'autoforme/demo-site/autoforme_demo'
require_relative 'falcomcdcatalog/falcomcdcatalog'
require_relative 'forme/demo-site/forme_demo'
require_relative 'giftsmas/giftsmas'
require_relative 'kaeruera/kaeruera_app'
require_relative 'quinto/lib/quinto/app'
require_relative 'rodauth/demo-site/rodauth_demo'
require_relative 'spam/spam'
require_relative 'lila_shell/lila_shell'
require_relative 'cspvr/app'

raise "::DB is defined and should not be" if defined?(::DB)

apps = Hash.new(proc{|_| [404, {'Content-Length'=>'12'}, ["Invalid Host"]]}).update(
  'autoforme-demo.jeremyevans.net'=>AutoFormeDemo::App.freeze.app,
  'falcomcdcatalog.jeremyevans.net'=>Falcom::App.freeze.app,
  'forme-demo.jeremyevans.net'=>FormeDemo::App.freeze.app,
  'giftsmas-demo.jeremyevans.net'=>Giftsmas::App.freeze.app,
  'kaeruera-demo.jeremyevans.net'=>KaeruEra::App.freeze.app,
  'quinto-demo.jeremyevans.net'=>Quinto::App.freeze.app,
  'rodauth-demo.jeremyevans.net'=>RodauthDemo::App.freeze.app,
  'spam-demo.jeremyevans.net'=>Spam::App.freeze.app,
  'lilashell-demo.jeremyevans.net'=>LilaShell::App.freeze.app,
  'cspvr-demo.jeremyevans.net'=>Cspvr::App.freeze.app,
)

run(proc{|env| apps[env['HTTP_HOST']].call(env)})

require 'nio'
require 'refrigerator'
Refrigerator.freeze_core

I then tested running the site with puma, and checked that specific sites worked using curl:

$ puma31 &
$ curl -LH 'host: autoforme-demo.jeremyevans.net' http://localhost:9292
$ curl -LH 'host: falcomcdcatalog.jeremyevans.net' http://localhost:9292

That was good as a basic proof of concept, but it was not yet production ready. I added a puma.conf file for the puma configuration. It uses a fixed number of 5 threads, instead of starting with 0 threads and scaling up to 5, and it uses the production environment instead of the default development environment. Additionally, I chose to run a single worker process, mostly so that if the worker process dies, it will be automatically restarted. This likely makes the application vulnerable to BROP attacks, since puma does not exec after forking, but I considered the convenience of automatic restarting worth the risk in this case. Puma warns when using a single worker process, so I explicitly silenced that warning.

threads 5, 5
environment 'production'
workers 1
silence_single_worker_warning

When these apps were running in a container on someone else's cloud, security wasn't as much of a priority. However, since I'm now running this on my own virtual machine, security is more important.

I run my production applications with nginx and unicorn using unicorn-lockdown, which uses OpenBSD's pledge and unveil system calls (via ruby-pledge) to restrict allowed system calls and limit file system access. That approach works for my production systems, since they have about 500 times as much RAM as this small virtual machine. I wanted similar security advantages here while limiting how much memory is used, so I modified the bottom part of the config.ru file to also use pledge and unveil:

require 'nio'
require 'pledge'
require 'unveil'
require 'refrigerator'
require 'bcrypt'
Refrigerator.freeze_core

Pledge.unveil('.' => 'r', '.env.rb'=> '', 'mail' => :gem, 'rack' => :gem, 'message_bus' => :gem)
Pledge.pledge('rpath inet')

This only allows the process to read files and handle IP sockets (e.g. accept HTTP connections). It limits the places files can be read from to the current directory (except for the file containing the database connection information and secrets), as well as some gems that unfortunately use autoload to require files at runtime. If the puma process is attacked and the attacker tries to run arbitrary programs, or to do almost anything other than read files and make IP socket connections, OpenBSD will terminate the process.

This is less locked down than it probably should be from a file system perspective. For my production applications, I limit file system access to only the directories and files the application actually needs (usually only the views directory). However, it's much better than having no file system access limits at all.
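
For comparison, a more restrictive setup along those lines might look something like the following for a single application (the paths here are hypothetical and would vary per application):

# Hypothetical tighter unveil: only the views directory and the rack gem's
# autoloaded files are readable at runtime
Pledge.unveil('views' => 'r', 'rack' => :gem)
Pledge.pledge('rpath inet')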

Now that the web server is somewhat secured, I added an /etc/rc.d/puma file so that the rcctl program can be used to easily start and stop the server. This runs the application as the www user. It would probably be better to create a separate user, but since this is the only process running on the system as that user, it should be OK.

#!/bin/ksh

daemon_user=www
daemon="/usr/local/bin/puma31"
daemon_flags="-C /var/www/app/puma.conf /var/www/app/config.ru"
rc_bg=YES
rc_reload_signal=USR2

. /etc/rc.d/rc.subr

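# Pattern rc.subr uses to find the running process, since puma changes its process title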
pexp="ruby[0-9][0-9]: puma .*"

rc_cmd $1

I then checked that rcctl can start and stop puma:

# rcctl start puma
# rcctl stop puma

That worked, but it resulted in log information being printed to standard output, so I set up log files for puma's stdout and stderr:

# mkdir /var/log/puma
# touch /var/log/puma/{stdout,stderr}.log
# chown www /var/log/puma/{stdout,stderr}.log

Puma has a stdout_redirect configuration parameter, but it does things like checking that the directory containing the log file exists before writing to it, which doesn't work well with unveil's file system access limiting. The easiest way to work around this is to change daemon_flags in puma's rc.d file to handle logging to a file:

daemon_flags="-C /var/www/app/puma.conf /var/www/app/config.ru >> /var/log/puma/stdout.log 2>> /var/log/puma/stderr.log"

I started puma again and made sure the logging worked correctly:

# rcctl start puma
# cat /var/log/puma/stdout.log
# cat /var/log/puma/stderr.log

Previously, I'd been testing access to puma on localhost. I needed to do that because the firewall rules on the virtual machine do not allow external requests to the port puma runs on. So I needed to allow TCP connections to port 80 and redirect them to port 9292 on localhost (puma's default port). I also wanted to make sure that the user running the application (www) only has the ability to connect to PostgreSQL, and cannot make any other network connections. I edited the /etc/pf.conf file to add the following firewall rules:

# remove set skip on lo
# After block return, add
pass on lo0
block out on {$if lo0} proto {tcp, udp} user www
pass out on lo0 proto tcp to 127.0.0.1 port 5432 user www

pass in on $if inet proto tcp to port 80 rdr-to 127.0.0.1 port 9292

I reloaded the firewall ruleset with pfctl to make puma available to the outside world:

# pfctl -f /etc/pf.conf

After that, I checked that connections from another machine to the default HTTP port on the virtual machine worked as expected:

$ curl -LH 'host: cspvr-demo.jeremyevans.net' http://vm.jeremyevans.net

Testing with a real browser from the outside, I saw that I had missed one step: precompiling the assets for the applications. So I added a Rakefile that can precompile the assets for all of the applications. This is mostly taken from the Rakefile I was using on Heroku:

require './.env.rb'

$:.unshift('./forme/lib')
$:.unshift('./autoforme/lib')
$:.unshift('./rodauth/lib')

namespace :assets do
  desc "Precompile the assets"
  task :precompile do
    ENV["ASSETS_PRECOMPILE"] = '1'
    require File.expand_path('../falcomcdcatalog/falcomcdcatalog', __FILE__)
    Falcom::App.compile_assets

    require File.expand_path('../giftsmas/giftsmas', __FILE__)
    Giftsmas::App.compile_assets

    require File.expand_path('../kaeruera/kaeruera_app', __FILE__)
    KaeruEra::App.compile_assets

    require File.expand_path('../spam/spam', __FILE__)
    Spam::App.compile_assets

    require File.expand_path('../quinto/lib/quinto/app', __FILE__)
    Quinto::App.compile_assets

    require File.expand_path('../cspvr/app', __FILE__)
    Cspvr::App.compile_assets
  end
end

I then ran rake to precompile the assets for all applications:

$ rake assets:precompile

After reloading puma and retesting from the outside, and seeing everything working, I set both PostgreSQL and puma to automatically start when the system boots:

# rcctl enable postgresql puma

Since I will be running the demo sites on my own virtual machine, I needed an appropriate backup strategy. I had already implemented one when I originally set up this virtual machine, which creates a .tar.gz file of the important files on the virtual machine. I needed to expand it to handle the new PostgreSQL database, as well as the additional files.

First, I needed to make sure the PostgreSQL database cluster is backed up. That's easiest using pg_dumpall. So I added this to my make_backup_tarball script:

PGPASSFILE=/home/jeremy/.pgpass doas -u jeremy /usr/local/bin/pg_dumpall -U postgres > /home/jeremy/vm.pgdumpall

I also needed to add files to the list of files to backup (stored in /etc/backup_list):

etc/rc.d/puma
home/jeremy/.pgpass
home/jeremy/vm.pgdumpall
var/www/app/.env.rb
var/www/app/Rakefile
var/www/app/config.ru
var/www/app/puma.conf

Now that everything was set up, I switched the DNS records from pointing at Heroku to pointing at the virtual machine. Fairly soon after making the DNS changes, I started receiving traffic on the virtual machine.

The puma master process takes around 70MB of memory, and the worker process running all of the applications currently takes about 150MB. The virtual machine only has 512MB total, and currently about 150MB is free. That's not as much breathing room as I would like, but hopefully it will be fine.

The final step of the migration was to turn on maintenance mode on the Heroku application. Sometime before the end of November I'll delete the Heroku application, but just in case I need to switch back, I'll leave it in maintenance mode.