Setting up budgets for cloud usage with Terraform

From time to time your team may want to use a new service from your cloud provider. That request usually comes with an estimated usage cost, and if it fits in the budget and the ROI looks good, it gets approved. For most startup projects, that’s where cloud cost control ends. With just a bit of extra effort, especially if resources are already being provisioned with Terraform, you can use the budgeting tools offered by Amazon, Google, and the rest to ensure actual costs align with expectations.

For the purposes of this example, I’ll use Google Cloud Budgets, but the analogous resources and APIs exist in AWS and Azure.

Goal: Add a budget to monitor the cost of a new Google Cloud Run service your team wants to deploy. 

Prerequisites: An operational knowledge of Terraform and editor access to a Google Cloud Project & Google Cloud Billing Account

Part 0 – Become familiar with your cloud provider’s budgeting tool

If you haven’t already, spend a few minutes creating a budget in the cloud console itself. The various parameters and options in Terraform will make a lot more sense if you’ve already got the context and perspective of how the budgeting process as a whole works. In Google Cloud, budgets live under “Budgets & alerts” in the Billing section.

Part 1 – Set up the Cloud Run service

This is just a sample taken directly from the Terraform resource documentation for google_cloud_run_service:

resource "google_cloud_run_service" "default" {
  name     = "cloudrun-srv"
  location = "us-central1"

  template {
    spec {
      containers {
        image = "us-docker.pkg.dev/cloudrun/container/hello"
      }
    }
  }

  traffic {
    percent         = 100
    latest_revision = true
  }
}

Part 2 – Find the service ID

When setting up a budget with Google Cloud you have the option of having the budget monitor the cost of a specific service via a filter. The Terraform resource for creating budgets with these filters requires that you specify the service by its service ID. You can find the IDs of the various services in the cloud console UI, as in the screenshot below.

Part 3 – Set up the budget

The Terraform code below is lightly modified from the sample code in the google_billing_budget resource documentation:

data "google_billing_account" "account" {
  billing_account = "000000-0000000-0000000-000000"
}

data "google_project" "project" {
}

resource "google_billing_budget" "budget" {
  billing_account = data.google_billing_account.account.id
  display_name = "Project X Cloud Run Billing Budget"

  budget_filter {
    projects = ["projects/${data.google_project.project.number}"]
    credit_types_treatment = "EXCLUDE_ALL_CREDITS"
    services = ["services/152E-C115-5142"] # Cloud Run
  }

  amount {
    specified_amount {
      currency_code = "USD"
      units = "100" # $100 per month
    }
  }

  threshold_rules {
    threshold_percent = 0.5 # Alert at $50
  }
  threshold_rules {
    threshold_percent = 0.9 # Alert when forecast to hit $90
    spend_basis = "FORECASTED_SPEND"
  }
}

You can even set up custom alerting rules so that the teams creating new infrastructure are the ones notified if and when spend exceeds the amount forecast during planning and development.
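One way to do that with the same resource is the all_updates_rule block, which routes budget notifications to a Cloud Monitoring notification channel of your choosing instead of only the default billing admins. The sketch below extends the budget above; the channel name and email address are placeholders.

resource "google_monitoring_notification_channel" "cloud_run_team" {
  display_name = "Project X Cloud Run Team"
  type         = "email"
  labels = {
    email_address = "cloud-run-team@mycompany.com" # placeholder address
  }
}

resource "google_billing_budget" "budget" {
  billing_account = data.google_billing_account.account.id
  display_name    = "Project X Cloud Run Billing Budget"

  budget_filter {
    projects               = ["projects/${data.google_project.project.number}"]
    credit_types_treatment = "EXCLUDE_ALL_CREDITS"
    services               = ["services/152E-C115-5142"] # Cloud Run
  }

  amount {
    specified_amount {
      currency_code = "USD"
      units         = "100"
    }
  }

  threshold_rules {
    threshold_percent = 0.9
    spend_basis       = "FORECASTED_SPEND"
  }

  # Send alert emails to the owning team's channel rather than the billing admins
  all_updates_rule {
    monitoring_notification_channels = [
      google_monitoring_notification_channel.cloud_run_team.id,
    ]
    disable_default_iam_recipients = true
  }
}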

Production SQL Server Checklist & Best Practices

Much ink has been spilled on which database you should use, or how to think about choosing a database, for your project.  My aim in this post is not to sell you on any database paradigm, but rather to serve as a reference guide and checklist for how to responsibly host a SQL server, be it MySQL, PostgreSQL or another, in production.

Before calling your SQL database plan ready-for-implementation, ask yourself if you’ve thought about all these requirements:

☐ Read only replicas
☐ Multi-zone hosting
☐ Automated daily backups
☐ One click rollback / backup restore
☐ Event based audit logging / full table histories / log replication
☐ Automatic disk expansion
☐ High quality Migration tooling
☐ Connection/IP Security
☐ Local dev versions / easy ability to download prod data
☐ Staging / recent-prod replica environments
☐ CPU, Memory, Connection monitoring / auto scaling
☐ Cost Monitoring
☐ Slow query monitoring
☐ High quality ORM / DB Connection library

In practice it’s very expensive, or outright impossible, to do all of these things yourself. Your best bet is to choose a solution that comes with many of these features out of the box, such as Google Cloud SQL or Amazon RDS; just make sure to enable the features you care about.

—————-

Read only replicas

More often than not, a production SQL server will have use cases that divide cleanly into read-heavy and write-heavy.  The most common is perhaps the desire to run analytics on transactional data.  Generally that should be handled with a proper data pipeline and enterprise data warehouse, but having a real-time read-only mirror is good practice regardless, even just for your ELT tools.
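On a managed platform this is often a single attribute. A minimal Terraform sketch for Google Cloud SQL (instance names are placeholders): setting master_instance_name turns an instance into a read replica of the primary.

resource "google_sql_database_instance" "primary" {
  name             = "app-primary" # placeholder name
  database_version = "POSTGRES_13"
  region           = "us-central1"

  settings {
    tier = "db-f1-micro"
  }
}

resource "google_sql_database_instance" "analytics_replica" {
  name             = "app-read-replica" # placeholder name
  database_version = "POSTGRES_13"
  region           = "us-central1"

  # Makes this instance a replica of the primary; reads only, writes are rejected
  master_instance_name = google_sql_database_instance.primary.name

  settings {
    tier = "db-f1-micro"
  }
}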

Multi-zone hosting

If AWS us-east-1 goes down, will your application survive?  Have a plan to ensure data is replicated in real time between zones, or better yet between regions entirely.
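On Cloud SQL, for example, zonal redundancy is one setting away; a minimal sketch (instance name is a placeholder):

resource "google_sql_database_instance" "ha_primary" {
  name             = "app-ha-primary" # placeholder name
  database_version = "POSTGRES_13"
  region           = "us-central1"

  settings {
    tier = "db-custom-1-3840"

    # REGIONAL keeps a standby in a second zone with automatic failover;
    # the default ZONAL has no standby.
    availability_type = "REGIONAL"
  }
}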

Automated daily backups

Ideally you have at least daily, if not more frequent, full backups that are shipped off site.  Depending on your requirements, perhaps that’s an exported archive in a storage bucket with the same cloud provider, or perhaps it’s a bucket in an entirely different cloud.  Make sure everything about this process is secure and locked down tight; these are entire copies of your database, after all.

This is a good use case for that real-time read-only replica.
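If you’re on Cloud SQL, the built-in automated backups are a few lines of Terraform; note they live alongside the instance, so they complement rather than replace the off-site exports described above. The instance name, schedule and retention below are placeholders.

resource "google_sql_database_instance" "backed_up" {
  name             = "app-primary" # placeholder name
  database_version = "POSTGRES_13"
  region           = "us-central1"

  settings {
    tier = "db-f1-micro"

    backup_configuration {
      enabled    = true
      start_time = "03:00" # daily backup window, in UTC

      # Keep write-ahead logs so you can also restore to a point in time
      point_in_time_recovery_enabled = true

      backup_retention_settings {
        retained_backups = 30
      }
    }
  }
}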

One click rollback / backup restore

Most cloud-hosted SQL options offer one-click point-in-time restore.  At a minimum, ensure you have an entirely automated, regularly tested way to restore from one of your hourly or daily backups.

Event based audit logging / full table histories / log replication

Different databases have different terminology for this: in PostgreSQL it’s replication slots, in MSSQL it’s log replication.  The idea is that you want CDC (change data capture) for every mutation to every table, recorded in a data warehouse for your analytics team to use as they need.  Such data can be used to produce business audit logs, or to run point-in-time analytics queries that answer questions like “what was my inventory like last week?”
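How you turn this on is engine-specific. On Cloud SQL for PostgreSQL, for instance, logical decoding (the prerequisite for most CDC tooling such as Datastream or Debezium) is enabled with a database flag; a minimal sketch with a placeholder instance name:

resource "google_sql_database_instance" "cdc_source" {
  name             = "app-primary" # placeholder name
  database_version = "POSTGRES_13"
  region           = "us-central1"

  settings {
    tier = "db-f1-micro"

    # Sets wal_level=logical so CDC tools can create replication slots
    database_flags {
      name  = "cloudsql.logical_decoding"
      value = "on"
    }
  }
}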

Automatic disk expansion

Nobody likes getting an alarm at 3 AM that their database has hit its disk storage limit.  In case it’s not obvious, very bad things happen when a database runs out of disk space.  Make sure your SQL solution never runs out of disk by using a platform or tool that expands storage automatically.  Ideally it shrinks automatically too.
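On Cloud SQL this is the disk_autoresize setting, with an optional upper bound (managed disks generally grow but don’t shrink automatically). A sketch, instance name being a placeholder:

resource "google_sql_database_instance" "autogrow" {
  name             = "app-primary" # placeholder name
  database_version = "POSTGRES_13"
  region           = "us-central1"

  settings {
    tier      = "db-f1-micro"
    disk_size = 20 # starting size in GB

    disk_autoresize       = true
    disk_autoresize_limit = 500 # GB cap so a runaway table can't grow the bill forever
  }
}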

High quality Migration tooling

Schema and data migrations are hard; don’t try to solve these problems yourself.  Use a tool or framework that helps you generate migrations and manage their execution across environments.  Remember that your migrations have to work locally for developers who have used the repository before and for brand-new developers, as well as in all staging, feature-branch and production environments.  Don’t underestimate the difficulty of this challenge.

Connection/IP Security

Often you can get away with IP allowlisting access to a database, but in 2022 that’s going out of style (and will be flagged by PCI or SOC 2 auditors). Nowadays your database should live in a private VPC with no internet access, networked/peered with your application servers. Keep in mind that this will make access for developers challenging; that’s a good thing!  It’s still wise to have a strategy, either a proxy or a bastion host, for emergencies.
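On Cloud SQL that translates to an ip_configuration with no public IPv4 address and a private VPC attachment; a sketch (the VPC self-link below is a placeholder, and the Service Networking API with a private services connection must already be in place):

resource "google_sql_database_instance" "private_only" {
  name             = "app-primary" # placeholder name
  database_version = "POSTGRES_13"
  region           = "us-central1"

  settings {
    tier = "db-f1-micro"

    ip_configuration {
      ipv4_enabled    = false # no public IP at all
      private_network = "projects/my-project/global/networks/my-vpc" # placeholder VPC
      require_ssl     = true
    }
  }
}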

Local dev versions / easy ability to download prod data

You’ll want tooling to download a copy of sanitized production data for testing.  Something that runs well on a local machine with 1,000 rows may be unacceptably slow in production with 2 million records.  Those 2 million records may cause trouble not just because of volume, but also because of data heterogeneity; real-world users will hit edge cases your developers may not.

CPU, Memory, Connection monitoring / auto scaling

Ensure you have monitoring, and ideally autoscaling, on CPU, memory and connection counts for your SQL database.  It should be somebody’s job to check from time to time that these values are within acceptable ranges for your use case.
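In GCP, a Cloud Monitoring alert on Cloud SQL CPU utilization can live in Terraform alongside the database itself; the display name, threshold and (empty) notification channel list below are placeholders:

resource "google_monitoring_alert_policy" "sql_cpu" {
  display_name = "Cloud SQL CPU > 80%" # placeholder
  combiner     = "OR"

  conditions {
    display_name = "CPU utilization"
    condition_threshold {
      filter          = "metric.type=\"cloudsql.googleapis.com/database/cpu/utilization\" AND resource.type=\"cloudsql_database\""
      comparison      = "COMPARISON_GT"
      threshold_value = 0.8
      duration        = "300s"

      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  # Reuse whichever notification channels your team already alerts through
  notification_channels = []
}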

Cost Monitoring

SQL databases are generally among the more expensive parts of the stack. I recommend setting up a budget with your cloud provider’s tooling so you know how much you’re spending and can monitor its growth.

Slow query monitoring

It’s easy to shoot yourself in the foot with SQL, whether you’re using an ORM or writing raw SQL, and generate very expensive, slow queries.  You’ll want logging, and ideally alerting, for anything abnormally slow that makes it to production.
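For PostgreSQL, a common first step is the log_min_duration_statement flag, which Cloud SQL exposes directly; anything slower than the threshold lands in the logs, where you can alert on it. A sketch with a placeholder instance name and threshold:

resource "google_sql_database_instance" "slow_query_logging" {
  name             = "app-primary" # placeholder name
  database_version = "POSTGRES_13"
  region           = "us-central1"

  settings {
    tier = "db-f1-micro"

    # Log every statement that takes longer than 500 ms
    database_flags {
      name  = "log_min_duration_statement"
      value = "500"
    }
  }
}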

High quality ORM / DB Connection library

Don’t forget about developer experience!  Do you want to write raw SQL or use an ORM/DAL?  There are tradeoffs in both directions; think through your options carefully.  Does the ORM come with a migration tool?  Does it have built-in connection pooling?

A code-free IaC link shortener using Kutt and GKE

Goal: Deploy a link shortener on your own domain without writing any (non-infrastructure) code.

Prerequisites: An operational Kubernetes cluster, knowledge of Kubernetes & Terraform, basic knowledge of Google Cloud Platform

At every company I’ve worked at in the past decade, we’ve had some mechanism for creating memorable links to commonly used documents.  Internally at Google, at least when I was there around 2010, they used the internally resolving name “go”: e.g. “go/payroll” or “go/chromedashboard” would point to the internal payroll site or a project dashboard.  I suspect an ex-Googler liked the idea enough to make it a business, as GoLinks is a real product you can pay for.  Below I’ll walk through how to set up Kutt (an open source link shortener) with Terraform in your own Kubernetes cluster on Google Cloud.

Kutt has several dependencies, so let’s make sure we’ve got those in order:

  • You need a domain name and the ability to set DNS records, for example go.mycompany.com.
  • You’ll need an SMTP server for sending authentication emails from the link shortener; we’ll use this just for the admin user.  Have your mail_host, mail_port, mail_user and mail_password at hand.
  • Optionally: a Google Analytics ID
  • A Redis instance (we’ll deploy one with Terraform)
  • A PostgreSQL database (we’ll deploy one with Terraform)

For starters, let’s set up our variables.tf file. There are quite a few values here, and more configuration options can be passed into Kutt via environment variables down the road.

variables.tf

variable "k8s_host" {
  description = “IP of your K8S API Server”
}

variable "cluster_ca_certificate" {
  description = “K8S cluster certificate”
}

variable "region" {
  description = "Region of resources"
}

variable "project_id" {
  description = “google cloud project ID”
}

variable "google_service_account" {
  description = “JSON Service account to talk to GCP”
}

variable "namespace" {
  description = “kubernetes namespace to deploy to”
}

variable "vpc_id" {
  description = “VPC to put the database in”
}

variable "domain" {
  default = "go.mycompany.com"
}

variable “jwt_secret” {
  default = “CHANGE-ME-TO-SOMETHING-UNIQUE”
}

variable “smtp_host” {}

variable “smtp_port” {
  default = 587
}

variable “smtp_user” {}

variable “smtp_password” {}

variable “admin_emails” {}

variable “mail_from” {
  default = “lnkshortner@mycompany.com” 
}

variable “google_analytics_id” {}

Now let’s set up our database.  You can really do this any way you like, but since we’re using Google Kubernetes Engine we likely also have access to Google Cloud SQL, so this is fairly straightforward.

database.tf

resource "google_sql_database_instance" "linkshortenerdb" {
  name             = replace("linkshortener-${var.namespace}", "_", "-")
  database_version = "POSTGRES_13"
  region           = "us-west2"
  project          = var.project_id
  lifecycle {
    prevent_destroy = true
  }

  settings {
    # $7 per month
    tier = "db-f1-micro"
    backup_configuration {
      enabled                        = true
      location                       = "us"
      point_in_time_recovery_enabled = false
      backup_retention_settings {
        retained_backups = 30
      }
    }

    ip_configuration {
      ipv4_enabled = false
      # In order for private networks to work, the GCP Service Networking API has to be enabled
      private_network = var.vpc_id
      require_ssl     = false
    }
  }
}

resource "google_sql_database" "linkshortener" {
  name     = "${var.namespace}-linkshortener"
  instance = google_sql_database_instance.linkshortenerdb.name
  project  = var.project_id
}

resource "random_password" "psql_password" {
  length  = 16
  special = true
}

resource "google_sql_user" "linkshorteneruser" {
  project  = var.project_id
  name     = "linkshortener"
  instance = google_sql_database_instance.linkshortenerdb.name
  password = random_password.psql_password.result
}

That’s it (finally) for the prerequisites. Now the fun part: setting up Kutt itself, with a Redis sidecar.

kutt.tf

provider "google" {
 project     = var.project_id
 region      = var.region
 credentials = var.google_service_account
}

provider "google-beta" {
 project     = var.project_id
 region      = var.region
 credentials = var.google_service_account
}

data "google_client_config" "default" {}

provider "kubernetes" {
 host                   = "https://${var.k8s_host}"
 cluster_ca_certificate = var.cluster_ca_certificate
 token                  = data.google_client_config.default.access_token
}

resource "random_password" "redis_authstring" {
 length  = 16
 special = false
}

resource "kubernetes_deployment" "linkshortener" {
 metadata {
   name = "linkshortener"
   labels = {
     app = "linkshortener"
   }
   namespace = var.namespace
 }

 wait_for_rollout = false

 spec {
   replicas = 1
   selector {
     match_labels = {
       app = "linkshortener"
     }
   }

   template {
     metadata {
       labels = {
         app = "linkshortener"
       }
     }

     spec {
       container {
         image = "bitnami/redis:latest"
         name  = "redis"

         port {
           container_port = 3000
         }

         env {
           name  = "REDIS_PASSWORD"
           value = random_password.redis_authstring.result
         }

         env {
           name  = "REDIS_PORT_NUMBER"
           value = 3000
         }
       }

       container {
         image = "kutt/kutt"
         name  = "linkshortener"
         port {
           container_port = 80
         }

         env {
           name  = "PORT"
           value = "80"
         }

         env {
           name  = "DEFAULT_DOMAIN"
           value = var.domain
         }

         env {
           name  = "DB_HOST"
           value = google_sql_database_instance.linkshortenerdb.ip_address.0.ip_address
         }

         env {
           name  = "DB_PORT"
           value = "5432"
         }

         env {
           name  = "DB_USER"
           value = google_sql_user.linkshorteneruser.name
         }

         env {
           name  = "DB_PASSWORD"
           value = google_sql_user.linkshorteneruser.password
         }

         env {
           name  = "DB_NAME"
           value = google_sql_database.linkshortener.name
         }

         env {
           name  = "DB_SSL"
           value = "false"
         }

         env {
           name  = "REDIS_HOST"
           value = "localhost"
         }

         env {
           name  = "REDIS_PORT"
           value = "3000"
         }

         env {
           name  = "REDIS_PASSWORD"
           value = random_password.redis_authstring.result
         }

         env {
           name  = "JWT_SECRET"
           value = var.jwt_secret
         }

         env {
           name  = "ADMIN_EMAILS"
           value = var.admin_emails
         }

         env {
           name  = "SITE_NAME"
           value = "MyCompany Links"
         }

         env {
           name  = "MAIL_HOST"
           value = var.smtp_host
         }

         env {
           name  = "MAIL_PORT"
           value = var.smtp_port
         }

         env {
           name  = "MAIL_USER"
           value = var.smtp_user
         }

         env {
           name  = "MAIL_FROM"
           value = var.mail_from
         }

         env {
           name  = "DISALLOW_REGISTRATION"
           value = "true"
         }

         env {
           name  = "DISALLOW_ANONYMOUS_LINKS"
           value = "true"
         }

         env {
           name  = "GOOGLE_ANALYTICS"
           value = var.google_analytics_id
         }

         env {
           name  = "MAIL_PASSWORD"
           value = var.smtp_password
         }
         readiness_probe {
           http_get {
             path   = "/api/v2/health"
             port   = 80
             scheme = "HTTP"
           }
           timeout_seconds = 5
           period_seconds  = 10
         }
         resources {
           requests = {
             cpu    = "100m"
             memory = "200M"
           }
         }
       }
     }
   }
 }
}

With the above you should now have a Postgres server, a Redis instance and a Kutt deployment, all deployed and talking to each other. All that’s left is to expose the deployment as a service and set up your DNS records.
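A minimal sketch of that last step, using the same Terraform Kubernetes provider, might look like the following: a plain LoadBalancer service (in practice you may prefer an Ingress plus a managed certificate).

service.tf

resource "kubernetes_service" "linkshortener" {
  metadata {
    name      = "linkshortener"
    namespace = var.namespace
  }

  spec {
    selector = {
      app = "linkshortener"
    }

    port {
      port        = 80
      target_port = 80
    }

    type = "LoadBalancer"
  }
}

Once the service has an external IP (exported as kubernetes_service.linkshortener.status.0.load_balancer.0.ingress.0.ip), point an A record for your domain, e.g. go.mycompany.com, at it.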