Terraform Config for Multi-Cloud: Problem

Introduction

Terraform is the industry-standard tool for infrastructure provisioning. It provides a unified language for interacting with any supported API, so developers can work with a variety of platforms using knowledge of Terraform plus some knowledge of the platform itself. The tool and language remove the need to learn multiple APIs and bindings, and offer a single interface through a standardized language. However, the config code itself is not unified across platforms. In this article, we explore why Terraform config code is not unified for three major cloud platforms.

This article assumes familiarity with Terraform, basic infrastructure platform concepts, and basic coding.

Overview

Anyone with experience working with Terraform and AWS will be familiar with the basic resource for managing a single instance:

resource "aws_instance" "this" {
  ami           = "ami-abcdefg1234567890"
  instance_type = "t3.medium"
}

According to sales and marketing departments, you can then extend to managing multi-cloud infrastructure with Terraform via the following:

resource "aws_instance" "this" {
  ami           = "ami-abcdefg1234567890"
  instance_type = "t3.medium"
}

resource "gcp_instance" "this" {
  ami           = "ami-abcdefg1234567890"
  instance_type = "t3.medium"
}

resource "azure_instance" "this" {
  ami           = "ami-abcdefg1234567890"
  instance_type = "t3.medium"
}

Congratulations! You are now developing Terraform configs to manage multiple platforms! All that is required is a simple find-and-replace on the word “aws” with your cloud platform of choice.

This is, of course, not a valid Terraform config: the gcp_instance and azure_instance resource types do not exist, and the ami and instance_type arguments are specific to AWS. However, a surprising number of people believe it is valid. Developing Terraform configs and modules to manage multi-cloud infrastructure is an intricate process because of the underlying architecture of the communication between Terraform and the platforms.

Interface Interaction Architecture

The Terraform interactions with supported platforms can be traced from config to provider to bindings to API. The communication is bidirectional. It begins and ends with Terraform, with the API in the middle:

config (REQUEST) → provider → bindings → API (RESPONSE) → bindings → provider → core (OUTPUT)

We will now examine these layers for the example of a simple cloud instance, and explain how each contributes to the problem of a non-unified Terraform config for managing multiple platforms.

API

According to the AWS EC2 API documentation for RunInstances, the authorized request for an instance with specified image and type would appear like:

https://ec2.amazonaws.com/?Action=RunInstances
&ImageId=ami-abcdefg1234567890
&InstanceType=t3.medium
&AUTHPARAMS

According to the GCP Compute Engine API documentation for instances, the request for an instance with specified image and type would appear like:

POST https://compute.googleapis.com/compute/v1/projects/MyProject/zones/europe-central2-b/instances

{
 "machineType": "zones/europe-central2-b/machineTypes/n1-standard-1",
 "disks": [
    {
      "initializeParams": {
        "sourceImage": "projects/debian-cloud/global/images/family/debian-10"
      }
    }
  ]
}

According to the very comprehensive Azure Compute Virtual Machine Create documentation, the request for an instance with specified image and type would appear like:

PUT https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/myResourceGroup/providers/Microsoft.Compute/virtualMachines/myVM?api-version=2021-11-01

{
  "properties": {
    "hardwareProfile": {
      "vmSize": "Standard_D2s_v3"
    },
    "storageProfile": {
      "imageReference": {
        "sku": "16.04-LTS",
        "publisher": "Canonical",
        "version": "latest",
        "offer": "UbuntuServer"
      }
    }
  }
}

We can see above that the request endpoints, structure, and parameter names for the instance type and image differ completely between the cloud APIs. Note that the above requests omit other required parameters; we have restricted them to the image and type parameters for relevance and brevity.
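The differences are easier to compare when the three requests are built side by side. The following is a minimal sketch using only the Go standard library. It constructs the requests without sending them, omits authentication and all other required parameters, and the `request` type and function names are hypothetical helpers introduced here for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// request is a hypothetical, simplified description of one HTTP call.
type request struct {
	Method string
	URL    string
	Body   string // empty when all parameters travel in the query string
}

// mustJSON marshals v and panics on failure (not expected for these inputs).
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// awsRunInstances encodes image and type as flat query parameters.
func awsRunInstances(image, instanceType string) request {
	q := url.Values{}
	q.Set("Action", "RunInstances")
	q.Set("ImageId", image)
	q.Set("InstanceType", instanceType)
	return request{Method: "GET", URL: "https://ec2.amazonaws.com/?" + q.Encode()}
}

// gcpInsertInstance nests the image under disks[].initializeParams and
// expresses the machine type as a zone-scoped path.
func gcpInsertInstance(project, zone, image, machineType string) request {
	body := map[string]any{
		"machineType": "zones/" + zone + "/machineTypes/" + machineType,
		"disks": []map[string]any{
			{"initializeParams": map[string]any{"sourceImage": image}},
		},
	}
	return request{
		Method: "POST",
		URL:    "https://compute.googleapis.com/compute/v1/projects/" + project + "/zones/" + zone + "/instances",
		Body:   mustJSON(body),
	}
}

// azureCreateVM splits the image into a multi-field reference and places
// both image and size under "properties".
func azureCreateVM(subscription, group, name, size string, image map[string]string) request {
	body := map[string]any{
		"properties": map[string]any{
			"hardwareProfile": map[string]any{"vmSize": size},
			"storageProfile":  map[string]any{"imageReference": image},
		},
	}
	return request{
		Method: "PUT",
		URL: "https://management.azure.com/subscriptions/" + subscription +
			"/resourceGroups/" + group +
			"/providers/Microsoft.Compute/virtualMachines/" + name +
			"?api-version=2021-11-01",
		Body: mustJSON(body),
	}
}

func main() {
	reqs := []request{
		awsRunInstances("ami-abcdefg1234567890", "t3.medium"),
		gcpInsertInstance("MyProject", "europe-central2-b",
			"projects/debian-cloud/global/images/family/debian-10", "n1-standard-1"),
		azureCreateVM("{subscription-id}", "myResourceGroup", "myVM", "Standard_D2s_v3",
			map[string]string{"publisher": "Canonical", "offer": "UbuntuServer", "sku": "16.04-LTS", "version": "latest"}),
	}
	for _, r := range reqs {
		fmt.Printf("%s %s\n%s\n\n", r.Method, r.URL, r.Body)
	}
}
```

Even in this reduced form, the three functions share no parameter names, no body structure, and not even the same HTTP method.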

Is unification possible here?

In theory yes, but in practice it is infeasible to the point of being nearly impossible. The endpoints and parameters are completely different, and unifying the APIs would also require the future trajectories of the platforms’ services to coincide, which implies a monumental level of collaboration and cooperation. This is not the layer where unification is possible.

Bindings

According to the AWS Go SDK EC2 documentation for RunInstances, the abbreviated code for an instance with specified image and type would appear like:

// session.New is deprecated; NewSession surfaces configuration errors.
ec2Session := ec2.New(session.Must(session.NewSession()))
instanceInput := &ec2.RunInstancesInput{
  ImageId:      aws.String("ami-abcdefg1234567890"),
  InstanceType: aws.String("t3.medium"),
}

result, err := ec2Session.RunInstances(instanceInput)

According to the GCP Go SDK Compute Engine documentation for InstanceRequest, the abbreviated code for an instance with specified image and type would appear like:

ctx := context.Background()
instancesClient, err := compute.NewInstancesRESTClient(ctx)
defer instancesClient.Close()

req := &computepb.InsertInstanceRequest{
  Project: "MyProject",
  Zone:    "europe-central2-b",
  InstanceResource: &computepb.Instance{
    // Disks is a slice, so each attached disk needs its own element braces.
    Disks: []*computepb.AttachedDisk{
      {
        InitializeParams: &computepb.AttachedDiskInitializeParams{
          SourceImage: proto.String("projects/debian-cloud/global/images/family/debian-10"),
        },
      },
    },
    // MachineType is a field of Instance, not of InsertInstanceRequest.
    MachineType: proto.String(fmt.Sprintf("zones/%s/machineTypes/%s", "europe-central2-b", "n1-standard-1")),
  },
}

op, err := instancesClient.Insert(ctx, req)

The Azure documentation contains examples and quickstarts for the Go SDK instead of the normal documentation format, but we can adapt these for abbreviated code for an instance with specified image and type:

future, err := vmClient.CreateOrUpdate(
  ...
  compute.VirtualMachine{
    VirtualMachineProperties: &compute.VirtualMachineProperties{
      HardwareProfile: &compute.HardwareProfile{
        VMSize: compute.VirtualMachineSizeTypesBasicA0,
      },
      StorageProfile: &compute.StorageProfile{
        ImageReference: &compute.ImageReference{
          Publisher: to.StringPtr("Canonical"),
          Offer:     to.StringPtr("UbuntuServer"),
          Sku:       to.StringPtr("16.04-LTS"),
          Version:   to.StringPtr("latest"),
        },
      },
    },
  },
)

err = future.WaitForCompletionRef(ctx, vmClient.Client)

Note how the code for all three initializes a client and parameter structs for the instance creation. The client and parameters are then passed to the API bindings method for creating an instance. While the code architecture is very similar, the implementation is very different with respect to argument type structure and naming. Note how the naming and structure tie in directly to the API request bodies from the previous section.

Is unification possible here?

In theory yes, and it is less infeasible than at the API level. The bindings provide a wrapper around the API for interfacing with Go, so the differences can be abstracted with code. However, this would still require a significant effort by each platform to abstract the specifics of their API to conform to a standardized set of Go functions, structs, etc. The bindings would need to be compatible with all APIs, and this effort is still too substantial to rationalize.
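To make the idea of a standardized set of Go functions and structs more concrete, here is a minimal sketch of what such a contract might look like. Everything in it is hypothetical: `InstanceSpec`, `InstanceCreator`, and the three adapters are invented for illustration, and the adapters do not call any real SDK. They only report which native call and fields they would populate.

```go
package main

import (
	"errors"
	"fmt"
)

// InstanceSpec is a hypothetical platform-neutral description of an instance.
type InstanceSpec struct {
	Image string
	Size  string
}

// InstanceCreator is a hypothetical contract that every platform's bindings
// would have to implement for unification at this layer.
type InstanceCreator interface {
	CreateInstance(spec InstanceSpec) (string, error)
}

var errIncomplete = errors.New("image and size are both required")

// validate rejects specs that are missing either field.
func (s InstanceSpec) validate() error {
	if s.Image == "" || s.Size == "" {
		return errIncomplete
	}
	return nil
}

// awsAdapter maps the neutral spec onto the RunInstances field names.
type awsAdapter struct{}

func (awsAdapter) CreateInstance(s InstanceSpec) (string, error) {
	if err := s.validate(); err != nil {
		return "", err
	}
	return fmt.Sprintf("RunInstances(ImageId=%s, InstanceType=%s)", s.Image, s.Size), nil
}

// gcpAdapter needs a zone because the machine type is a zone-scoped path.
type gcpAdapter struct{ Zone string }

func (g gcpAdapter) CreateInstance(s InstanceSpec) (string, error) {
	if err := s.validate(); err != nil {
		return "", err
	}
	return fmt.Sprintf("Insert(SourceImage=%s, MachineType=zones/%s/machineTypes/%s)", s.Image, g.Zone, s.Size), nil
}

// azureAdapter treats the neutral image as the SKU of an image reference.
type azureAdapter struct{}

func (azureAdapter) CreateInstance(s InstanceSpec) (string, error) {
	if err := s.validate(); err != nil {
		return "", err
	}
	return fmt.Sprintf("CreateOrUpdate(ImageReference.Sku=%s, VMSize=%s)", s.Image, s.Size), nil
}

func main() {
	targets := []struct {
		creator InstanceCreator
		spec    InstanceSpec
	}{
		{awsAdapter{}, InstanceSpec{Image: "ami-abcdefg1234567890", Size: "t3.medium"}},
		{gcpAdapter{Zone: "europe-central2-b"}, InstanceSpec{Image: "debian-10", Size: "n1-standard-1"}},
		{azureAdapter{}, InstanceSpec{Image: "16.04-LTS", Size: "Standard_D2s_v3"}},
	}
	for _, t := range targets {
		call, err := t.creator.CreateInstance(t.spec)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(call)
	}
}
```

Note that the interface only unifies the shape of the call. The image and size values are still platform-specific, and the GCP adapter still needs a zone, which hints at how much each platform would have to hide behind such a contract.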

Provider

Examining the AWS provider code for aws_instance, we can see the schema for the arguments and a basic example usage in the code.

"ami": {
  Type:         schema.TypeString,
  ForceNew:     true,
  Computed:     true,
  Optional:     true,
  AtLeastOneOf: []string{"ami", "launch_template"},
},
"instance_type": {
  Type:         schema.TypeString,
  Computed:     true,
  Optional:     true,
  AtLeastOneOf: []string{"instance_type", "launch_template"},
},
...
runOpts := &ec2.RunInstancesInput{
  InstanceType: instanceOpts.InstanceType,
}

runResp, err = conn.RunInstances(runOpts)

Examining the Google provider code for compute_instance, we can see the schema for the arguments and a basic example usage in the code.

"boot_disk": {
  ...
      "initialize_params": {
        ...
            "image": {
              Type:             schema.TypeString,
              Optional:         true,
              AtLeastOneOf:     initializeParamsKeys,
              Computed:         true,
              ForceNew:         true,
              DiffSuppressFunc: diskImageDiffSuppress,
              Description:      `The image from which this disk was initialised.`,
            },
      },
},
"machine_type": {
  Type:        schema.TypeString,
  Required:    true,
  Description: `The machine type to create.`,
},
...
if mt, ok := d.GetOk("machine_type"); ok {
  machineType, err := ParseMachineTypesFieldValue(mt.(string), d, config)
  machineTypeUrl = machineType.RelativeLink()
}

Examining the AzureRM provider code for linux_virtual_machine, we can see the schema for the arguments and a basic example usage in the code.

"size": {
  Type:         pluginsdk.TypeString,
  Required:     true,
  ValidateFunc: validation.StringIsNotEmpty,
},
"source_image_id": {
  Type:     pluginsdk.TypeString,
  Optional: true,
  ForceNew: true,
  ValidateFunc: validation.Any(
    computeValidate.ImageID,
    computeValidate.SharedImageID,
    computeValidate.SharedImageVersionID,
  ),
},
...
params := compute.VirtualMachine{
  VirtualMachineProperties: &compute.VirtualMachineProperties{
    HardwareProfile: &compute.HardwareProfile{
      VMSize: compute.VirtualMachineSizeTypes(size),
    },
  },
}

Note that the resource schemas are completely different from one another, and only loosely tied to the cloud Go bindings at a low level. The schema naming and structure follow the cloud bindings for convenience, efficiency, and readability. The code architecture of the CRUD functions for each resource is independent of the bindings (and tied more to Terraform’s provider SDK), but must ensure proper usage of the bindings for interfacing with the APIs. The providers therefore have some measure of independence from the bindings layer, and some measure of unification with each other through the Terraform provider SDK.

Is unification possible here?

Now we have reached the layer where unification is possible and also somewhat feasible. The first requirement for unification at this level would be to standardize the resource schemas among the providers. The next requirement would be for each provider to interact with the bindings accurately using the standardized schema. This would require extra effort on each team’s part to broker the generalized schema with their individual bindings. It would also require extra effort between teams to communicate and collaborate on the standardized schema. Alternatively, some collaborative committee (similar in purpose to the CNCF) could establish guidelines here. It is also worth noting that not all resources have an equivalent offering on each platform, so this really only applies to common offerings such as instances.

Of course, the fact that unification is possible here does not mean it will occur. Some have proposed creating a single provider that interacts with multiple cloud platforms (which is also entirely possible given the standardization and brokering discussed above), but this would be a very substantial effort for the community to undertake. Unification is possible here, but it will likely not occur.
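The brokering between a standardized schema and each provider’s own arguments can be pictured as a translation table. The sketch below is an assumption-laden illustration, not provider code: `StandardInstance`, `translation`, and `Broker` are hypothetical names, the generic "medium" size is invented, and the per-provider values are simply the ones used in this article’s config examples rather than a claim that those instance sizes are equivalent. Only the size argument is handled. The image would need a structural translation, not just a rename.

```go
package main

import (
	"fmt"
	"sort"
)

// StandardInstance is a hypothetical provider-neutral schema.
type StandardInstance struct {
	Size string // generic size class, e.g. "medium"
}

// translation records how one provider names the resource, names the size
// argument, and which native value each generic size class maps to.
type translation struct {
	Resource string
	Argument string
	Sizes    map[string]string
}

// translations is illustrative only; the values come from the config
// examples in this article.
var translations = map[string]translation{
	"aws": {
		Resource: "aws_instance",
		Argument: "instance_type",
		Sizes:    map[string]string{"medium": "t3.medium"},
	},
	"google": {
		Resource: "google_compute_instance",
		Argument: "machine_type",
		Sizes:    map[string]string{"medium": "e2-medium"},
	},
	"azurerm": {
		Resource: "azurerm_linux_virtual_machine",
		Argument: "size",
		Sizes:    map[string]string{"medium": "Standard_F2"},
	},
}

// Broker translates the standardized schema into the resource type,
// argument name, and value that a given provider expects.
func Broker(provider string, in StandardInstance) (string, error) {
	t, ok := translations[provider]
	if !ok {
		return "", fmt.Errorf("unknown provider %q", provider)
	}
	value, ok := t.Sizes[in.Size]
	if !ok {
		return "", fmt.Errorf("provider %q has no mapping for size %q", provider, in.Size)
	}
	return fmt.Sprintf("%s: %s = %q", t.Resource, t.Argument, value), nil
}

func main() {
	// Sort provider names so the output order is stable.
	providers := make([]string, 0, len(translations))
	for name := range translations {
		providers = append(providers, name)
	}
	sort.Strings(providers)

	for _, name := range providers {
		line, err := Broker(name, StandardInstance{Size: "medium"})
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(line)
	}
}
```

Someone still has to agree on and maintain a table like this for every common argument and every provider, which is the collaborative effort described above.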

Config

We have already seen the basic resource for an AWS instance.

resource "aws_instance" "this" {
  ami           = "ami-abcdefg1234567890"
  instance_type = "t3.medium"
}

Below is the proper resource for a GCP instance.

resource "google_compute_instance" "this" {
  machine_type = "e2-medium"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-10"
    }
  }
}

Also we have a valid resource for an Azure instance.

resource "azurerm_linux_virtual_machine" "this" {
  size = "Standard_F2"

  source_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "16.04-LTS"
    version   = "latest"
  }
}

Again, required arguments and blocks have been omitted for the sake of brevity. The effects of the provider resource schema differences examined in the previous section are quite evident here. The instance type for each resource is a string argument, but the argument names and values differ. The image is a string argument for AWS and a block (roughly a map elem in the provider) for GCP and Azure, each with a different structure and naming.

Is unification possible here?

Yes, it absolutely is. We have control over developing Terraform configs, and therefore we can broker and wrap the interface accordingly.

Conclusion

In this article we explored the architecture of the interactions between Terraform and platforms. We examined each layer of the interaction for three example cloud platforms, focusing on simple instance management for each. We identified what would be required at each layer to unify the interactions between Terraform and the three cloud platforms, and discussed the feasibility and level of effort at each layer. The conclusion is that we will need to unify at the Terraform config level. The next article will provide the solution for managing infrastructure in multiple cloud platforms within a single Terraform module.

