What We Learned from Building GovSlack

Slack launched GovSlack in July 2022. With GovSlack, government agencies, and those they work with, can enable their teams to collaborate seamlessly in their digital headquarters while keeping security and compliance at the forefront. Using GovSlack includes the following benefits:

  • Supports key government security standards, such as FedRAMP High, DoD IL4, and ITAR
  • Runs in AWS GovCloud data centers
  • Enables external collaboration with other GovSlack-using organizations through Slack Connect
  • Provides access to your own set of encryption keys for advanced auditing and logging controls
  • Allows permission and access controls at scale through Slack’s enterprise-grade admin dashboard
  • Includes a directory of curated applications (including DLP and eDiscovery apps) that can integrate with Slack
  • Maintained and supported by US personnel

Image of the US flag from https://www.glitchthegame.com

Before the big launch, the Cloud Foundations team spent almost two quarters setting up the infrastructure needed to run GovSlack.

GovSlack is the very first service Slack launched on AWS Gov infrastructure. We therefore had to spend a significant amount of time learning the differences between standard and Gov AWS, and changing our tooling and platform so they could run on Gov AWS.

In this blog post, we’re going to look at how we built the AWS infrastructure needed for GovSlack and the challenges we faced. If you’re thinking about building a new service on AWS GovCloud, this post is for you.

How are GovCloud accounts related to commercial accounts?

Image of boxes on shelves from https://www.glitchthegame.com

Not long ago, Slack started moving from a single AWS account to child accounts. As part of this project, we also made significant changes to our global network infrastructure. You can read more about this in the blog posts Building the Next Evolution of Cloud Networks at Slack and Building the Next Evolution of Cloud Networks at Slack – A Retrospective. We were able to apply most of what we learned there to building the GovSlack network infrastructure.

First of all, AWS Gov accounts do not have any billing capability. The resources in the Gov accounts propagate their billing into a linked shell commercial AWS account. When you request a Gov AWS account, a linked shell commercial AWS account is automatically created. The first thing we had to do, therefore, was request a Gov root AWS account using our root payer commercial account. This was a lengthy process, but not because it was technically difficult; it was as simple as clicking a button in our root commercial AWS account. However, adding the Gov accounts to our existing agreements with AWS did take a few weeks. Once we had our Gov root account, we were able to request more GovCloud accounts for our service teams. It’s worth mentioning that GovCloud child accounts still need to be requested through the commercial AWS API, using the create-gov-cloud-account call.

When a new GovCloud child account is created, you can assume the OrganizationAccountAccessRole in the child account via the GovCloud root account’s OrganizationAccountAccessRole (this role name may differ if you override the name using the --role-name flag).

Let’s look at what these links look like in a diagram:

As we can see above, the costs of all our GovCloud resources are propagated to our root commercial AWS account.

How did we create GovCloud accounts?

Image of many paintings on a wall from https://www.glitchthegame.com

As we discussed above, we use the AWS Organizations API and the create-gov-cloud-account call to request a new GovCloud child account. This process creates two new accounts: the GovCloud account and the linked commercial AWS account. We use a pipeline on the commercial side for this portion of the process. The linked commercial AWS account is then moved to a highly restricted OU, so that no AWS resources can be created in it.
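For readers who manage accounts with Terraform, the request described above can be sketched with the AWS provider's aws_organizations_account resource, which can ask for a GovCloud account alongside the commercial one. This is a minimal illustration, not Slack's pipeline: the account name, email address, and the restricted_ou_id variable are hypothetical, and the provider is assumed to be authenticated against the commercial management account.

```hcl
# Hypothetical input: ID of a locked-down OU for the linked commercial shells.
variable "restricted_ou_id" {
  type = string
}

# Assumes the default AWS provider uses credentials for the commercial
# management (root payer) account. Name and email are placeholders.
resource "aws_organizations_account" "example_service" {
  name  = "example-service-gov"
  email = "aws-example-service-gov@example.com"

  # Also request a GovCloud account; the commercial account created by this
  # resource becomes its linked billing shell.
  create_govcloud = true

  # Place the commercial shell account directly into the restricted OU.
  parent_id = var.restricted_ou_id

  # Role created in the new accounts; this is the AWS default name.
  role_name = "OrganizationAccountAccessRole"

  # The Organizations API does not return role_name, so ignore drift on it.
  lifecycle {
    ignore_changes = [role_name]
  }
}

# ID of the GovCloud account created alongside the commercial one.
output "govcloud_account_id" {
  value = aws_organizations_account.example_service.govcloud_id
}
```

The govcloud_id output is what a Gov-side pipeline would need as its starting point for configuring the new account.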

We use a Jenkins pipeline in the AWS Gov partition to configure the GovCloud child account. We can assume the OrganizationAccountAccessRole of the new child account from the GovCloud root account as soon as it is created. However, our Gov Jenkins services are located in a dedicated child account. The pipeline therefore includes a step that updates the trust policy of the child account’s OrganizationAccountAccessRole so that the Jenkins workers can assume it. This step must be completed before we can move on to the other steps of the child account configuration process.
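To make the trust policy step more concrete, here is a rough sketch of what such a policy document could look like when rendered with Terraform. Everything specific in it is an assumption for illustration: the variable names, the Jenkins worker role name, and the choice to keep the Gov root account as a trusted principal.

```hcl
# Hypothetical inputs: the GovCloud root account, the account hosting the Gov
# Jenkins workers, and the IAM role name those workers run as.
variable "gov_root_account_id" {
  type = string
}

variable "jenkins_account_id" {
  type = string
}

variable "jenkins_worker_role_name" {
  type    = string
  default = "jenkins-worker"
}

# Resolves to "aws-us-gov" when the provider targets a GovCloud region.
data "aws_partition" "current" {}

# Trust policy allowing both the Gov root account and the Jenkins worker role
# to assume the role this policy is attached to.
data "aws_iam_policy_document" "org_access_role_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type = "AWS"
      identifiers = [
        "arn:${data.aws_partition.current.partition}:iam::${var.gov_root_account_id}:root",
        "arn:${data.aws_partition.current.partition}:iam::${var.jenkins_account_id}:role/${var.jenkins_worker_role_name}",
      ]
    }
  }
}

# Rendered JSON, ready to hand to whatever applies the policy.
output "org_access_role_trust_json" {
  value = data.aws_iam_policy_document.org_access_role_trust.json
}
```

One way to apply the rendered JSON from a pipeline step is the aws iam update-assume-role-policy command, run with credentials for the child account.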

How do we separate GovDev and GovProd?

Image of a bridge connecting two trees from https://www.glitchthegame.com

As mentioned previously, one of the core compliance requirements for a GovCloud environment was that only US persons would be authorized to access the production environment. With this requirement in mind, we decided to stand up two Gov environments: the production Gov environment, known internally as “GovProd”, and a second environment, known as “GovDev”. Anyone can access the GovDev environment and test their services there before US personnel deploy them to GovProd.

To ensure full isolation between these environments, we approached the build-out using a full shared-nothing paradigm, which allows the environments to operate in completely different AWS organizations. The layer 3 networking mesh we use (Nebula) is fully disconnected, meaning the networks are entirely segregated from one another.

To achieve this, we created two AWS organizations in GovCloud and, under each of them, an identical set of child accounts in which to launch our services in the Dev and Prod environments.

Is this really isolated?

Image of a waterfall from https://www.glitchthegame.com

When a new child account is created, we need to use the Gov root account to assume the OrganizationAccountAccessRole in it for the first portion of the provisioning, as we discussed above. Since only US personnel can access the Gov prod accounts, only US personnel are able to access the Gov root account, because this account can assume the OrganizationAccountAccessRole in the child accounts. The initial provisioning of dev accounts therefore also needs to run on the Gov prod Jenkins, and US personnel must be engaged to kick off the initial part of GovDev account creation.

Other challenges

Image of mountains from https://www.glitchthegame.com

GovProd additionally lacks some AWS companies, reminiscent of CloudFront and public zones in Route53. Moreover, once we are utilizing the AWS CLI in GovCloud, we should go within the –area flag or set the AWS_DEFAULT_REGION surroundings variable with a Gov area because the AWS CLI at all times defaults to a business area for API calls.

Route53 and ACM

Image of a room made of metal from https://www.glitchthegame.com

Some of our Gov services use AWS ACM for the load balancer SSL certificates. We avoid email certificate validation because it doesn’t allow us to auto-renew expiring certificates. ACM DNS validation supports auto-renewal but requires public DNS records to do so. We therefore validate our ACM certificates through the same dedicated commercial DNS account that hosts our public GovSlack DNS records. Access to this commercial DNS account is restricted to US personnel.
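The cross-partition validation flow can be sketched in Terraform with two provider configurations: the certificate is requested in GovCloud, while the validation records are written to a public zone on the commercial side. This follows the usual ACM DNS validation pattern for the AWS provider; the provider aliases, profile names, and variables are hypothetical and not taken from Slack's setup.

```hcl
# Two provider configurations with separate credentials, since roles cannot be
# assumed across partitions. Aliases and profile names are placeholders.
provider "aws" {
  alias   = "gov"
  region  = "us-gov-west-1"
  profile = "gov-service-account"
}

provider "aws" {
  alias   = "commercial_dns"
  region  = "us-east-1"
  profile = "commercial-dns-account"
}

variable "domain_name" {
  type = string
}

# ID of the public hosted zone in the commercial DNS account.
variable "public_zone_id" {
  type = string
}

# Certificate requested in GovCloud with DNS validation.
resource "aws_acm_certificate" "lb" {
  provider          = aws.gov
  domain_name       = var.domain_name
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

# Validation records created in the commercial public zone.
resource "aws_route53_record" "validation" {
  provider = aws.commercial_dns

  for_each = {
    for dvo in aws_acm_certificate.lb.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      type   = dvo.resource_record_type
      record = dvo.resource_record_value
    }
  }

  zone_id         = var.public_zone_id
  name            = each.value.name
  type            = each.value.type
  records         = [each.value.record]
  ttl             = 60
  allow_overwrite = true
}

# Waits in GovCloud until ACM has seen the public validation records.
resource "aws_acm_certificate_validation" "lb" {
  provider                = aws.gov
  certificate_arn         = aws_acm_certificate.lb.arn
  validation_record_fqdns = [for r in aws_route53_record.validation : r.fqdn]
}
```

As long as the validation records stay in place, ACM can keep renewing the certificate without manual steps.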

Route53

Image of a train station from https://www.glitchthegame.com

AWS GovCloud doesn’t support public Route53 zones. However, private zones are allowed. We created GovDev and GovProd DNS accounts for hosting private Route53 zones. The Cloud Foundations team creates VPCs in a set of accounts that we manage, then we use AWS Transit Gateways to connect different regions together and build a global network mesh. Finally, these VPCs are shared into child accounts so that application teams don’t have to deal with the complexity of setting up networks. You can read more about how we do this in our other two blog posts, Building the Next Evolution of Cloud Networks at Slack and Building the Next Evolution of Cloud Networks at Slack – A Retrospective.

The private Route53 zones we create are attached to the shared VPCs, so as soon as a record is added to these zones, it can be resolved within our VPCs.
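As a rough illustration of a private zone attached to a VPC, the sketch below creates a zone and one record. The zone name, record, and shared_vpc_id variable are made up for the example, and it assumes the zone and the VPC live in the same account for brevity. When they live in different accounts, the provider's aws_route53_vpc_association_authorization and aws_route53_zone_association resources are needed as well.

```hcl
# Hypothetical input: ID of a VPC that workloads run in.
variable "shared_vpc_id" {
  type = string
}

# A zone becomes private when it is created with a VPC association.
resource "aws_route53_zone" "private" {
  name = "internal.example.com"

  vpc {
    vpc_id = var.shared_vpc_id
  }
}

# Resolvable from inside the associated VPC once created.
resource "aws_route53_record" "service" {
  zone_id = aws_route53_zone.private.zone_id
  name    = "my-service.internal.example.com"
  type    = "A"
  ttl     = 300
  records = ["10.0.0.10"]
}
```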

However, since GovCloud doesn’t support public DNS, we need to create public records on the commercial side. We therefore created a dedicated commercial AWS account for hosting public GovSlack DNS records. Access to this commercial DNS account is restricted to US personnel.

How do we transfer artefacts between commercial and GovCloud?

Image of a spaceship from https://www.glitchthegame.com

AWS doesn’t support assuming roles between the AWS standard and AWS GovCloud partitions. We therefore created a mechanism to pass objects to GovCloud in a compliant way.

This mechanism ensures that objects are pulled into the AWS GovCloud partition from the standard partition using AWS IAM credentials. The credentials used to access the standard partition for pulling these objects are stored securely in the AWS GovCloud partition.

Terraform modules

Image of a conveyor belt from https://www.glitchthegame.com

We use Terraform modules for building our infrastructure as a set of interdependent resources such as VPCs, Internet Gateways, Transit Gateways, and route tables. We wanted to use the same modules for building our Gov infrastructure so we could keep these patterns consistent between the AWS Gov and standard partitions. One key difference between commercial and Gov AWS resources is their ARNs: commercial ARNs start with arn:aws, whereas Gov ARNs start with arn:aws-us-gov.

We therefore had to build a very simple Terraform module called aws_partition. Using the outputs of this module, we can programmatically build ARNs and discover which AWS partition we are in.

Let’s look at the aws_partition module:

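# Identity of the credentials Terraform is currently running with.
data "aws_caller_identity" "current" {}

# Split the caller ARN into its components (partition, account, and so on).
data "aws_arn" "arn_details" {
  arn = data.aws_caller_identity.current.arn
}

# "aws" in the commercial partition, "aws-us-gov" in GovCloud.
output "partition" {
  value = data.aws_arn.arn_details.partition
}

# True when the partition name contains "gov".
output "is_govcloud" {
  value = replace(data.aws_arn.arn_details.partition, "gov", "") != data.aws_arn.arn_details.partition ? true : false
}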
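As a side note, the AWS provider also ships a built-in aws_partition data source that exposes the partition name directly. The sketch below shows how equivalent values could be derived from it; it is an alternative for comparison, not the module described in this post, and the account ID in the example ARN is a placeholder.

```hcl
# Built-in data source; partition is "aws" or "aws-us-gov" depending on the
# region the provider is configured for.
data "aws_partition" "current" {}

locals {
  partition   = data.aws_partition.current.partition
  is_govcloud = local.partition == "aws-us-gov"
}

# Example of building a partition-aware ARN (placeholder account ID).
output "example_role_arn" {
  value = "arn:${local.partition}:iam::123456789012:role/some-role"
}

output "is_govcloud" {
  value = local.is_govcloud
}
```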

Now let’s look at an example usage:

module "aws_partition" {
  source = "../modules/aws/aws_partition"
}

# Build the ARN from the detected partition instead of hardcoding "aws".
data "aws_iam_policy_document" "example" {
  statement {
    effect = "Allow"

    actions = [
      "s3:GetObject",
    ]

    resources = [
      "arn:${module.aws_partition.partition}:iam::*:role/some-role",
    ]
  }
}

# Create this rule only when running in GovCloud.
resource "aws_config_config_rule" "example" {
  count = module.aws_partition.is_govcloud ? 1 : 0

  name = "example-rule"

  source {
    owner             = "AWS"
    source_identifier = "S3_BUCKET_SERVER_SIDE_ENCRYPTION_ENABLED"
  }
}

VPC endpoints

Image of a tube from https://www.glitchthegame.com

Over the last three years, Slack has been working very hard to make use of AWS’ VPC endpoints for accessing native AWS services in our commercial environment. They reduce the latency and improve the resiliency of our systems, while also lowering our networking costs.

With all these advantages, it’s very easy to assume that adopting VPC endpoints is a simple move. However, one glaring problem we found in both the commercial and GovCloud moves is that AWS doesn’t always support all services in all AZs. Quite often we have found that our systems need to be able to access AWS services both with and without VPC endpoints, which at times creates obscure edge cases that are hard to account for.

While AWS is constantly releasing these VPC endpoints at an AZ level, we still haven’t reached 100% of services enabled for 100% of the regions/AZs we run our service in.
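One way to cope with uneven AZ coverage in Terraform is to ask AWS which AZs an endpoint service supports and only place the endpoint in matching subnets. The sketch below illustrates that idea under assumptions: SQS is an arbitrary example service, and the variables (vpc_id, subnet_ids_by_az, security_group_ids) are hypothetical inputs rather than anything from Slack's modules.

```hcl
variable "vpc_id" {
  type = string
}

# Hypothetical input: map of AZ name => subnet ID to use in that AZ.
variable "subnet_ids_by_az" {
  type = map(string)
}

variable "security_group_ids" {
  type = list(string)
}

# Look up the interface endpoint service, including the AZs it is offered in.
data "aws_vpc_endpoint_service" "sqs" {
  service      = "sqs"
  service_type = "Interface"
}

locals {
  # Keep only subnets whose AZ is supported by the endpoint service.
  endpoint_subnet_ids = [
    for az, subnet_id in var.subnet_ids_by_az : subnet_id
    if contains(data.aws_vpc_endpoint_service.sqs.availability_zones, az)
  ]
}

# Skip the endpoint entirely when no supported AZ overlaps with our subnets.
resource "aws_vpc_endpoint" "sqs" {
  count = length(local.endpoint_subnet_ids) > 0 ? 1 : 0

  vpc_id             = var.vpc_id
  service_name       = data.aws_vpc_endpoint_service.sqs.service_name
  vpc_endpoint_type  = "Interface"
  subnet_ids         = local.endpoint_subnet_ids
  security_group_ids = var.security_group_ids
}
```

Clients in AZs without an endpoint subnet still need a working path to the public service endpoint, which is exactly the "with and without" case described above.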

AWS-SSO

Image of a cave from https://www.glitchthegame.com

While we were building out the Gov environment, we started by using IAM users to bootstrap it, but this was only ever going to be a short-term solution. AWS recently launched AWS-SSO in their commercial environment and, even more recently, in their Gov environment. As this was a complete greenfield build-out, it was a good opportunity to experiment with new technologies and improve our existing implementation.

Unlike AWS’ standard IAM roles, AWS-SSO permission sets are an org-wide global resource (they span the entire org rather than a single account), and this changes how we build and deploy them.
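To show what "org-wide" means in practice, here is a small Terraform sketch in which a permission set is defined once against the SSO instance and then assigned to several accounts. The permission set name, the group_id and account_ids variables, and the choice of the ReadOnlyAccess managed policy are assumptions for the example only.

```hcl
# The SSO instance belongs to the organization, not to a member account.
data "aws_ssoadmin_instances" "this" {}

data "aws_partition" "current" {}

locals {
  sso_instance_arn = tolist(data.aws_ssoadmin_instances.this.arns)[0]
}

# Hypothetical inputs: an identity store group and the accounts to grant.
variable "group_id" {
  type = string
}

variable "account_ids" {
  type = set(string)
}

# Defined once for the whole organization.
resource "aws_ssoadmin_permission_set" "read_only" {
  name             = "ExampleReadOnly"
  instance_arn     = local.sso_instance_arn
  session_duration = "PT4H"
}

# Partition-aware ARN of the AWS managed policy.
resource "aws_ssoadmin_managed_policy_attachment" "read_only" {
  instance_arn       = local.sso_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.read_only.arn
  managed_policy_arn = "arn:${data.aws_partition.current.partition}:iam::aws:policy/ReadOnlyAccess"
}

# The same permission set is assigned to each account in turn.
resource "aws_ssoadmin_account_assignment" "read_only" {
  for_each = var.account_ids

  instance_arn       = local.sso_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.read_only.arn

  principal_id   = var.group_id
  principal_type = "GROUP"

  target_id   = each.value
  target_type = "AWS_ACCOUNT"
}
```

Because the permission set is shared, a change to it affects every account it is assigned to, which is a different deployment model from per-account IAM roles.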

Since deploying AWS-SSO in the GovCloud environment, we have taken what we learned and back-ported it into our commercial environment. While we already had an existing SSO system in place for access to the entirety of our commercial AWS environment, using AWS-SSO has made this process a lot smoother and easier.

So what have we learned?

Image of a flying being from https://www.glitchthegame.com

Rebuilding our entire network infrastructure gave us the ability to test our tooling, processes, and Terraform modules, and gave us the opportunity to make improvements. We were able to clean up a multitude of hardcoded values and make things more reusable. We were also able to take a step back, take a deep dive into our processes, tools, and AWS footprint, and gain a greater understanding of our platform, as this whole process gave us an opportunity to rebuild Slack from scratch.