How to use Neosync with developer branches

Introduction

We talk to a lot of developers about their workflows and, along with our partners at Neon, we have been steadily building out more tooling to make it easier to use Neosync with database branching.

In this blog, we're going to walk through how to use Neosync's APIs and Terraform Provider to connect, anonymize, and sync data from a staging database to developer branches.

What is database branching?

Every developer has some kind of local setup that includes a database. Some developers run a database locally on their laptop. Those databases are typically created by a setup script that starts a container with a Postgres image (for example) and then runs the migration scripts to bring the database up to the latest schema.
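To make that kind of setup script concrete, here is a minimal sketch in Python. It only builds the commands. The container name, password, image tag, and migration file names are illustrative assumptions, not something this post prescribes.

```python
def postgres_container_cmd(name, password, port=5432, image="postgres:16"):
    """Build the `docker run` command for a throwaway local Postgres.

    The image tag and defaults are assumptions made for this sketch.
    """
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--env", f"POSTGRES_PASSWORD={password}",
        "--publish", f"{port}:5432",
        image,
    ]


def migration_cmds(migration_files, database_url):
    """Build one `psql` invocation per migration file, in sorted order."""
    return [
        ["psql", "--dbname", database_url, "--file", path]
        for path in sorted(migration_files)
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; a real script would pass
    # each list to subprocess.run(cmd, check=True).
    print(" ".join(postgres_container_cmd("dev-db", "dev-password")))
    url = "postgresql://postgres:dev-password@localhost:5432/postgres"
    for cmd in migration_cmds(["002_orders.sql", "001_users.sql"], url):
        print(" ".join(cmd))
```

Sorting the migration files assumes they are named so that lexical order matches the order in which they should be applied.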

Other companies give their developers their own cloud databases. In those cases, they might use something like Neon to manage database branching, which makes it easy to create a branch with one click.

The challenge, however, typically arises in the data that the developer has in that database. Is it mock data that is created as part of their setup script or is it anonymized production data that they're pulling from staging?

In other posts, we talk about the importance of good, representative test data that is modeled after production. This is where Neosync comes into the picture.

Setting up Neosync to work with database branching

As part of our database branching workflow, we want to seed each branch database with anonymized production data so that the developer has safe, representative data to work with locally.

Depending on how you're creating your developer branches and corresponding databases, you'll either want to use the Neosync Terraform Provider or Neosync's APIs to create Connections and Jobs.

Using Neosync APIs

If you're using Neosync's APIs then your workflow will look something like this:

  1. An initialization script runs to stand up the new environment/branch (including DB creation, migration scripts, other infra, etc.)
  2. The Neosync API is called to create a connection to the newly minted database
  3. The Neosync API is called to either create a new job that syncs data to that database or add a new destination to an existing job
  4. Any clean-up/post scripts are run
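Steps 2 and 3 above can be sketched as follows. This is only an illustration of the control flow: `FakeNeosync` is a hypothetical in-memory stand-in, not the Neosync SDK, and its method names, the job name, and the connection URLs are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNeosync:
    """Hypothetical in-memory stand-in for a Neosync client (not the real SDK)."""
    connections: dict = field(default_factory=dict)
    jobs: dict = field(default_factory=dict)

    def create_connection(self, name, url):
        conn_id = f"conn-{len(self.connections) + 1}"
        self.connections[conn_id] = {"name": name, "url": url}
        return conn_id

    def create_job(self, name, source_id, destination_ids):
        self.jobs[name] = {
            "source_id": source_id,
            "destination_ids": list(destination_ids),
        }

    def add_destination(self, job_name, destination_id):
        self.jobs[job_name]["destination_ids"].append(destination_id)


def seed_branch(client, branch, branch_db_url, source_id,
                job_name="stage-to-branches"):
    # Step 2: register the newly created branch database as a connection.
    dest_id = client.create_connection(f"branch-{branch}", branch_db_url)
    # Step 3: add the branch to an existing job, or create the job if needed.
    if job_name in client.jobs:
        client.add_destination(job_name, dest_id)
    else:
        client.create_job(job_name, source_id, [dest_id])
    return dest_id


if __name__ == "__main__":
    client = FakeNeosync()
    staging = client.create_connection("staging", "postgres://staging")
    seed_branch(client, "feature-a", "postgres://branch-a", staging)
    seed_branch(client, "feature-b", "postgres://branch-b", staging)
    print(client.jobs["stage-to-branches"]["destination_ids"])
```

Reusing one job with many destinations keeps the transformer mappings in a single place, while a job per branch lets each branch sync on its own schedule. Which one fits depends on how your branches are created.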

Here is an example of what that code looks like in Python:

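  # Assumes `jobclient`, `connect`, `ctx`, `accountId`, and the three connection
  # responses (prodDbResp, stageDbResp, s3Resp) were created earlier in the script.
  schedule = "0 23 * * *"  # 11pm every night
  haltOnNewColAdd = True  # halt the job when a new column appears in the source
  jobRes, err = jobclient.CreateJob(ctx, connect.NewRequest({
      'AccountId': accountId,
      'JobName': 'prod-to-stage',
      # Source connection to read from
      'ConnectionSourceId': prodDbResp['Msg']['Connection']['Id'],
      # Destination connections to write the anonymized data to
      'DestinationSourceIds': [
          stageDbResp['Msg']['Connection']['Id'],
          s3Resp['Msg']['Connection']['Id'],
      ],
      'CronSchedule': schedule,
      'HaltOnNewColumnAddition': haltOnNewColAdd,
      # One entry per column that should be transformed
      'Mappings': [
          {
              'Schema': 'public',
              'Table': 'users',
              'Column': 'account_number',
              'Transformer': JobMappingTransformer.custom_account_number,
          },
          {
              'Schema': 'public',
              'Table': 'users',
              'Column': 'address',
              'Transformer': JobMappingTransformer.address_anonymize,
          },
      ],
  }))
  if err:
      raise Exception(err)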
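Writing out one mapping entry per column gets repetitive once more than a couple of columns need transformers. As a hedged sketch, a small helper could expand a compact spec into the list shape used above. The helper name and the spec format are hypothetical, and transformers are given as plain strings here for illustration rather than the `JobMappingTransformer` values used in the example.

```python
def build_mappings(spec):
    """Expand {(schema, table): {column: transformer}} into mapping entries.

    Hypothetical helper: the key names mirror the example above, but the
    spec format itself is an assumption made for this sketch.
    """
    mappings = []
    for (schema, table), columns in spec.items():
        for column, transformer in columns.items():
            mappings.append({
                "Schema": schema,
                "Table": table,
                "Column": column,
                "Transformer": transformer,
            })
    return mappings


if __name__ == "__main__":
    spec = {
        ("public", "users"): {
            "account_number": "custom_account_number",
            "address": "address_anonymize",
        },
    }
    for entry in build_mappings(spec):
        print(entry)
```

The resulting list preserves the order in which tables and columns appear in the spec.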

Using Terraform

If you're using Terraform to manage your infrastructure, then you can use Neosync's Terraform provider to manage Connections and Jobs as well. The nice thing about this approach is that your Neosync configuration lives alongside the rest of your infrastructure code.

Wherever you have your database infrastructure set up, you can add in the Neosync Terraform code. It might look something like this:

  resource "neosync_job" "staging-sync-job" {
    name = "prod-to-stage"

    source_id = neosync_postgres_connection.prod_db.id
    destination_ids = [
      neosync_postgres_connection.stage_db.id,
      neosync_s3_connection.stage_backup.id,
    ]

    schedule = "0 23 * * *" # 11pm every night

    halt_on_new_column_addition = false

    mappings = [
      {
        "schema" : "public",
        "table" : "users",
        "column" : "account_number",
        "transformer" : "custom_account_number",
      },
      {
        "schema" : "public",
        "table" : "users",
        "column" : "address",
        "transformer" : "address_anonymize",
      },
    ]
  }

Similar to the API code, you can configure a job and connections using the Neosync Terraform Provider.
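Both examples use the schedule "0 23 * * *". If you generate schedules per branch, it can help to sanity-check the expression before handing it to the API or Terraform. The sketch below assumes the five-field cron form shown in the examples and only labels the fields. It does not validate ranges, and the function name is hypothetical.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expr):
    """Split a five-field cron expression into labeled fields."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expr!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    fields = parse_cron("0 23 * * *")
    # Minute 0 of hour 23, every day: 11pm every night.
    print(fields)
```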

Wrapping up

You now have two ways to use Neosync in your database branching workflow: call Neosync's APIs from your branch setup scripts, or manage Connections and Jobs with the Terraform provider alongside the rest of your infrastructure. Either way, each developer gets safe, representative data on their branch, which helps them build and debug faster. As we say, a great developer experience starts with great data.


Nucleus Cloud Corp. 2024