How to use Neosync with developer branches

Introduction

We talk to a lot of developers about their workflows and, along with our partners at Neon, we have been steadily building out more tooling to make it easier to use Neosync with database branching.

In this blog, we're going to walk through how to use Neosync's APIs and Terraform Provider to connect, anonymize, and sync data from a staging database to developer branches.

What is database branching

Every developer has some kind of local setup that includes a database. Some developers run their database locally on their laptop. Those databases are typically created by a setup script that starts a container from a Postgres image (for example) and then runs the migration scripts to bring the schema up to date.
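As a rough sketch of that local setup step, here is a Python helper that returns the commands such a script might run, in order. The container name, password, postgres:16 image tag, and migration command are hypothetical placeholders; swap in whatever your project actually uses.

```python
import shlex
from typing import List


def local_db_setup_commands(
    container_name: str,
    password: str,
    migrate_cmd: str,
    image: str = "postgres:16",  # hypothetical default image tag
    host_port: int = 5432,
) -> List[List[str]]:
    """Return the commands a typical local setup script runs, in order.

    Step 1 starts a Postgres container; step 2 runs the project's own
    migration command (supplied by the caller) against it.
    """
    start_db = [
        "docker", "run", "-d",
        "--name", container_name,
        "-e", f"POSTGRES_PASSWORD={password}",
        "-p", f"{host_port}:5432",
        image,
    ]
    # The migration command is project-specific (hypothetical here), so it is
    # passed in as a shell-style string and split into argv form.
    run_migrations = shlex.split(migrate_cmd)
    return [start_db, run_migrations]
```

Each command could then be passed to subprocess.run(cmd, check=True). A real script would also need to wait until Postgres accepts connections before running migrations, which this sketch leaves out.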

Other companies give their developers their own cloud databases, and in those cases they might use something like Neon to manage database branching, which makes it easy to create a branch with one click.

The challenge, however, usually lies in the data inside that database. Is it mock data created by the setup script, or is it anonymized production data pulled from staging?

In other posts, we discuss the importance of good, representative test data that is modeled after production. This is where Neosync comes into the picture.

Setting up Neosync to work with Database branching

As part of our database branching workflow, we want to seed that database with anonymized production data so that the developer has safe, representative data to work with locally.

Depending on how you're creating your developer branches and corresponding databases, you'll either want to use the Neosync Terraform Provider or Neosync's APIs to create Connections and Jobs.

Using Neosync APIs

If you're using Neosync's APIs then your workflow will look something like this:

  1. An initialization script stands up the new environment/branch (database creation, migration scripts, other infrastructure, etc.)
  2. The Neosync API is called to create a connection to the newly created database
  3. The Neosync API is called to either create a new job that syncs data to that database or add the new connection as a destination on an existing job
  4. Any clean-up/post scripts run
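The four steps above can be sketched as a small Python orchestration function. Every hook here (create_database, create_connection, attach_to_job, and the post steps) is hypothetical: in practice each would wrap your own infrastructure tooling or the Neosync API calls, which are not shown.

```python
from typing import Callable, List


def provision_branch(
    branch: str,
    create_database: Callable[[str], str],
    create_connection: Callable[[str, str], str],
    attach_to_job: Callable[[str], None],
    post_steps: List[Callable[[str], None]],
) -> str:
    """Run the branching steps in order and return the new connection ID.

    All callables are hypothetical hooks you would implement yourself:
    - create_database: stands up the branch DB (infra + migrations), returns its URL
    - create_connection: calls the Neosync API, returns the connection ID
    - attach_to_job: creates a job or adds the connection as a destination
    - post_steps: any clean-up/post scripts
    """
    db_url = create_database(branch)                   # step 1
    connection_id = create_connection(branch, db_url)  # step 2
    attach_to_job(connection_id)                       # step 3
    for step in post_steps:                            # step 4
        step(branch)
    return connection_id
```

Injecting the steps as callables keeps the ordering explicit and lets you test the workflow with stubs before wiring in real API calls.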

Here is an example of what that code might look like in Python:

  schedule = "0 23 * * *"
  haltOnNewColAdd = True
  jobRes, err = jobclient.CreateJob(ctx, connect.NewRequest({
      'AccountId': accountId,
      'JobName': 'prod-to-stage',
      'ConnectionSourceId': prodDbResp['Msg']['Connection']['Id'],
      'DestinationSourceIds': [
          stageDbResp['Msg']['Connection']['Id'],
          s3Resp['Msg']['Connection']['Id'],
      ],
      'CronSchedule': schedule,
      'HaltOnNewColumnAddition': haltOnNewColAdd,
      'Mappings': [
          {
              'Schema': 'public',
              'Table': 'users',
              'Column': 'account_number',
              'Transformer': JobMappingTransformer.custom_account_number,
          },
          {
              'Schema': 'public',
              'Table': 'users',
              'Column': 'address',
              'Transformer': JobMappingTransformer.address_anonymize,
          },
      ],
  }))
  if err:
      raise Exception(err)
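As more columns need anonymizing, the Mappings list grows quickly. One way to keep it manageable is a hypothetical helper that builds the list from a dictionary keyed by (schema, table, column). This sketch uses plain strings as stand-ins for the JobMappingTransformer values in the example above; it is not part of any Neosync SDK.

```python
from typing import Dict, List, Tuple


def build_mappings(
    rules: Dict[Tuple[str, str, str], str],
) -> List[Dict[str, str]]:
    """Turn {(schema, table, column): transformer} into a list of mapping dicts.

    Entries are sorted by (schema, table, column) so the output is stable
    regardless of dictionary insertion order.
    """
    return [
        {"Schema": schema, "Table": table, "Column": column, "Transformer": transformer}
        for (schema, table, column), transformer in sorted(rules.items())
    ]


mappings = build_mappings({
    ("public", "users", "address"): "address_anonymize",
    ("public", "users", "account_number"): "custom_account_number",
})
```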

Using Terraform

If you're using Terraform to manage your infrastructure, you can use Neosync's Terraform provider to manage your Neosync resources as well. The nice thing about Terraform is that it keeps this configuration together with the rest of your infrastructure code.

Wherever you have your database infrastructure set up, you can add in the Neosync Terraform code. It might look something like this:

  resource "neosync_job" "staging-sync-job" {
    name = "prod-to-stage"

    source_id = neosync_postgres_connection.prod_db.id
    destination_ids = [
      neosync_postgres_connection.stage_db.id,
      neosync_s3_connection.stage_backup.id,
    ]

    schedule = "0 23 * * *" # 11pm every night

    halt_on_new_column_addition = false

    mappings = [
      {
        "schema" : "public",
        "table" : "users",
        "column" : "account_number",
        "transformer" : "custom_account_number",
      },
      {
        "schema" : "public",
        "table" : "users",
        "column" : "address",
        "transformer" : "address_anonymize",
      },
    ]
  }

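As with the API approach, the Neosync Terraform Provider lets you configure both the job and its connections. The job above references connection resources (neosync_postgres_connection and neosync_s3_connection) that would be defined elsewhere in the same configuration.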
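Both examples use the cron schedule 0 23 * * *, which runs at 11pm every night. To make that concrete, here is a small Python sketch that computes the next run time, handling only daily schedules of the form M H * * *. It assumes the schedule is evaluated in the same timezone as the now value you pass in; how Neosync itself handles timezones is not covered here.

```python
from datetime import datetime, timedelta


def next_daily_run(schedule: str, now: datetime) -> datetime:
    """Next run time for a daily cron schedule of the form 'M H * * *'.

    Any other cron shape is rejected rather than silently misinterpreted.
    """
    fields = schedule.split()
    if len(fields) != 5 or fields[2:] != ["*", "*", "*"]:
        raise ValueError(f"only 'M H * * *' schedules are supported: {schedule!r}")
    minute, hour = int(fields[0]), int(fields[1])
    # Today's slot at the scheduled hour/minute.
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If that slot has already passed (or is exactly now), use tomorrow's.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```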

Using the Neosync CLI

Neosync also comes with a CLI that developers can use locally to trigger jobs or even sync a local database.

Triggering a job through the CLI is as easy as running neosync jobs trigger <job-id>. This allows a developer to trigger a job directly from their terminal and sync to a local database or a staging database.

Alternatively, developers can stream data from a staging database to their local database without configuring a new connection in Neosync by running neosync sync --connection-id <connection-id>. This is a great way to increase developer productivity and efficiency while working locally.
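If you want to call these commands from an existing branching script, one minimal Python approach is to build the argument lists and hand them to subprocess. This sketch uses only the commands and the --connection-id flag mentioned above. The helper names are hypothetical, it assumes the neosync CLI is installed and already authenticated, and any additional flags your CLI version needs for the sync destination are passed through extra_args rather than guessed.

```python
import shutil
import subprocess
from typing import List, Sequence


def trigger_job_command(job_id: str) -> List[str]:
    """Build argv for `neosync jobs trigger <job-id>`."""
    if not job_id:
        raise ValueError("job_id must be non-empty")
    return ["neosync", "jobs", "trigger", job_id]


def sync_command(connection_id: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Build argv for `neosync sync --connection-id <connection-id>`.

    extra_args passes through any further flags (e.g. destination settings)
    without assuming what they are.
    """
    if not connection_id:
        raise ValueError("connection_id must be non-empty")
    return ["neosync", "sync", "--connection-id", connection_id, *extra_args]


def run_cli(argv: List[str]) -> None:
    """Run a CLI command, raising if the binary is missing or exits non-zero."""
    if shutil.which(argv[0]) is None:
        raise RuntimeError(f"{argv[0]} not found on PATH")
    subprocess.run(argv, check=True)
```

For example, run_cli(trigger_job_command("my-job-id")) would trigger a job, where my-job-id is a placeholder for a real job ID. Building the argv separately from running it keeps the command construction easy to test without the CLI installed.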

Wrapping up

Depending on your infrastructure and setup, you now have two options for using Neosync in your database branching workflow: Neosync's APIs or its Terraform provider. Either way, setting this up gives each developer a great local experience with safe, representative data, so they can build and debug faster. As we say, a great developer experience starts with great data.
