Securing Sensitive Data for AI Agents
A guide on how to protect your sensitive data when using AI agents
January 9th, 2025
We talk to a lot of developers about their workflows and, along with our partners at Neon, we have been steadily building out more tooling to make it easier to use Neosync with database branching.
In this blog, we're going to walk through how to use Neosync's APIs and Terraform Provider to connect, anonymize, and sync data from a staging database to developer branches.
Every developer has some kind of local setup that includes a database. Some developers run a database locally on their laptop. Those databases are typically created by a setup script that starts a container with a Postgres image (for example) and then runs migration scripts to bring the database up to the latest schema.
Other companies give each developer their own cloud database. In those cases they might use something like Neon to manage database branching, which makes it easy to create a branch with one click.
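A minimal sketch of such a setup script, assuming Docker is installed and migrations are plain .sql files applied in filename order. The container name, password, image tag, and migrations directory are hypothetical placeholders. The functions only build the command and the file list; nothing is executed until you pass the result to subprocess.run yourself.

```python
from pathlib import Path


def postgres_container_cmd(name: str, password: str, port: int = 5432,
                           image: str = "postgres:16") -> list[str]:
    """Build the `docker run` argv that starts a throwaway Postgres container."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--env", f"POSTGRES_PASSWORD={password}",
        "--publish", f"{port}:5432",  # host port -> container port
        image,
    ]


def ordered_migrations(directory: Path) -> list[Path]:
    """Return the .sql migration files sorted by filename (e.g. 001_init.sql first)."""
    return sorted(directory.glob("*.sql"), key=lambda p: p.name)
```

For example, subprocess.run(postgres_container_cmd("dev-db", "devpassword"), check=True) would start the container, after which each file from ordered_migrations(Path("migrations")) can be applied with whatever migration tool you already use.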
The challenge, however, typically arises in the data that the developer has in that database. Is it mock data that is created as part of their setup script or is it anonymized production data that they're pulling from staging?
In other blogs, we talk about the importance of good, representational test data that is modeled after production. This is where Neosync comes into the picture.
As part of our database branching workflow, we want to seed that database with anonymized production data so that the developer has safe, representational data to work with locally.
Depending on how you're creating your developer branches and corresponding databases, you'll either want to use the Neosync Terraform Provider or Neosync's APIs to create Connections and Jobs.
If you're using Neosync's APIs then your workflow will look something like this:
Here is an example of what that code looks like in Python:
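# Assumes jobclient, connect, ctx, accountId, JobMappingTransformer, and the
# connection responses (prodDbResp, stageDbResp, s3Resp) were created earlier.
schedule = "0 23 * * *"  # 11pm every night
haltOnNewColAdd = True  # halt the job when a new column is added

jobRes, err = jobclient.CreateJob(ctx, connect.NewRequest({
    'AccountId': accountId,
    'JobName': 'prod-to-stage',
    # Source: the production database connection
    'ConnectionSourceId': prodDbResp['Msg']['Connection']['Id'],
    # Destinations: the staging database and an S3 bucket
    'DestinationSourceIds': [
        stageDbResp['Msg']['Connection']['Id'],
        s3Resp['Msg']['Connection']['Id'],
    ],
    'CronSchedule': schedule,
    'HaltOnNewColumnAddition': haltOnNewColAdd,
    # One mapping per column that needs to be transformed
    'Mappings': [
        {
            'Schema': 'public',
            'Table': 'users',
            'Column': 'account_number',
            'Transformer': JobMappingTransformer.custom_account_number,
        },
        {
            'Schema': 'public',
            'Table': 'users',
            'Column': 'address',
            'Transformer': JobMappingTransformer.address_anonymize,
        },
    ],
}))
if err:
    raise Exception(err)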
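Writing one mapping dictionary per column gets repetitive as tables grow. The sketch below is a hypothetical helper, not part of the Neosync SDK: it builds mapping entries in the same shape as the request above and lists the columns of a single table that have no mapping, which is the kind of situation HaltOnNewColumnAddition appears intended to catch. Transformer names are passed as plain strings here purely for illustration.

```python
def build_mappings(schema: str, table: str, transformers: dict[str, str]) -> list[dict]:
    """Create one mapping entry per column, mirroring the keys used in the job request."""
    return [
        {"Schema": schema, "Table": table, "Column": column, "Transformer": transformer}
        for column, transformer in transformers.items()
    ]


def unmapped_columns(table_columns: list[str], mappings: list[dict]) -> list[str]:
    """Return columns present in the table but missing from the mappings, in table order."""
    mapped = {m["Column"] for m in mappings}
    return [c for c in table_columns if c not in mapped]


mappings = build_mappings("public", "users", {
    "account_number": "custom_account_number",
    "address": "address_anonymize",
})

# Suppose a hypothetical `email` column was added after the job was configured.
print(unmapped_columns(["account_number", "address", "email"], mappings))
# ['email']
```

Running a check like this before creating or updating a job makes it obvious which new columns still need a transformer.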
If you're using Terraform to manage your infrastructure, then you can use Neosync's Terraform provider to manage this. The nice thing about Terraform is that it neatly packages all of your code together.
Wherever you have your database infrastructure set up, you can add in the Neosync Terraform code. It might look something like this:
resource "neosync_job" "staging-sync-job" {
  name      = "prod-to-stage"
  source_id = neosync_postgres_connection.prod_db.id
  destination_ids = [
    neosync_postgres_connection.stage_db.id,
    neosync_s3_connection.stage_backup.id,
  ]
  schedule                    = "0 23 * * *" # 11pm every night
  halt_on_new_column_addition = false
  mappings = [
    {
      "schema" : "public",
      "table" : "users",
      "column" : "account_number",
      "transformer" : "custom_account_number",
    },
    {
      "schema" : "public",
      "table" : "users",
      "column" : "address",
      "transformer" : "address_anonymize"
    },
  ]
}
Similar to the API code, you can configure a job and connections using the Neosync Terraform Provider.
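The schedule value used in both examples is a standard five-field cron expression: minute, hour, day of month, month, day of week. As a rough illustration of how 0 23 * * * resolves to 11pm every night, here is a small sketch that handles only the simple case of a fixed minute and hour with wildcards elsewhere. It is not a general cron parser, it ignores time zones, and it says nothing about how Neosync itself evaluates schedules.

```python
from datetime import datetime, timedelta


def next_daily_run(expr: str, now: datetime) -> datetime:
    """Next run time for a cron expression of the form '<minute> <hour> * * *'."""
    minute, hour, day_of_month, month, day_of_week = expr.split()
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("this sketch only supports daily schedules")
    candidate = now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    # If today's slot has already passed (or is exactly now), roll over to tomorrow.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


print(next_daily_run("0 23 * * *", datetime(2025, 1, 9, 14, 30)))
# 2025-01-09 23:00:00
```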
Neosync also comes with a CLI that developers can use locally to trigger jobs or even sync a local database.
Triggering a job through the CLI is as easy as running neosync jobs trigger <job-id>. This allows a developer to trigger a job directly from their terminal and sync to a local database or a staging database.
Alternatively, developers can stream data from a staging database to their local database without having to configure a new connection in Neosync by calling neosync sync --connection-id <connection-id>. This is a great way to increase developer productivity and efficiency while working locally.
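If you want to call the CLI from a local setup script, a thin wrapper is enough. The sketch below only assembles the two commands quoted above. The function names are made up for illustration, the IDs are placeholders, and any additional flags your environment needs (for example, where the synced data should land) are not shown.

```python
import subprocess


def trigger_job_cmd(job_id: str) -> list[str]:
    """argv for `neosync jobs trigger <job-id>`."""
    return ["neosync", "jobs", "trigger", job_id]


def sync_cmd(connection_id: str) -> list[str]:
    """argv for `neosync sync --connection-id <connection-id>`."""
    return ["neosync", "sync", "--connection-id", connection_id]


def run(cmd: list[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)
```

Calling run(trigger_job_cmd("<job-id>")) from a script then behaves the same as typing the command in a terminal, and a failed run stops the script instead of being silently ignored.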
You now have two ways to use Neosync with your database branching workflow, depending on your infrastructure and setup: Neosync's APIs or the Terraform provider. Either one gives each developer a great local developer experience, allowing them to build and debug faster and more efficiently. As we say, a great developer experience starts with great data.
Nucleus Cloud Corp. 2025