Analytics Pipeline Best Practices
# support
eva3448
I'm exploring different ways to set up a simple analytics pipeline for my Medplum datastore. One idea is to set up a Medplum bot that uploads FHIR data to S3 on a schedule. S3 will then be my staging ground for ingesting data into other CDWs. Does this approach sound sane? Anyone have general advice around setting up an analytics pipeline?
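For context, here's a rough sketch of the kind of bot I had in mind. It assumes Medplum's standard bot handler signature and the AWS SDK v3 S3 client; the bucket name, key layout, and search parameters are just placeholders.

```typescript
import { BotEvent, MedplumClient } from '@medplum/core';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

// Hypothetical staging bucket -- replace with your own.
const BUCKET = 'my-analytics-staging';
const s3 = new S3Client({});

export async function handler(medplum: MedplumClient, event: BotEvent): Promise<void> {
  // Pull a page of recently updated Patients (swap in whichever resource types you need).
  const patients = await medplum.searchResources('Patient', {
    _lastUpdated: 'gt2024-01-01',
    _count: '100',
  });

  // Write the batch as NDJSON, which most warehouses can ingest directly.
  const ndjson = patients.map((p) => JSON.stringify(p)).join('\n');

  await s3.send(
    new PutObjectCommand({
      Bucket: BUCKET,
      Key: `fhir/Patient/${new Date().toISOString()}.ndjson`,
      Body: ndjson,
      ContentType: 'application/x-ndjson',
    })
  );
}
```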
r
Hi @eva3448! Best practices for an analytics pipeline often depend on which DW (data warehouse) you're targeting, but I'll take a stab. Writing to S3 is reasonable. One thing you'll have to decide is whether you want to copy just a snapshot of each resource into S3, or the whole history of the resource.

Other than S3, we've also seen implementations that write to a message queue, like Kafka/Kinesis. Users then set up a function that reads off the queue to ingest into the particular DW.

If you are exporting on a timer, you should probably use the Bulk Export API (https://medplum-www-git-kevin-communication-reasoncode-d-701071-medplum.vercel.app/docs/api/fhir/operations/bulk-fhir). This gives you a snapshot, and it runs as a background job to avoid timeouts on large requests. Instead of running on a timer, you can also use Subscriptions to create event-driven workflows that write to your DW as soon as an update has been made.
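As a rough sketch of the event-driven variant (not a drop-in implementation): point a Subscription at the resource types you care about and have it trigger a bot like the one below, which just forwards the changed resource onto a stream. The stream name and partition key are placeholders, and it assumes the AWS SDK v3 Kinesis client.

```typescript
import { BotEvent, MedplumClient } from '@medplum/core';
import { Resource } from '@medplum/fhirtypes';
import { KinesisClient, PutRecordCommand } from '@aws-sdk/client-kinesis';

// Hypothetical stream name -- point this at whatever feeds your warehouse loader.
const STREAM_NAME = 'fhir-analytics-events';
const kinesis = new KinesisClient({});

export async function handler(medplum: MedplumClient, event: BotEvent<Resource>): Promise<void> {
  // The Subscription delivers the changed resource as the bot input.
  const resource = event.input;

  await kinesis.send(
    new PutRecordCommand({
      StreamName: STREAM_NAME,
      // Partition by resource type + id so updates to the same resource stay ordered.
      PartitionKey: `${resource.resourceType}/${resource.id}`,
      Data: Buffer.from(JSON.stringify(resource)),
    })
  );
}
```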
Might be worth asking in the #1113936455954346005 channel to see if any community members have done this themselves and have guidance.
eva3448
Thank you so much for the thoughtful response. Just wanted to sanity check!