Analytics Pipeline Best Practices
# support
eva3448
I'm exploring different ways to set up a simple analytics pipeline for my Medplum datastore. One idea is to set up a Medplum bot that uploads FHIR data to S3 on a schedule. S3 will then be my staging ground for ingesting data into other CDWs. Does this approach sound sane? Anyone have general advice around setting up an analytics pipeline?
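For context, here's a rough sketch of the kind of bot I had in mind. It assumes Medplum's standard bot handler signature and the AWS SDK v3 S3 client; the bucket name, key layout, and search parameters are just placeholders.

```typescript
import { BotEvent, MedplumClient } from '@medplum/core';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

// Hypothetical staging bucket -- replace with your own.
const BUCKET = 'my-analytics-staging';
const s3 = new S3Client({});

export async function handler(medplum: MedplumClient, event: BotEvent): Promise<void> {
  // Pull a page of recently updated Patients (swap in whichever resource types you need).
  const patients = await medplum.searchResources('Patient', {
    _lastUpdated: 'gt2024-01-01',
    _count: '100',
  });

  // Write the batch as NDJSON, which most warehouses can ingest directly.
  const ndjson = patients.map((p) => JSON.stringify(p)).join('\n');

  await s3.send(
    new PutObjectCommand({
      Bucket: BUCKET,
      Key: `fhir/Patient/${new Date().toISOString()}.ndjson`,
      Body: ndjson,
      ContentType: 'application/x-ndjson',
    })
  );
}
```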
r
Hi @eva3448! Best practices for an analytics pipeline often depend on which DW (data warehouse) you're targeting, but I'll take a stab. Writing to S3 is reasonable. One thing you'll have to decide is whether you want to copy just a snapshot of each resource into S3, or the whole history of the resource.

Other than S3, we've also seen implementations that write to a message queue, like Kafka/Kinesis. Users then set up a function that reads off the queue to ingest into the particular DW.

If you are exporting on a timer, you should probably use the Bulk Export API (https://medplum-www-git-kevin-communication-reasoncode-d-701071-medplum.vercel.app/docs/api/fhir/operations/bulk-fhir). This gives you a snapshot, and it runs as a background job to avoid timeouts on large requests. Instead of running on a timer, you can also use Subscriptions to create event-driven workflows that write to your DW as soon as an update has been made.
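As a rough sketch of the event-driven variant (not a drop-in implementation): point a Subscription at the resource types you care about and have it trigger a bot like the one below, which just forwards the changed resource onto a stream. The stream name and partition key are placeholders, and it assumes the AWS SDK v3 Kinesis client.

```typescript
import { BotEvent, MedplumClient } from '@medplum/core';
import { Resource } from '@medplum/fhirtypes';
import { KinesisClient, PutRecordCommand } from '@aws-sdk/client-kinesis';

// Hypothetical stream name -- point this at whatever feeds your warehouse loader.
const STREAM_NAME = 'fhir-analytics-events';
const kinesis = new KinesisClient({});

export async function handler(medplum: MedplumClient, event: BotEvent<Resource>): Promise<void> {
  // The Subscription delivers the changed resource as the bot input.
  const resource = event.input;

  await kinesis.send(
    new PutRecordCommand({
      StreamName: STREAM_NAME,
      // Partition by resource type + id so updates to the same resource stay ordered.
      PartitionKey: `${resource.resourceType}/${resource.id}`,
      Data: Buffer.from(JSON.stringify(resource)),
    })
  );
}
```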
Might be worth asking in the #1113936455954346005 channel to see if any community members have done this themselves and have guidance.
eva3448
Thank you so much for the thoughtful response. Just wanted to sanity check!