Login stuck after Redis cache restart (Medplum #support)

oruchovets

03/17/2024, 1:42 PM

Hi @rahul1 and team, I am running self-hosted Medplum on AWS. To save costs, I stop RDS, ECS, and ElastiCache (Redis) overnight. After starting them again today, I can no longer log in to the Medplum app. I did some root cause analysis (RCA) and found these logs:

{"level":"ERROR","timestamp":"2024-03-17T13:06:56.592Z","msg":"Unhandled error","error":"MaxRetriesPerRequestError: Reached the max retries per request limit (which is 20). Refer to \"maxRetriesPerRequest\" option for details.","stack":["MaxRetriesPerRequestError: Reached the max retries per request limit (which is 20). Refer to \"maxRetriesPerRequest\" option for details."," at TLSSocket. (/usr/src/medplum/node_modules/ioredis/built/redis/event_handler.js:182:37)"," at Object.onceWrapper (node:events:633:26)"," at TLSSocket.emit (node:events:530:35)"," at node:net:337:12"," at TCP.done (node:_tls_wrap:657:7)"," at TCP.callbackTrampoline (node:internal/async_hooks:130:17)"],"requestId":"6cf79ec6-64d9-4a02-945a-6d533634b1ee","traceId":"09a54128-2573-4038-b81d-98a00241dbe8"}

[ioredis] Unhandled error event: Error: getaddrinfo ENOTFOUND master.meb52fveq4kzjyj.ydyuwv.euw2.cache.amazonaws.com

It looks like the app can't connect to the Redis cluster after the restart. How can I fix this? I deploy the stack via CDK (following the tutorial in the documentation). Should I update the Redis endpoint manually after stopping and starting the Redis cache? (By the way, I back up and restore the Redis cache.) Otherwise, how do I change or control which Redis endpoint the backend points to? Is there a configuration setting or an environment variable? Please advise. Thanks.

rahul1

03/18/2024, 11:43 PM

Hi @oruchovets, the first step is to confirm that Redis connectivity is in fact broken. Can you post a screenshot of the output of your /healthcheck endpoint?
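A minimal sketch of how the /healthcheck output could be read once it is in hand. It assumes the JSON body carries boolean `postgres` and `redis` fields; that shape, the `HealthcheckBody` interface, and the `failingDependencies` helper are illustrative assumptions, not something confirmed in this thread.

```typescript
// Assumed shape of the /healthcheck JSON body (field names are an assumption).
interface HealthcheckBody {
  ok?: boolean;
  postgres?: boolean;
  redis?: boolean;
}

// Hypothetical helper: returns the names of dependencies that are not
// explicitly reported as healthy. A missing field counts as failing.
function failingDependencies(body: HealthcheckBody): string[] {
  const failing: string[] = [];
  if (body.postgres !== true) {
    failing.push('postgres');
  }
  if (body.redis !== true) {
    failing.push('redis');
  }
  return failing;
}
```

If the returned list contains only 'redis', that would support the suspicion that Postgres came back fine and only the Redis connection is broken.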

oruchovets

03/19/2024, 12:34 AM

Hi @rahul1, I made some progress with the RCA on Redis. When I stop Redis and start it again the next day, even when recreating it from the backup, it gets a different connection endpoint. I compared it with the Redis URL stored in the secrets, and the endpoints differ. I updated the secret with the new Redis endpoint and got a new exception: connection timeout. It is probably different credentials. I am sure I am doing something wrong (for example, starting and stopping RDS works without any problem), because if the Redis service crashes I would need to make so many manual changes. I am still stuck on what the cause could be and how to get Redis working. Can you please share how, in your dev environment, after stopping Redis (the only option is to delete it), you would recreate a working instance the next day (manually, using CDK, or any other way)? Please advise; I have spent too much time on this without any progress.
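The two failures reported so far point in different directions, which a rough classifier can make explicit. `getaddrinfo ENOTFOUND` is a DNS failure, consistent with the old hostname no longer existing. A connection timeout more often indicates a reachability problem (for example security groups or subnets) than bad credentials, since a wrong password usually comes back as an explicit Redis error reply such as `NOAUTH` or `WRONGPASS`. The sketch below is only a heuristic; the `classifyRedisError` helper and its category names are hypothetical.

```typescript
type RedisFailure = 'dns' | 'network' | 'auth' | 'unknown';

// Hypothetical helper: maps an error message to a likely failure category.
// This is a heuristic based on common Node.js and Redis error strings.
function classifyRedisError(message: string): RedisFailure {
  // getaddrinfo ENOTFOUND: the hostname could not be resolved.
  if (message.includes('ENOTFOUND')) {
    return 'dns';
  }
  // Timeouts and refused connections: the host resolved but is not reachable.
  if (/ETIMEDOUT|ECONNREFUSED|timed? ?out/i.test(message)) {
    return 'network';
  }
  // Redis replies with these prefixes when authentication fails or is missing.
  if (/WRONGPASS|NOAUTH/.test(message)) {
    return 'auth';
  }
  return 'unknown';
}
```

Applied to the log line from the first message, this returns 'dns'; applied to a message containing "connection timeout", it returns 'network', which suggests checking network access to the recreated cache before changing credentials.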

rahul1

03/19/2024, 5:43 AM

Hi @oruchovets, unfortunately we don't have experience with stopping and restarting Redis this way. We don't do this kind of restart in our production environments.

rahul1

03/19/2024, 5:44 AM

For development, our team primarily develops against a Medplum server + Redis + Postgres running on localhost. Would that be viable for your team? This would not incur any AWS expense during development.

oruchovets

03/19/2024, 6:08 AM

Hi @rahul1. I do have a local environment for development. I wanted to check how to operate the system in the cloud. Suppose Redis crashes in production: how do we recover from the crash? There should definitely be a way to do it, right? Please advise.
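One partial answer for the crash case, as a sketch only: ioredis accepts a `retryStrategy` function that returns the delay before the next reconnect attempt, and the `maxRetriesPerRequest` option named in the log above controls how many retries a pending command waits through. The `cappedBackoff` function and its constants below are illustrative assumptions, and whether the Medplum server exposes these options through its own configuration is not established in this thread.

```typescript
// Hypothetical reconnect policy: delay in milliseconds before reconnect
// attempt number `times` (1-based), growing linearly and capped at 5 seconds.
function cappedBackoff(times: number): number {
  return Math.min(times * 200, 5000);
}

// Possible wiring, left as a comment so the block runs without ioredis installed:
//   new Redis({ host, port, retryStrategy: cappedBackoff });
```

A policy like this only helps when the same endpoint comes back after an outage. It does not help in the situation described earlier, where recreating the cache produced a new hostname and the configured endpoint itself had to be updated.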
