Login stuck after Redis cache restart
# support
oruchovets:
Hi @rahul1 and team, I am running a self-hosted Medplum on AWS. To save costs, I stop RDS, ECS and ElastiCache (Redis) overnight. After starting everything back up today, I can no longer log in to the Medplum app. I did some root-cause analysis and found these logs:

{"level":"ERROR","timestamp":"2024-03-17T13:06:56.592Z","msg":"Unhandled error","error":"MaxRetriesPerRequestError: Reached the max retries per request limit (which is 20). Refer to \"maxRetriesPerRequest\" option for details.","stack":["MaxRetriesPerRequestError: Reached the max retries per request limit (which is 20). Refer to \"maxRetriesPerRequest\" option for details."," at TLSSocket. (/usr/src/medplum/node_modules/ioredis/built/redis/event_handler.js:182:37)"," at Object.onceWrapper (node:events:633:26)"," at TLSSocket.emit (node:events:530:35)"," at node:net:337:12"," at TCP.done (node:_tls_wrap:657:7)"," at TCP.callbackTrampoline (node:internal/async_hooks:130:17)"],"requestId":"6cf79ec6-64d9-4a02-945a-6d533634b1ee","traceId":"09a54128-2573-4038-b81d-98a00241dbe8"}

[ioredis] Unhandled error event: Error: getaddrinfo ENOTFOUND master.meb52fveq4kzjyj.ydyuwv.euw2.cache.amazonaws.com

It looks like after the restart the app can't connect to the Redis cluster. How do I fix this? I am hosting the stack via CDK (following the tutorial in the documentation). Should I update the Redis endpoint manually after stopping and starting the Redis cache? (By the way, I make a backup of the Redis cache and restore from it.) Or what is the way to change or control which Redis endpoint the backend points to? Is there a configuration setting or environment variable? Please advise. Thanks.
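The two log lines above fail at different layers. `MaxRetriesPerRequestError` is ioredis giving up on a command after its per-request retry limit (20 here), which is a symptom; `getaddrinfo ENOTFOUND` is the underlying cause: the hostname in the configuration no longer resolves. The sketch below is a hypothetical helper (`classifyRedisError` and its categories are illustrative, not part of Medplum or ioredis) that maps such errors to the layer most likely at fault, using the error codes Node.js sets and the reply prefixes Redis uses for authentication failures.

```typescript
// Hypothetical helper: map a Redis client error to the layer that most likely failed.
type RedisFailure = 'dns' | 'network' | 'auth' | 'retries-exhausted' | 'unknown';

interface ErrorLike {
  name?: string;
  code?: string;
  message?: string;
}

function classifyRedisError(err: ErrorLike): RedisFailure {
  const message = err.message ?? '';
  if (err.code === 'ENOTFOUND' || message.includes('getaddrinfo ENOTFOUND')) {
    // The hostname does not resolve: the configured endpoint is stale or mistyped.
    return 'dns';
  }
  if (err.code === 'ETIMEDOUT' || err.code === 'ECONNREFUSED') {
    // The name resolves but the connection fails: security group, subnet, or routing.
    return 'network';
  }
  if (message.startsWith('NOAUTH') || message.startsWith('WRONGPASS')) {
    // Redis answered but rejected (or required) credentials.
    return 'auth';
  }
  if (err.name === 'MaxRetriesPerRequestError' || message.startsWith('Reached the max retries per request limit')) {
    // ioredis stopped retrying a command; look for the connection error logged next to it.
    return 'retries-exhausted';
  }
  return 'unknown';
}
```

Fed the two errors from this log, the helper returns `'retries-exhausted'` for the first and `'dns'` for the second, which points at the endpoint value rather than at Redis itself.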
rahul1:
Hi @oruchovets, the first step is to confirm that Redis connectivity is in fact broken. Can you post a screenshot of the output of your /healthcheck endpoint?
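As a scripted alternative to a screenshot, the sketch below fetches `/healthcheck` and lists which backing services look unhealthy. It assumes the response is JSON with boolean `postgres` and `redis` fields; that shape is an assumption, so compare it with what your server version actually returns. `checkHealth` relies on the global `fetch` available in Node.js 18 and later.

```typescript
interface HealthSummary {
  ok: boolean;
  failing: string[];
}

// Assumed response shape: { ..., postgres: boolean, redis: boolean }.
// A missing or non-true field is reported as failing.
function summarizeHealth(body: unknown): HealthSummary {
  const record = (typeof body === 'object' && body !== null ? body : {}) as Record<string, unknown>;
  const failing: string[] = [];
  for (const dependency of ['postgres', 'redis']) {
    if (record[dependency] !== true) {
      failing.push(dependency);
    }
  }
  return { ok: failing.length === 0, failing };
}

// Fetch <baseUrl>/healthcheck and summarize the JSON body.
async function checkHealth(baseUrl: string): Promise<HealthSummary> {
  const response = await fetch(new URL('/healthcheck', baseUrl));
  return summarizeHealth(await response.json());
}
```

For example, a body of `{"postgres": true, "redis": false}` summarizes to `{ ok: false, failing: ['redis'] }`.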
oruchovets:
Hi @rahul1, I've made progress with the RCA on Redis. When I stop Redis and start it again the next day, or even recreate it from the backup, it gets a different connection endpoint. I checked the secret that holds the Redis connection URL, and the endpoints are different. I updated it with the new Redis endpoint and got a new exception: connection timeout. It is probably different credentials. I am sure I am doing something wrong (for example, stopping and starting RDS works without any problem), because if the Redis service crashes I would have to make many manual changes. I am still stuck on what the cause could be and how to get Redis working. Could you share how, in your dev environment, you would recreate a working Redis instance the next day after stopping it (deleting is the only option): manually, with CDK, or some other way? Please advise; I have spent a lot of time on this without any progress.
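A connection timeout does not necessarily mean wrong credentials: Redis authentication only happens after a TCP connection is established, so a timeout at connect time usually points at networking (security group, subnet, routing) instead. The sketch below is a hypothetical `probeTcp` helper built on Node's `net` module that checks plain TCP reachability of the new endpoint. It would need to run from inside the same VPC (for example a one-off task in the ECS cluster), since ElastiCache endpoints are not reachable from outside it.

```typescript
import * as net from 'node:net';

type ProbeResult = 'open' | 'timeout' | 'error';

// Attempt a plain TCP connection to host:port.
// 'open'    -> the port is reachable; look at TLS settings or credentials next.
// 'timeout' -> no answer within timeoutMs; look at security groups and subnets.
// 'error'   -> DNS failure or connection refused.
function probeTcp(host: string, port: number, timeoutMs = 3000): Promise<ProbeResult> {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (result: ProbeResult) => {
      socket.destroy();
      resolve(result);
    };
    // The socket timeout also covers the connect phase.
    socket.setTimeout(timeoutMs, () => finish('timeout'));
    socket.once('connect', () => finish('open'));
    socket.once('error', () => finish('error'));
  });
}
```

If the probe reports `'open'` but the application still times out, one possible cause worth checking is a mismatch between the TLS setting in the configuration and the new cluster's in-transit encryption setting, along with whether the auth token changed.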
rahul1:
Hi @oruchovets, unfortunately we don't have experience with stopping and restarting Redis this way. We don't do this kind of restart in our production environments.
For development, our team primarily develops against a Medplum server + Redis + Postgres running on localhost. Would that be viable for your team? It would not incur any AWS expense during development.
oruchovets:
Hi @rahul1, I do have a local environment for development. I wanted to understand how to operate the system in the cloud. Suppose Redis crashes in production: how do we recover from the crash? There must be a way to do it, right? Please advise.
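When a replacement cluster comes up with a new endpoint, one recovery path is the manual step described earlier in this thread: write the new host (and the new auth token, if it changed) into the secret the server reads, then restart the server tasks so they pick up the new value. The sketch below only builds the updated secret JSON. The field names (`host`, `port`, `password`, `tls`) and the helper name `repointRedisSecret` are assumptions, so check them against the real secret created by your stack.

```typescript
// Assumed shape of the JSON secret holding the Redis connection details.
interface RedisSecret {
  host: string;
  port: number | string;
  password?: string;
  tls?: Record<string, unknown>;
  [key: string]: unknown;
}

// Build an updated secret string for a recreated cluster: the host always changes,
// the password only when a new auth token is supplied, and every other field
// (port, tls, anything else) is carried over unchanged.
function repointRedisSecret(currentJson: string, newHost: string, newPassword?: string): string {
  const current = JSON.parse(currentJson) as Partial<RedisSecret>;
  if (typeof current.host !== 'string' || current.port === undefined) {
    throw new Error('secret does not look like a Redis connection secret');
  }
  if (newHost.trim() === '') {
    throw new Error('new host must not be empty');
  }
  const updated: RedisSecret = { ...current, host: newHost, port: current.port };
  if (newPassword !== undefined) {
    updated.password = newPassword;
  }
  return JSON.stringify(updated);
}
```

The resulting string could then be stored with `aws secretsmanager put-secret-value --secret-id <your-secret> --secret-string <json>`, followed by `aws ecs update-service --cluster <cluster> --service <service> --force-new-deployment` so that running tasks restart; this assumes the server reads the Redis settings only at startup.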