Hello everyone,
I’m reaching out to the Power Automate community for help with a persistent Gateway issue that has been impacting our operations. Whenever our Gateway crashes, every automation that uses SQL actions in Power Automate fails, and those automations are critical to our workflows.
Issue Recap:
- Initial Setup:
  - Initially, we had a single VM with a gateway named PowerAutomateRPAProd. This VM and gateway handled all our Power Automate workflows and also refreshed all Power BI dashboards. Some of these dashboards are consumed by the RPA team, while others are used by business areas.
  - Starting in August, we noticed intermittent Gateway crashes. RAM usage would spike to over 95%, and the Gateway service needed a manual restart to free the RAM and restore functionality. To mitigate this, we updated the Gateway to the latest version. This provided temporary relief, but the crashes soon resumed.
- Scaling Resources:
  - We increased the VM’s capacity to 32 GB of RAM and 8 vCPUs, exceeding the requirements for Gateway services. The extra RAM delayed the issue, but eventually all 32 GB were consumed. Again, the Gateway service required manual restarts to free the RAM and restore normal operations, impacting both automations and dashboard refreshes.
- New Gateway Deployment:
  - We then deployed a new VM with a new Gateway (PowerBIRPAProd) to distribute the load. This Gateway is dedicated solely to BI dashboard updates.
  - Initially, both Gateways functioned well. However, the issue has now reappeared on the PowerAutomateRPAProd Gateway, our primary workflow Gateway.
Recent Failures:
We encountered failures on December 23 and 24. Screenshots of these failures can be provided for reference if needed.
Observations & Questions:
- Could anyone help identify why RAM consumption escalates this way?
- Currently, the “Collect Additional Data” option is disabled. Should we enable this option to gather more information if the problem recurs?
- Has anyone faced similar issues, or does anyone have insights into potential causes and solutions?
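To help narrow this down, one thing we are considering is logging the Gateway process’s memory at regular intervals and checking whether it grows steadily (leak-like) or jumps around specific flow runs. Below is a minimal sketch of that idea, not something we have in production. It assumes the Gateway process image is named Microsoft.PowerBI.EnterpriseGateway.exe (please verify in Task Manager), uses the built-in Windows tasklist command, and the helper names are hypothetical.

```python
import csv
import io
import subprocess

# Assumption: image name of the on-premises data gateway process.
# Verify it in Task Manager before relying on this.
GATEWAY_IMAGE = "Microsoft.PowerBI.EnterpriseGateway.exe"


def parse_tasklist_csv(output: str) -> float:
    """Sum the 'Mem Usage' column (e.g. '1,234,567 K') of tasklist CSV output, in MB."""
    total_kb = 0
    for row in csv.reader(io.StringIO(output)):
        if len(row) < 5:
            continue  # skip blank lines and "INFO: No tasks ..." messages
        # Keep digits only, so locale-specific thousands separators do not matter.
        digits = "".join(ch for ch in row[4] if ch.isdigit())
        if digits:
            total_kb += int(digits)
    return total_kb / 1024


def sample_gateway_memory_mb(image: str = GATEWAY_IMAGE) -> float:
    """Windows only: read the process's reported 'Mem Usage' via tasklist."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout)


def growth_mb_per_hour(samples):
    """Least-squares slope of (seconds, MB) samples, converted to MB per hour."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return 0.0
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t * 3600
```

The idea would be to call sample_gateway_memory_mb() every few minutes from a scheduled task, store each value with a timestamp, and pass the collected pairs to growth_mb_per_hour(). A consistently positive slope that does not drop after refreshes and flows finish would point toward a leak rather than plain load.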
Urgency:
This issue directly impacts our production automation environment, so finding a definitive resolution is crucial. Any guidance or recommendations from the community would be immensely appreciated.
Thank you for your help!