Discover how I tackled false positives in email monitoring by using DatErica to directly query MongoDB, ensuring accurate, real-time alerts and reducing disruptions across key processes.
A few months ago, I encountered a significant problem while monitoring the performance of a key process in my application. We send thousands of emails daily, and every sent email is logged in MongoDB with a timestamp. To ensure everything was running smoothly, I used Grafana paired with Loki logs to monitor the process. I even set up an alert in Grafana to notify me if no logs were received for 15 minutes, which would indicate that the process had stalled.
But here’s where things got tricky.
On several occasions, I received alerts that the process had stopped, only to find out later that everything was working fine. These false positives were frustrating and disruptive. The issue was that sometimes Grafana couldn’t access Loki, or the data processing in Grafana was delayed, causing the alert to trigger incorrectly. The situation became even more chaotic in our Slack channel, where these false alarms would flood the team, leading to unnecessary stress and confusion. I tried disabling notifications when data was unavailable, but that only made things worse — I could miss a real incident.
That’s when I turned to DatErica. I needed a solution that directly monitored the data source — the MongoDB database itself — rather than relying on external logging tools that could be prone to issues. With DatErica, I could set up a reliable pipeline that checked the actual data in the database, providing a more accurate monitoring solution.
First, I configured a Cron Scheduler in DatErica to run every five minutes. Each run triggered a pipeline that checked when the most recent email had been sent.
Why this is important: By running the pipeline at regular intervals, I could ensure continuous monitoring without missing any significant gaps in the process.
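DatErica's Cron Scheduler handles this internally, but the underlying scheduling logic is simple enough to sketch. The following is a minimal Python illustration, not the platform's actual implementation; `check_latest_email` is a hypothetical placeholder for the pipeline described below:

```python
import time

CHECK_INTERVAL_SECONDS = 5 * 60  # run the pipeline every five minutes


def is_check_due(last_run: float, now: float,
                 interval: float = CHECK_INTERVAL_SECONDS) -> bool:
    """Return True when enough time has passed since the last pipeline run."""
    return now - last_run >= interval


def run_forever(check_latest_email, clock=time.time, sleep=time.sleep):
    """Trigger the monitoring pipeline at regular intervals."""
    last_run = 0.0
    while True:
        now = clock()
        if is_check_due(last_run, now):
            check_latest_email()  # the pipeline checking the latest email
            last_run = now
        sleep(1)
```

Injecting `clock` and `sleep` as parameters keeps the loop testable without waiting in real time.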
Next, I used the MongoDB Puller component to connect directly to our production MongoDB. The key was to retrieve the most recent email log, which was done using a simple aggregation query:
[
  {
    "$sort": { "time_sent": -1 }  // sort by time_sent in descending order
  },
  { "$limit": 1 },
  {
    "$project": {
      "_id": 0,
      "time_sent": 1,
      "difference_in_seconds": {
        "$subtract": [
          { "$divide": [ Date.now(), 1000 ] },
          "$time_sent"
        ]
      }
    }
  }
]
This query would give me the time difference between the current time and the time the last email was sent.
Why this works: This approach directly queries the database, ensuring that the data is accurate and up-to-date, free from the issues that can arise with external monitoring tools.
Finally, I added a JSON Validator component to check if the time difference was within an acceptable range. I configured it with the following rule:
Path: difference_in_seconds
Required: true
Type: NUMBER
Rule: LESS_THAN
Value: 600 (10 minutes)
If the difference exceeded 10 minutes, the JSON Validator would trigger an alert.
Why this is effective: The JSON Validator ensures that I only get alerts when there’s an actual delay in the process, significantly reducing false positives.
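The validator's rule itself is straightforward. As a hedged Python equivalent of that step, not DatErica's actual component, the check amounts to: the field must be present, numeric, and below the 600-second threshold configured above:

```python
MAX_GAP_SECONDS = 600  # alert when no email was sent for more than 10 minutes


def validate_gap(payload: dict) -> bool:
    """Return True when the payload passes the configured rule:
    'difference_in_seconds' is present, is a number, and is LESS_THAN 600."""
    value = payload.get("difference_in_seconds")
    if not isinstance(value, (int, float)):
        return False  # covers Required: true and Type: NUMBER
    return value < MAX_GAP_SECONDS  # Rule: LESS_THAN, Value: 600


def should_alert(payload: dict) -> bool:
    """An alert fires exactly when validation fails."""
    return not validate_gap(payload)
```

Treating a missing or non-numeric field as a failure is deliberate: a malformed payload should alert rather than silently pass, which is the opposite of the Grafana no-data behavior that caused trouble earlier.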
This new setup has been a game-changer. By leveraging DatErica’s powerful data processing and alerting capabilities, I now have a reliable monitoring system that gives me peace of mind. No more false alarms or missed incidents—just accurate, real-time monitoring directly from the source.
While my specific use case involved monitoring email processing, this approach applies to a wide range of scenarios.
DatErica provided the perfect solution to a problem that had been plaguing my monitoring system. By directly querying MongoDB and setting up a customized alerting pipeline, I now have a system that’s both reliable and efficient. Whether you're dealing with email processing, inventory management, or any other critical process, DatErica’s flexible pipelines can help you build a monitoring solution that works exactly the way you need it to.
Remark:
This article is inspired by a real use case and feedback from one of our customers who faced similar challenges in their monitoring processes. Their experience with the solution has been shared here with the hope that it will assist others facing similar issues. We are always eager to learn from our customers' experiences and continuously improve our platform to meet real-world needs.
17th August 2024