So you’ve got some data you think fits a time series? Think about data that has significance when aggregated or viewed in relation to the time at which the measure was recorded. Like your health metrics. Or the location of an asset. Or the temperature of your refrigerator. All of those bits are meaningful in isolation, but when aggregated into averages, medians, sums, maxes, and mins, that time series component helps to tell that wonderful narrative.
There are many databases built for purpose around time series events, such as InfluxDB, kdb+, or Prometheus to name three, but this article isn’t about those. What if you wanted to store that data in a document database like MongoDB? How and why would you go about doing that? For me, I made that decision about a year back and have really enjoyed mixing this time series data into the same datastore our core domain is stored in. Sure, with a properly constructed data access layer underneath your domain, you could hydrate POCOs (Plain Old C# Objects) and perform your unique business magic regardless of the underlying data store, but I often wonder why. Sometimes we do things just to do them. The old tech-for-tech argument. On my current project, we are using Mongo to store sensor/IoT data on the minute to perform either viewing, analyzing, or offloading to something like Spark for in-memory analytics, then writing back into Mongo. Such a wonderful and powerful platform it is. Below I’d like to walk you through a simple implementation.
Let’s use something simple: the temperature of your refrigerator. Suppose you have a sensor that is providing data every minute about the external temperature it detects. Every minute you would receive a data structure that looks like this:
- a unique device identifier
- timestamp for the event
- temperature when the event occurred
Imagine this data streaming in, and you’d like to be able to do something with it, say, every hour and every day.
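To make that concrete, here is a minimal sketch of what one such minute-level event could look like in code. The field names (`deviceId`, `timestamp`, `temperature`) and the sample values are illustrative assumptions, not a fixed schema:

```python
from datetime import datetime, timezone

# One hypothetical minute-level reading from the fridge sensor.
reading = {
    "deviceId": "fridge-001",  # the unique device identifier (assumed name)
    "timestamp": datetime(2020, 5, 4, 14, 37, tzinfo=timezone.utc),  # when the event occurred
    "temperature": 3.8,  # temperature when the event occurred, degrees Celsius
}
```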
If you are coming from a relational world, you think in terms of tables and foreign keys. There is often one pretty clear-cut way to store your data to achieve speed, performance, and the “correct” normal form. With a document database, it depends almost entirely on your use cases and queries when designing your documents. Think of how you are going to “read” or “aggregate” the data. For our requirement, I want to be able to:
- Look at a specific minute
- Know the average temperature for every hour
Understanding that one of our key components is that timestamp, I might create a structure that looks something like this:

{
    "deviceId": String,            // the unique device identifier
    "timestamp": DateTime,         // the specific hour
    "values": [
        {
            "timestamp": DateTime, // the specific minute
            "temperature": Double
        }
    ]
}
Let’s walk through this design.
I’m creating a document that stores an hour’s worth of data in each document. Why does this matter? Well, if you think about it, you have 60 minutes in an hour and 24 hours in a day, which equates to 1,440 unique events per device per day. If instead you stored one document per event, pulling an hour’s worth of data would mean having Mongo pull back 60 documents. A day’s worth would be 1,440. A week would be … you do the math. Imagine having 10,000s of fridge sensors and needing to do some aggregations on that. Remember when I mentioned knowing your reads and your use case? If you only ever wanted one point in time, one-document-per-event wouldn’t be an issue; a covering index would handle that just fine. But when you want to view multiples, this is where that approach falls down.
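One common way to maintain these hourly buckets is an upsert per reading: filter on the device and the hour, and push the minute-level value into the document’s array. The sketch below builds the filter and update documents in plain Python; the field names follow the structure above and the driver call mentioned in the docstring is an assumption about a pymongo-style API, not code from the article:

```python
from datetime import datetime, timezone

def bucket_update(device_id, ts, temperature):
    """Build the filter and update documents for one sensor reading.

    With a pymongo-style driver you would run something like
        collection.update_one(filt, update, upsert=True)
    so the first reading in an hour creates the bucket document, and
    later readings append to its 'values' array.
    """
    # The bucket key is the reading's timestamp truncated to the hour.
    hour = ts.replace(minute=0, second=0, microsecond=0)
    filt = {"deviceId": device_id, "timestamp": hour}
    update = {
        "$push": {
            "values": {"timestamp": ts, "temperature": temperature}
        }
    }
    return filt, update

filt, update = bucket_update(
    "fridge-001",
    datetime(2020, 5, 4, 14, 37, tzinfo=timezone.utc),
    3.8,
)
```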
So per my requirements: to understand a specific minute’s data, it is one document. To aggregate an hour’s observations, it’s one document. To aggregate a day’s worth, it is 24 documents (one per hour × 24 hours). There are lots of variations here you could apply depending upon your needs.
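The daily aggregation can be sketched as a MongoDB aggregation pipeline: match the day’s 24 hourly buckets, unwind the per-minute readings, and average them. The pipeline below is built as plain Python dicts so you can see its shape; the device id and field names are illustrative, and with a real driver you would pass it to something like `collection.aggregate(pipeline)`:

```python
from datetime import datetime, timezone

day_start = datetime(2020, 5, 4, tzinfo=timezone.utc)
day_end = datetime(2020, 5, 5, tzinfo=timezone.utc)

pipeline = [
    # Match the 24 hourly bucket documents for one device and one day.
    {"$match": {
        "deviceId": "fridge-001",
        "timestamp": {"$gte": day_start, "$lt": day_end},
    }},
    # Flatten each bucket's array of minute-level readings.
    {"$unwind": "$values"},
    # Average the per-minute temperatures across the whole day.
    {"$group": {
        "_id": "$deviceId",
        "avgTemperature": {"$avg": "$values.temperature"},
    }},
]
```

Dropping the `$gte`/`$lt` bounds down to a single hour gives the hourly average from one bucket document instead of 24.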
As you can see, there’s some real power here in being able to model your time series data in a document database, especially if you are already using Mongo as your workhorse across the rest of your application(s). I’m going to continue exploring Mongo over the next few weeks as a detour from AWS and C#.
Hope you are enjoying and thanks for following along!