Hi, My Name is Ben

Time series data in MongoDB

September 21, 2018 by Benjamen

So you’ve got some data you think fits a time series?  Think about data that gains significance when aggregated or viewed in relation to the time at which the measurement was recorded.  Like your health metrics.  Or the location of an asset.  Or the temperature of your refrigerator.  Each of those readings is meaningful in isolation, but when aggregated into averages, medians, sums, maxes and mins, that time series component helps to tell that wonderful narrative.

There are many databases that are purpose-built around time series events, such as InfluxDB, kdb+ or Prometheus to name three, but this article isn’t about those.  What if you wanted to store that data in a document database like MongoDB?  How and why would you go about doing that?  For me, I made the decision about a year back and have really enjoyed using the same datastore that our core domain is stored in to mix in this time series data.  Sure, with a properly constructed data access layer underneath your domain, you could hydrate POCOs (Plain Old C# Objects) and perform your unique business magic regardless of the underlying data store, but I often wonder why.  Sometimes we do things just to do them.  The old tech for tech’s sake argument.  On my current project, we are using Mongo to store sensor/IoT data to the minute, for viewing, analyzing or offloading to something like Spark to perform in-memory analytics and then write back into Mongo.  Such a wonderful and powerful platform it is.  Below I’d like to walk you through a simple implementation.

The Setup

Let’s use something simple, the temperature of your refrigerator.  Suppose you have a sensor that is providing data every minute about the external temperature it detects.  You could have a data structure that looks something like this arriving every minute.

{
   "device": "SomeUniqueId",
   "timestamp": "2008-09-15T15:53:00",
   "temperature": 36.23
}

Essentially it’s

  • a unique device identifier
  • timestamp for the event
  • temperature when the event occurred
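To make the shape of one event concrete, here is a minimal Python sketch that parses the sample payload above into a typed object. The `Reading` class and `parse_reading` function are hypothetical names for illustration, not part of any library or of the project described in this post.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reading:
    """One sensor event: which device, when, and what it measured."""
    device: str
    timestamp: datetime
    temperature: float


def parse_reading(raw: dict) -> Reading:
    """Convert one raw sensor payload (a dict) into a typed Reading."""
    return Reading(
        device=raw["device"],
        # The sample timestamp carries no UTC offset, so this is a naive datetime.
        timestamp=datetime.fromisoformat(raw["timestamp"]),
        temperature=float(raw["temperature"]),
    )


sample = parse_reading(
    {"device": "SomeUniqueId", "timestamp": "2008-09-15T15:53:00", "temperature": 36.23}
)
```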

Imagine this data streaming in, and you’d like to be able to do something with it, say, every hour and every day.

Schema/Document Design

If you are coming from a relational world, you think in terms of tables and foreign keys.  There is often one pretty clear cut way to store your data to achieve speed, performance and the “correct” normal form.  With a document database, the design of your documents largely depends on your use cases and queries.  Think of how you are going to “read” or “aggregate” the data. For our requirement, I want to be able to

  • Look at a specific minute
  • Know the average temperature on every hour

Understanding that one of our key components is that timestamp, I might create a structure that looks like this

{
   "deviceId": string,
   "timestamp": DateTime, // the specific hour
   "averageTemperature": float,
   "minutes": [
      {
         "timestamp": DateTime, // the specific minute
         "temperature": float
      }
   ]
}
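As a rough illustration of how per-minute events could be folded into that hourly shape, here is a small in-memory Python sketch. It never touches a database; `hour_of` and `bucket_by_hour` are hypothetical helper names, and the input dicts are assumed to already hold `datetime` values rather than strings.

```python
from collections import defaultdict
from datetime import datetime
from typing import List


def hour_of(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour (the bucket key)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def bucket_by_hour(readings: List[dict]) -> List[dict]:
    """Fold per-minute readings into one document per device per hour."""
    grouped = defaultdict(list)
    for r in readings:
        grouped[(r["device"], hour_of(r["timestamp"]))].append(r)

    documents = []
    for (device, hour), items in sorted(grouped.items()):
        minutes = [
            {"timestamp": r["timestamp"], "temperature": r["temperature"]}
            for r in sorted(items, key=lambda r: r["timestamp"])
        ]
        documents.append({
            "deviceId": device,
            "timestamp": hour,  # the specific hour
            "averageTemperature": sum(m["temperature"] for m in minutes) / len(minutes),
            "minutes": minutes,
        })
    return documents


readings = [
    {"device": "fridge-1", "timestamp": datetime(2018, 9, 21, 15, 53), "temperature": 36.0},
    {"device": "fridge-1", "timestamp": datetime(2018, 9, 21, 15, 54), "temperature": 38.0},
    {"device": "fridge-1", "timestamp": datetime(2018, 9, 21, 16, 0), "temperature": 37.0},
]
docs = bucket_by_hour(readings)
```

Here the three readings collapse into two documents, one for the 15:00 hour and one for the 16:00 hour, each carrying its own average and its list of minutes.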

Let’s walk through this design

I’m storing an hour’s worth of data in each document.  Why does this matter? Well, if you think about it, you have 60 minutes in an hour and 24 hours in a day, which equates to 1440 unique events per day.  If each event were its own document, pulling an hour’s worth of data would mean having Mongo pull back 60 documents.  To pull a day’s worth it would be 1440.  A week would be … you do the math.  Imagine having tens of thousands of fridge sensors and needing to do some aggregations on that.  Remember when I mentioned knowing your reads and your use case?  If you only wanted one point in time, that wouldn’t be an issue.  A covering index would handle that just fine.  But when you want to view multiple readings, this is where that approach falls down.
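The arithmetic is easy to check with a few lines of Python. This sketch compares how many documents a single device’s read would touch under a one-document-per-minute model versus the hourly model; both function names are made up for the illustration.

```python
MINUTES_PER_HOUR = 60


def docs_one_per_minute(hours: int) -> int:
    """Documents read when every minute's event is its own document."""
    return hours * MINUTES_PER_HOUR


def docs_one_per_hour(hours: int) -> int:
    """Documents read when each document holds a full hour of events."""
    return hours


# Compare both models for an hour, a day and a week of data.
for label, hours in [("hour", 1), ("day", 24), ("week", 24 * 7)]:
    print(label, docs_one_per_minute(hours), docs_one_per_hour(hours))
```

For a single device that works out to 60 versus 1 document for an hour, 1440 versus 24 for a day, and 10080 versus 168 for a week.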

So for my requirements, understanding a specific minute’s data takes one document.  Aggregating an hour’s observations takes one document.  Aggregating a day’s worth takes 24 documents: one per hour, times 24.  There are lots of variations here you could apply depending upon your needs.
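Both reads can be sketched in plain Python against documents of the hourly shape described in this post. This is only an in-memory illustration of the logic, assuming the hourly documents have already been fetched; `find_minute` and `daily_average` are hypothetical names. The daily figure is weighted by the number of readings in each hour, since a plain average of hourly averages would be skewed if some hours are missing readings.

```python
from datetime import datetime
from typing import List, Optional


def find_minute(hour_doc: dict, minute: datetime) -> Optional[float]:
    """Return the temperature recorded at `minute`, or None if it is missing."""
    for entry in hour_doc["minutes"]:
        if entry["timestamp"] == minute:
            return entry["temperature"]
    return None


def daily_average(hour_docs: List[dict]) -> float:
    """Average a day's readings from its hourly documents.

    Each hourly average is weighted by how many readings that hour holds.
    Assumes at least one reading exists across the documents.
    """
    total = sum(d["averageTemperature"] * len(d["minutes"]) for d in hour_docs)
    count = sum(len(d["minutes"]) for d in hour_docs)
    return total / count


hour_docs = [
    {
        "deviceId": "fridge-1",
        "timestamp": datetime(2018, 9, 21, 15),
        "averageTemperature": 37.0,
        "minutes": [
            {"timestamp": datetime(2018, 9, 21, 15, 53), "temperature": 36.0},
            {"timestamp": datetime(2018, 9, 21, 15, 54), "temperature": 38.0},
        ],
    },
    {
        "deviceId": "fridge-1",
        "timestamp": datetime(2018, 9, 21, 16),
        "averageTemperature": 40.0,
        "minutes": [
            {"timestamp": datetime(2018, 9, 21, 16, 0), "temperature": 40.0},
        ],
    },
]

one_minute = find_minute(hour_docs[0], datetime(2018, 9, 21, 15, 54))
day_average = daily_average(hour_docs)
```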

Wrap Up

As you can see, there’s some real power in being able to model your time series data in a document database.  Especially if you are already using Mongo across the rest of your application(s) as your workhorse.  I’m going to continue exploring Mongo over the next few weeks as a detour from AWS and C#.

Hope you are enjoying and thanks for following along!
