Time Series Database (TSDB)
Do not miss this exclusive book on Binary Tree Problems. Get it now for free.
Introduction
Interested in time series databases and want to know a bit about them? Well, you are in the right place if that is what you want because in this article, I will give an overview of Time Series databases, examples of such databases, their applications as well as their advantages and dis-advantages. But first we will have to know a few things to have knowledge on time series databases.
Table of Contents
- What is a Time series?
- Time series databases (TSDBs) and how they work
- Examples of TSDBs
- Applications of TSDBs
- Use Case
- Advantages of TSDBs
- Disadvantages of TSDBs
What is a Time Series?
A time series or time series data is a collection of data points that are indexed based on time-intervals. It is different from regular data in the sense that it includes a time field for each indexed data. Time series allows us to track changes to data over a period of time. Time series enables for the inserting or appending data in comparison to updating or deleting data.
Examples of time series includes:
- Market and Stock prices.
- Weather records.
- Health metrics.
- Sensor data.
- User log monitoring
- Application log monitoring.
- Operational metrics (for observability).
Time Series Databases (TSDBs) and How they work
A time series database is a database management system improved upon for storing and serving time series data. In other words, a TSDB (time series database) is a system that is optimized to work on data and information measured by time.
A database is simply a tool which stores a structured collection of data that is accessed electronically.
A database management system (DBMS) is a software that provides an interface for users to interact with the database in order to perform CRUD (create, read, update, delete) queries.
Compared to traditional databases (relational and non-reltaional), a time series database places more emphasis on the structuring of data amd information according to time. They capture and store data in pairs of time(s) and value(s) which helps for effortless analysis of data as well as instantenous querying. The time(s) are dynamic whereas the value(s) are fixed.
Examples of TSDBs
There are various databases that help in the management of time series data and some of these databases (along with the programming languages used to create them) include:
- InfluxDB (Written in Go)
- Graphite (Written in Python)
- TimescaleDB (Written in C)
- Apache Druid (Written in Java)
- MongoDB* (Written in C++, Python and JavaScript)
- RRDTool (Written in C)
(*) It is worth noting that MongoDB is originally a non-relational database that provides a time series collection to manage time series data, as do most other relational and non-relational databases. The only drawback is thier lack of optimization and inability to handle immense workloads that come with time series data.
I would like to explain one of the examples for you (the reader) to get a grasp on how other time series databases might be like. So I will be shedding more light on TimescaleDB.
TimescaleDB is a time series database that was developed by Timescale Inc., which is a company founded Ajay Kulkarni and Micheal J. Freeman. It was written and designed using the C programming language and extends the functionality of PostgreSQL. Like MongoDB, TimescaleDB supports standard SQL queries and is inherently a relational database (since it is basically an extension of PostgreSQL). It is mostly free and open source, with the exception of the cloud variant of the database which comes in monthly and annually paid plans.
Some characteristics of TimescaleDB which makes it ideal for time series data include:
- The ability to assimilate incredible and large volumes of data using time-space partitioning.
- The ability to efficiently store data using hypertables which cuts storage space down to nearly 90%.
- Using standard SQL queries to analyze data which makes it easier for individuals to use a time series database provided they have prior SQL knowledge.
- The ability to manage data accurately and consistently using features that are not available on traditional databases.
Applications of TSDBs
Time series databases can be applied in a lot of use cases. Some of these application include:
-
Monitoring sensor and logging data: TSDBs can be used in IoT devices and system software to obtain metrics for predictive analytical purposes. A salient use case is in the area of automobiles where data metrics obtaianed from the various sensors of the car can be used in analysis to aid in predictive maintenance. Logging data obtained from users on a piece of software is also another use case. Since the data is time-stamped in both use cases, a TSDB would be the optimal choice for capturing and storing the vast amounts of data obtained.
-
Financial markets: The world of finance, espcially forex and stock trading, require the assembling of data based on time intervals on changing trends of various financial assets. The insights obtained from this is useful for investors and traders, aiding them in thier decision-making process. A TSDB would also be an optimal choice here for capturing and storing data since the data is indexed based on time intervals.
Use Case
A web application that uses a time series database to log user activity.
Here, we shall show a use case that involves the use of a time series database in a web application. We shall be looking at the following in this use case:
- Details on data captured by the system
- Estimates on the size of data stored
- Design of the database
- Queries on the database
Details on data captured by the system
The database captures user activity data log in the form of metrics in a table. The logs captured includes the userid, username and the loginrecord (timestamp) of each user login.
Login_details Table
user_id | user_name | login_record |
---|---|---|
1 | Danny | 2022-11-11 10:20:10 |
2 | James | 2022-11-12 11:36:00 |
3 | Ken | 2022-11-13 09:56:07 |
Estimates on the size of data stored
- Assuming that the time-stamp of each login recorded by the system per minute is 1.5 KB in size for a single user and a million users log into the system at every minute. Then the data stored in the system for a single day would be approximately 130 TB (terabytes) in size.
- For system consistency up to 20 years, the system would have to be: 130 * 365 * 10 = 949 PB (petabytes) in size or more.
Design of the database
Using TimescaleDB as a reference point, we shall look at how to design the time series database for the afforementioned use case.
Designing the database entails having the following components:
- An SQL API
- For full SQL compatibility and a provision of time series functions to be used on the user log data.
- A Query Planner
- To plan queries in a sequence of individual steps, with each query differing from the previous step.
- A Schema management layer
- To manage schema objects such as tables and user log data
- A Storage layer
- To store ingested user log data. For TimescaleDB, the storage layer is built atop PostgreSQL.
Queries made on the database
Examples of queries made on the database include:
First Query: The amount of logins per hour in the previous week
SELECT
time_bucket('1 hour', login_record) as one_hour,
COUNT(*) total_num_of_logins
FROM login_details
WHERE login_record > now() - INTERVAL '1 week'
GROUP BY one_hour
ORDER BY one_hour;
The time_bucket()
function in the query snippet above is used to automatically group and aggregate the user log data to enable a view of the total hourly logins in the past week.
Second Query: The last login time of each user
SELECT
user_id,
user_name,
last(login_record, login_record) as last_login
FROM login_details
WHERE login_record > now() - INTERVAL '1 week'
GROUP BY user_id, user_name
ORDER BY user_id
The last()
function in the query snippet above is used to query the most recent value (in this case, the login time) of the user log data.
Advantages of TSDBs
From the afforementioned use case, there are several advantages of using a time series database over relational and non-relational databases to capture time series data. Some of these advantages are:
- Scalability.
- Efficiency.
- High-speed data querying.
- Data compression.
- Data lifecycle management.
Disadvantages of TSDBs
Like every other technological tool, time series databases are not strangers to thier own limitations in certain areas. Some of them include:
- Data volume limitation
- Difficulty in optimizing read and write opeartions to prevent overlapping.
- Since TSDBs are specialized for time-stamped data, they are mostly unable to manage static data.
With this article at OpenGenus, you must have the complete idea of Time Series Database (TSDB) in System Design.
Sign up for FREE 3 months of Amazon Music. YOU MUST NOT MISS.