The first part of the tutorial will focus on using Docker to setup the specialized TimescaleDB database, and PGAdmin for managing it. We’ll create some simulated IoT data and showcase some of the cool features of TimescaleDB, which you won’t find with ordinary PostgreSQL.
The third part will focus on using Dash to create interactive time series charts for monitoring your IoT data or showcasing your data science application.
All the code for this tutorial can be found here at GitHub.
I use Docker wherever possible, for reproducible environments, and super-easy deployment using Docker Swarm, so if you’re not familiar with Docker, check out the documentation here.
Part 1 - TimescaleDB, PGAdmin, and Docker
First, let’s create a Docker network so our forthcoming containers can talk to each other:
docker network create --attachable --driver bridge timescale_network
Next, let’s start a local TimescaleDB database using Docker-Compose. This will quickly start a local PostgreSQL database with the TimescaleDB extension automatically configured. Create the following
# docker-compose.timescale.yml version: '3.7' services: timescale: image: timescale/timescaledb:1.7.4-pg12 volumes: - type: volume # source: timescale-db # the volume name source: timescale_volume # target: the location in the container where the data are stored target: /var/lib/postgresql/data read_only: false # Custom postgresql.conf file will be mounted (see command: as well) - type: bind source: ./postgresql_custom.conf target: /postgresql_custom.conf read_only: false env_file: .env environment: POSTGRES_HOST: timescale command: ["-c", "config_file=/postgresql_custom.conf"] ports: - 0.0.0.0:5432:5432 networks: timescale_network: deploy: restart_policy: condition: on-failure # Creates a named volume to persist our database data volumes: timescale_volume: # Joins our external network networks: timescale_network: external: true
Note a few things about the above Docker-Compose file:
- It uses the
timescale_networkwe created in the previous step.
- It uses a volume to persist the database’s data, even if the Docker container is removed or replaced. This is very common for ‘Dockerized’ databases.
- It uses port 5432 (this will be important when we try to access the database in the future).
- It uses a custom configuration file, and a
.envfile to store secret database connection information, like your database password. Let’s create those two files next.
Here’s the custom configuration file, in case you want/need to change any of these settings in the future. The file is too long to put in a code block in this article, so just click this link, then copy and paste the text into a file called
postgresql_custom.conf and put it in the root of your project folder.
Next, here’s a template for our secret
.env file, which you can leave in the root of your project folder, alongside the Docker-Compose and database configuration files:
# .env # For the Postgres/TimescaleDB database. POSTGRES_USER=postgres POSTGRES_PASSWORD=password POSTGRES_HOST=timescale POSTGRES_PORT=5432 POSTGRES_DB=postgres PGDATA=/var/lib/postgresql/data
Now that we’ve added the custom configuration and .env files, you can start the TimescaleDB database with the following command. The
-d starts the container in the background (
docker-compose -f docker-compose.timescale.yml up -d
Check your running containers with
docker container ls or the old-school
docker ps. If the container is restarting, check the logs with
docker logs <container id> and ensure you’ve setup the .env file, the config file, and the Docker network it depends on.
Finally, let’s create a friendly PGAdmin environment for administering our database and running SQL. Create a file called
docker-compose.pgadmin.yml and add the following:
# docker-compose.pgadmin.yml version: '3.7' services: pgadmin: # Name of the container this service creates. Otherwise it's prefixed with the git repo name image: "dpage/pgadmin4:latest" restart: unless-stopped env_file: .env environment: PGADMIN_LISTEN_PORT: 9000 ports: - 0.0.0.0:9000:9000 volumes: # So the database server settings get saved and stored even if the container is replaced or deleted - pgadmin:/var/lib/pgadmin networks: timescale_network: volumes: pgadmin: networks: timescale_network: external: true
Add the following lines to your
.env file for PGAdmin. You’ll need this login information when you try to access PGAdmin in the web browser.
# .env # For the PGAdmin web app PGADMIN_DEFAULT_EMAILemail@example.com PGADMIN_DEFAULT_PASSWORD=password
Start the PGAdmin (PostgreSQL Admin) web application with the following Docker command:
docker-compose -f docker-compose.pgadmin.yml up -d
docker container ls again to check if the PGAdmin container is running. Note we specified a port of 9000, so you can now access PGAdmin at http://localhost:9000 or http://127.0.0.1:9000. Login with the username and password you setup in your
Now that you’ve logged into PGAdmin, right-click on “Servers” and “Create/Server…”. Name it “TimescaleDB Local” in the “General” tab, and type the following into the “Connection” tab:
- Host: timescale (this is the Docker “Service” hostname defined in the first docker-compose.yml file for the TimescaleDB database container)
- Port: 5432
- Maintenance database: postgres
- Username: postgres
- Password: password
Click “Save” and you should be connected. Now you can double-click on “TimescaleDB Local” and you can access your database tables at “/Databases/postgres/Schemas/public/Tables”. Pretty cool, huh? Under the “Tools” menu, click on “Query Tool” and you’re ready to start writing SQL.
You’re now the proud commander of a TimescaleDB database, which is identical to a PostgreSQL database (“The world’s most advanced open source database”, if you believe their marketing), except that it now has special abilities for dealing with high-frequency time series data.
Time series data is a bit different from regular relational data for describing users and things. Time series data can arrive any second, or even multiple times per second, depending on what you’re storing, so the database needs to be able to handle lots of insertions. Some examples are financial data, such as stock market trading prices, or internet of things (IoT) data, usually for monitoring environmental metrics like temperature, pressure, humidity, or anything else you can think of. Usually when you query time series data, you’re interested in the most recent data, and you’re usually filtering on the timestamp column, so that definitely needs to be indexed. TimescaleDB specializes in this sort of thing.
In PGAdmin, if you’re not already there, under the “Tools” menu, click on “Query Tool” and type the following SQL to create two IoT data tables:
CREATE TABLE sensors( id SERIAL PRIMARY KEY, type VARCHAR(50), location VARCHAR(50) ); CREATE TABLE sensor_data ( time TIMESTAMPTZ NOT NULL, sensor_id INTEGER, temperature DOUBLE PRECISION, cpu DOUBLE PRECISION, FOREIGN KEY (sensor_id) REFERENCES sensors (id) );
Now the special part that you can’t do in a regular PostgreSQL database. We’re going to transform the
sensor_data table into a “Hypertable”. Behind the scenes, TimescaleDB is going to partition the data on the time dimension, making it easier to filter, index, and drop old time series data.
If you’ve come to this tutorial to take advantage of TimescaleDB’s unique features, the following is where the magic happens.
Run the following query in PGAdmin to create the hypertable, automatically partitioned on the “time” dimension:
SELECT create_hypertable('sensor_data', 'time');
Now that our specialized time series table has been created, let’s create a special index on the sensor ID, since we’re very likely to filter on both sensor ID and time.
create index on sensor_data (sensor_id, time desc);
Let’s now add a few different sensors to the “sensors” table:
INSERT INTO sensors (type, location) VALUES ('a','floor'), ('a', 'ceiling'), ('b','floor'), ('b', 'ceiling');
Now for the fun part–let’s create some simulated time series data:
INSERT INTO sensor_data (time, sensor_id, cpu, temperature) SELECT time, sensor_id, random() AS cpu, random()*100 AS temperature FROM generate_series( now() - interval '31 days', now(), interval '5 minute' ) AS g1(time), generate_series(1,4,1) AS g2(sensor_id);
Run a simple select query to see some of our newly-simulated data:
SELECT * FROM sensor_data WHERE time > (now() - interval '1 day') ORDER BY time;
Here’s another example of selecting the aggregated data (i.e. a 1-hour average, instead of seeing every single data point):
SELECT sensor_id, time_bucket('1 hour', time) AS period, AVG(temperature) AS avg_temp, AVG(cpu) AS avg_cpu FROM sensor_data GROUP BY sensor_id, time_bucket('1 hour', time) ORDER BY sensor_id, time_bucket('1 hour', time);
From the official TimescaleDB tutorial, let’s showcase two more queries. First, instead of a time series history, you might just want the latest data. For that, you can use the “last()” function:
SELECT time_bucket('30 minutes', time) AS period, AVG(temperature) AS avg_temp, last(temperature, time) AS last_temp --the latest value FROM sensor_data GROUP BY period;
And of course, you’ll often want to join the time series data with the metadata (i.e. data about data). In other words, let’s get a location for each sensor, rather than a sensor ID:
SELECT t2.location, --from the second metadata table time_bucket('30 minutes', time) AS period, AVG(temperature) AS avg_temp, last(temperature, time) AS last_temp, AVG(cpu) AS avg_cpu FROM sensor_data t1 INNER JOIN sensors t2 on t1.sensor_id = t2.id GROUP BY period, t2.location;
TimescaleDB has another very useful feature called “continuous aggregates” for continually and efficiently updating aggregated views of our time series data. If you often want to report/chart aggregated data, the following view-creation code is for you:
CREATE VIEW sensor_data_1_hour_view WITH (timescaledb.continuous) AS --TimescaleDB continuous aggregate SELECT sensor_id, time_bucket('01:00:00'::interval, sensor_data.time) AS time, AVG(temperature) AS avg_temp, AVG(cpu) AS avg_cpu FROM sensor_data GROUP BY sensor_id, time_bucket('01:00:00'::interval, sensor_data.time)
That’s it for part 1 of this three-part tutorial on TimescaleDB, Dash, and Flask. Here’s part 2 on integrating Dash and Flask. Part 3 will focus on creating reactive, interactive time series charts in Dash for your single-page application (SPA).