Partitioned Tables

We learned about Splayed tables in the last section and they are really useful with how they utilize memory by selectively loading data from disk to memory based on user queries.

However, in real-world applications splaying would only be so useful if you want to maintain historical data for years and perform analytics on it.

This is where Partitioned tables come in. Partitioned tables introduce another layer on top of splayed tables. For instance, storing a date’s data as a partition and storing another date’s data as another partition and so on.

Let’s look at it visually.

A sample Splayed table structure- Sym column at base, Table directories and underneath them would be the column files with the .d directory
A sample partitioned table- One more layer of partitions on top of Splayed tables. Each partition holds Splayed tables underneath.
Note: The partitions should have underlying integral values. So symbols cannot be used as partitions.

There can be no data under a partition as long as the table structure is maintained

Creating a Partitioned table

  • Let’s create two tables and save them down under the same partition.
  • Then flush the data in those tables, repopulate them and save them down to new partitions.

Save it to disk like we did for Splayed tables:

Create trade and quote tables, both with 100 records each.
Save-down table to the partition: Note the partition directory in the path
Check the contents on disk
Delete the tables from q session
Create the tables again and save-down to new partition
Check the contents on disk again
Start a fresh q session, check the tables and check the partitions
NOTE: NEVER DO THIS
Never type the table name directly in the session. q will try to map all disk data to memory which might kill your session if the data size is large.

We need to specify a date or a range of dates and put filter more conditions to retrieve this data. Few examples below.

Basic Queries to retrieve data from a Partitioned table

  • Let’s try to get the trade data for sym `aapl for date 2018.09.02
Notice the date partition is prepended to the table. Rule of thumb is to always put the date as the first where clause condition to narrow down the search and so on for further clauses.
Note: We'll look at more complex queries and operations in the coming chapters. Let's keep it basic for now.

Create a table with .Q.dpft

q provides another weapon in the .Q namespace arsenal. Let’s see how it makes our life easy.

Syntax:
.Q.dpft[dir;partition;sortCol;table]
  • dir is the root directory(which has sym file) for partitioned table.
  • partition is the partition name.
  • sortCol is the column to sort by.
  • table(as a symbol) is the table to be stored in the partition.

Let’s store another partition for our running example.

Create new trade and quote tables and store them down to a new partition.
Check the contents on disk
Start a new q session, reload the partitioned tables and check the tables and partitions.

Summary

This section covers the partitioned tables in KDB+ database. They form the backbone for storing surreal amounts of Historical data and performing analysis on them.

This section is basic and we’ll run more queries on it to understand even better.