Library

More details about 1.7 Terabyte Firebird SQL database

Alexey Kovyazin, 26-May-2014
Some days ago we have published an article about our tests, devoted to the relationship between Firebird performance and databases growth, where we have tested (among others) the 1.7 Terabyte Firebird SQL database. Here you can find more details about such a big database.

Tables

In the table 1 you can find list of tables with key characteristics. This statistics was taken with gstat –a –r and interpreted with our IBAnalyst tool.

Table	Records	RecLength, bytes	Data Pages	Table size, Mb	Indices size, Mb	Total, %
ORDER_LINE	6300024797	60.09	38880344	607505.3	50582.3	34
STOCK	2100000000	298.88	43783806	684121.9	15809.61	39
ORDERS	630004141	29.00	2692324	42067.56	4250.38	2
HISTORY	630000434	48.77	3446455	53850.86	0.00	3
CUSTOMER	630000000	577.52	24226054	378532.0	8675.78	21
NEW_ORDER	188999981	13.00	623764	9746.31	1260.84	1
DISTRICT	210000	103.92	1860	29.06	1.27	~0
ITEM	100000	82.73	756	11.81	0.52	~0
WAREHOUSE	21000	97.90	179	2.80	0.11	~0

Table 1. Tables and their main parameters in 1.7Terabyte Firebird SQL database

As you can see, the biggest table is ORDER_LINE – it contains more than 6 billion of records. It’s size is about 600Gb and there are 50Gb of indices for this table. This table occupies 34% of the database.

Table STOCK is also very big – it contains ~2 billion of records. Though the record number is smaller than in table ORDER_LINE, STOCK occupies ~680Gb in the database (39%), because record length in STOCK is 298.88 bytes against only 60.09 bytes in ORDER_LINE. Accordingly, size of associated indices for STOCK is only 15Gb.

The third biggest table, CUSTOMER, has even bigger record length – 577.52 bytes, and occupies 378Gb (21%) with only 630 million of records.

Indices

There are not many indices in this database, since it is designed only for tests – majority of tables have only 1 index – primary key. In the real world applications developers create many indices to serve particular users requests, but here indices are concentrated on the fastest inserts and updates of the data – as a result, indices size is only 5-10% of table size, while usually it is 30-50% (if you have indices size more than 50% of the table size, consider redesign of database schema, or, may be, drop useless or non-effective indices).

In the table 2 you can find the list of all indices in this database and their key characteristics, taken from gstat statistics and analyzed by IBAnalyst:

Index	Table	Depth	Keys #	Key length, bytes	# of unique keys	Size, Mb
ORDER_LINE_PK	ORDER_LINE	4	6300024797	1.41	6300024797	50582.34
STOCK_PK	STOCK	4	2100000000	1.00	2100000000	15809.61
ORDERS_PK	ORDERS	3	630004141	1.01	630004141	4250.38
CUSTOMER_LAST	CUSTOMER	3	630000000	0.00	1000	4029.64
CUSTOMER_PK	CUSTOMER	3	630000000	1.01	188999981	1260.84
NEW_ORDER_PK	NEW_ORDER	3	188999981	1.01	188999981	1260.84
DISTRICT_PK	DISTRICT	2	210000	1.37	210000	1.27
ITEM_PK	ITEM	2	100000	1.00	100000	0.52
WAREHOUSE_PK	WAREHOUSE	2	21000	1.00	21000	0.11

Table 2. Indices of Firebird 1.7Tb database

As you can see, all indices have very small key size – the biggest is 1.41, it means that index is very effectively compressed – for example, primary for ORDER_LINE contains 6 billion of keys in 50 Gb of space. This is very good result.

There are 2 indices which have depth = 4. It means that engine need to perform 4 reads of index pages to find requested value. It is recommended that index depth should be not higher than 3, and if it is higher, database page size should be increased. However, we already have the maximum page size in this database (16Kb), so we need to live with it.

Heating

Before we will continue with queries, let's perform “heating”. When database is big, system data, associated with tables, are also big, and it takes some time for Firebird to cache necessary system pages. For example, table ORDER_LINE has 10110 pointer pages. Heating is necessary for any big database.

Without heating the first execution of queries will take significantly longer than with heating.

To heat database cache, let’s perform series of simple queries to load system data into cache:

select first 1 * from ORDER_LINE; select first 1 * from STOCK; select first 1 * from customer; select first 1 * from ORDERS; select first 1 * from ITEM;

This is enough for such a simple data schema. For big and complex databases with hundreds of tables heating schema could be much more complex, it should be carefully developed.

As a result, memory consumption of Firebird process (Firebird SuperServer 2.5.2 64 bit) grows:

Figure 1. Heating of database cache

Queries

After heating we can run several typical queries to estimate how users will work with such a big Firebird database: what will be response times for typical queries and read operations.

Query	Plan and statistics	Description
`select first 10 w_id, w_name, c_id, c_last from WAREHOUSE, customer where c_w_id = w_id and c_w_id = 10000`	`PLAN JOIN (WAREHOUSE NATURAL, CUSTOMER INDEX (CUSTOMER_PK)) STATISTICS Current memory = 166919744 Delta memory = 5616 Max memory = 166977488 Elapsed time= 0.08 sec Buffers = 10000 Reads = 1 Writes 0 Fetches = 51`	Join table WAREHOUSE (21000 records) with CUSTOMER (630 million records), with condition for specific warehouse
`select count(*) from WAREHOUSE, customer where c_w_id = w_id and c_w_id = 10000`	`PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK), CUSTOMER INDEX (CUSTOMER_PK)) COUNT ============ 30000 Current memory = 166918560 Delta memory = -1184 Max memory = 166977488 Elapsed time= 0.16 sec Buffers = 10000 Reads = 1175 Writes 0 Fetches = 60040`	Count records for the previous query.
`SELECT first 10 * FROM ORDER_LINE WHERE OL_W_ID = 10050`	`PLAN (ORDER_LINE INDEX (ORDER_LINE_PK)) STATISTICS Current memory = 167013640 Delta memory = 95080 Max memory = 167055968 Elapsed time= 0.80 sec Buffers = 10000 Reads = 10285 Writes 0 Fetches = 10424`	Query to the largest table ORDER_LINE (~6billion records) with condition to select order_line records for specified warehouse id. Since primary key ORDER_LINE_PK is composite, and contains warehouse ID (CONSTRAINT ORDER_LINE_PK: Primary key (OL_W_ID, OL_D_ID, OL_O_ID, OL_NUMBER), it is effectively used in this query.
`SELECT count(*) FROM ORDER_LINE WHERE OL_W_ID = 10050;`	`PLAN (ORDER_LINE INDEX (ORDER_LINE_PK)) COUNT ============ 299509 Current memory = 167125464 Delta memory = -7160 Max memory = 167267976 Elapsed time= 3.41 sec Buffers = 10000 Reads = 1994 Writes 0 Fetches = 599170`	Count for previous query. The second query with the same parameters will be much faster, but queries with different parameters show similar result.
`Select first 10 w_id, w_name, c_id, c_last rom WAREHOUSE, CUSTOMER where c_w_id = w_id and (c_w_id > 8000) and (c_w_id < 10000)`	`PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK), CUSTOMER INDEX (CUSTOMER_PK)) STATISTICS Current memory = 167687664 Delta memory = -651888 Max memory = 168931400 Elapsed time= 0.19 sec Buffers = 10000 Reads = 31 Writes 0 Fetches = 63`	Query with join and 2 conditions.
`select first 10 c1.C_ID, c1.C_FIRST, c1.C_LAST, o1.O_ID, o1.O_OL_CNT, i1.I_NAME, i1.I_PRICE, ol1.OL_AMOUNT from customer c1 join orders o1 on (c1.c_w_id = o1.O_w_ID and c1.C_D_ID = o1.O_D_ID and c1.C_ID = o1.O_C_ID) join ORDER_LINE ol1 on (ol1.ol_w_id = c1.C_W_ID and ol1.OL_D_ID = c1.C_D_ID and ol1.OL_O_ID = o1.O_ID) join Item i1 on (ol1.OL_I_ID = i1.i_id) where o1.o_d_id = 1 and o1.o_w_id = 1 and c1.c_id = 10;`	`PLAN JOIN (C1 INDEX (CUSTOMER_PK), O1 INDEX (ORDERS_PK), OL1 INDEX (ORDER_LINE_PK), I1 INDEX (ITEM_PK)) STATISTICS Current memory = 166807128 Delta memory = 6016 Max memory = 166879200 Elapsed time= 0.20 sec Buffers = 9999 Reads = 1 Writes 0 Fetches = 4960`	Query to show details for specific customer in the specific warehouse and district. It demonstrates performance of more complex joins.

Summary

As you can see, 1.7 Terabyte Firebird database shows pretty good performance for common read queries. If query is well-designed and uses good indices, performance is good even on the very big database. Of course, there are pain points, like long fetch time for very large datasets and long time for count operations (since Firebird visits all pages to get an exact number of records for the particular query in the specific transaction), but these pain points are well-known by experienced Firebird developers, and there are methods to design database in the way to workaround them.

But what about performance degradation for inserts and updates? Is it possible to compensate it with smart tuning of Firebird configuration?

In the next article we will consider tuning options for Firebird 1.7 terabyte database – we will boost our bird and make it fly.