| Translations of this document:
Portuguese
Russian
Chinese
Why to create terabyte Firebird database?
Many companies work with big Firebird databases and rely on
them to support important business operations. Some of Firebird databases are
already hundreds of gigabytes in size and continue to grow (see “Who's Big?”
section), and it's easy to predict the moment when they become 2, 3 or 5 times
bigger. So database administrators and vendors are interested to investigate
Firebird behaviour with big databases and get some recommendations how to manage
them.
Also the important reason we had in mind while creating 1Tb
Firebird database was the final elimination of prevalent perception of Firebird
as database engine for “small databases”. This myth seems to be dead now, but
some analysts and journalists raise it from the grave regularly, and we hope
to finish with this ridiculous perception at last.
Hardware
Firebird is well-known for its incredible scalability and
this investigation has confirmed it once again. The initial purpose of this
experiment was just creation of Firebird database 1Tb in size, so we used usual
desktop computer:
Table 1: Hardware
|
Component
|
Parameters
|
|
CPU
|
AMD Athlon 64 x2 5200
|
|
RAM
|
4GB
|
|
Motherboard
|
MSI K9N Platinum
|
|
HDD1 (operation system
and temp)
|
ST3160815AS, 160GB, SATA II
|
|
HDD2 (auxilary)
|
HDT721064SLA360, 640GB, SATA II
|
|
HDD2 (auxilary)
|
HDS728080PLA380, 80GB, SATA
I
|
|
HDD3 (database)
|
ST31500341AS, 1.5TB, SATA II
(Firmware CC1H)
|
In essence, we've put the 1.5Tb HDD into the one of our
office desktops, without any other modifications. This HDD was formatted with
cluster size 16Kb (the same as page size of database, as you can see below).
Software
As it's a desktop computer, operation system is Windows XP
Professional SP3, 32bit. To actually perform the test we used the loader from
TPC-based toolkit (download it from http://ibdeveloper.com/tests/tpc-c/,
both binaries and sources are available).
We'd like to emphasize that loader inserts data as it would
be inserted in real life scenario: records are being inserted and situated
inside database (and at physical disk areas) into several
master-detail-subdetail tables, not table by table.
Table 2: Software
|
Software
|
Version
|
|
Operation system
|
Windows XP Professional SP3,
32bit
|
|
Firebird
|
2.1.3 SuperServer
(snapshot)
|
|
Loader
|
Custom loader from
tpc-based test
|
|
Who's big?
There are several companies who handle significantly big
Firebird databases. Here we’ve listed three of them as examples of real-world
big Firebird databases in 3 key industries: retail, finance and medicine.
Bas-X
Bas-X (http://www.basx.com.au/,
Australia) is a leading provider of enterprise information technologies to
independent retailers, particularly multi-site operators and management groups.
Bas-X is a true leader in Firebird-based development: two of their customers
have Firebird databases with size more than 450Gb, and several others have
databases more than 200Gb.
Interesting thing about Bas-X is that they offer
Software-as-a-Service solutions and using Firebird as database management
system for thousands of clients. This is definitely one of the brightest
examples of real-life cloud computing offerings, and it uses Firebird, which is
absolutely good enough for this hard job.
Watermark Technologies
Watermark Technologies (http://www.watermarktech.co.uk/, UK)
is the great example of utilizing Firebird technology to serve enterprises in
Finance and Government sectors.
Watermark Technologies produces software which uses Firebird
for document management which includes indexed OCR for full text search. It is used by financial advisers, insurance companies and so on. There
are several 300+Gb Firebird databases deployed at present.
Free Firebird licensing is one of the things
which enables Watermark Technologies offer flexible subscription model for end
customers, so they can avoid upfront payments and pay as they go.
Profitmed
Profitmed (http://www.profitmed.net/,
Russia) joint stock company is one the largest Russian pharmaceutical
distributors. They have relatively small database (only ~40Gb), but we decided
to mention them as they have extremely high load in terms of simultaneous
active connections, serving hundreds of small resellers and drug store shops
across Russia. Though this database looks smaller than others, it contains very
dense data inside: SKUs of medicines, warehouse movements are represented as
numbers, and, thanks to Firebird data compression mechanism, these data consume
very modest amount of disk storage.
|
Plan
We had very straightforward plan for this experiment:
- Create database and load it with 1Tb data, without indices
- Create primary keys and appropriate indices (so actually
database size is more than 1Tb)
- Gather database statistics
- Run several SQL queries and estimate database performance
Database and
Firebird server configuration
Database has page size 16384 bytes, the same as HDD cluster,
to maximize disk throughout performance (to read/write 1 page at one I/O cycle).
In the Firebird configuration we have configured additional directory
for temp space and point it to the disk with 640Gb (where ~300Gb was free).
Loading step
Data were loaded into this database by several steps.
Computer was used during loading operations as usual desktop (we have MS Office,
Firefox, IBAnalyst, etc – about 8-12 programs ran at the same time). If we
would dedicate the hardware for this task only, probably it would be faster, so
please consider these values only as a low-end example; they are definitely not
the top results.
Table 3: Loading operations
|
Description
|
Value
|
|
Time to load
|
~70 hours
|
|
Total records inserted
|
6.2 billions
|
|
Average insertion speed
|
24500 records/second
|
|
Average record size
|
146 bytes (min 13 bytes, max – 600 bytes)
|
|
Transactions
|
646489
|
We spent ~4 days at loading, and after that we had Firebird
database with exactly 1Tb size (i.e. 1 099 900 125 184 bytes).
Below you can see database growth and transactions dynamics in FBDataGuard Viewer:
Indices
We created indices one by one and counted their creations'
time and appropriate size of temp file used for sorting.
The biggest index was created for table ORDER_LINE. Its
primary key contains four fields (Smallint, Smallint, Integer and Smallint).
Temp file for this sorting index was 182Gb, and final index size in the
database is 29.3Gb.
It's interesting to see that even index for table with 3.8
billion records has depth = 3, because page size was 16384 bytes, so there is
no overhead while searching data using primary key for this table.
Statistics
After that we have gathered database statistics. It took 7
hours 32 mins 45 secs.
We've put key statistics information into the one table and
included some queries and time measurements:
Table 4: Consolidated statistics for 1Tb database
|
Table name
|
Record counts
|
Size, gb
|
Execution time of select
count(*)
|
Index creation time
|
Tmp file size, Gb
|
Index size, Gb
|
|
WAREHOUSE
|
1240
|
0.002
|
0s
|
0
|
0
|
0.0
|
|
ITEM
|
100000
|
0.012
|
0.7s
|
-
|
-
|
0.0
|
|
DISTRICT
|
124000
|
0.017
|
0.7s
|
6
|
-
|
0.0
|
|
NEW_ORDER
|
111600000
|
32
|
20m
00s
|
23m
00s
|
4.56
|
0.8
|
|
CUSTOMER
|
372000000
|
224
|
-
|
41m
00s
|
-
|
2.6
|
|
customer_last
|
|
|
|
1h
52m 32s
|
12.4
|
2.3
|
|
fk_cust_ware
|
|
|
|
2h
10m 51s
|
-
|
2.3
|
|
HISTORY
|
372000000
|
32
|
-
|
-
|
-
|
-
|
|
ORDERS
|
372000000
|
25
|
32m
00s
|
45m
41s
|
15.2
|
2.5
|
|
STOCK
|
1240000000
|
404
|
-
|
3h
34m 44s
|
41.5
|
9.2
|
|
ORDER_LINE
|
3720051796
|
359
|
-
|
12h
6m 18s
|
182.0
|
29.3
|
Database statistics can be
downloaded from here.
You can use free FBDataGuard Community Edition Viewer to interpret text data
and see not only database performance metrics, but also CPU and memory
consumption.
Queries
First of all, we've run select count(*) queries on
several tables (see 4-th column in Table 4 above). As you know, due to the
multi-version nature of Firebird select count(*) for whole table is an
expensive operation for server because it requires visiting of every page, and
experienced Firebird developers does not use select count(*), but we used it to
demonstrate the overall performance ratio of database and hardware.
After select count queries we have run queries from
real-life scenario and, to be honest, we were amazed with so good results. See
for yourself:
|
Query
|
Statistics
|
Description
|
|
select w_id,
w_name, c_id, c_last
from WAREHOUSE,
customer
where
c_w_id = w_id
|
PLAN JOIN (WAREHOUSE
NATURAL, CUSTOMER INDEX (FK_CUST_WARE))
------ Performance info
------
Prepare time = 15ms
Execute time = 79ms
Avg fetch time = 6.08 ms
Current memory =
272 264 476
Max memory =
272 514 048
Memory buffers = 16 384
Reads from disk to cache =
82
Writes from cache to disk =
0
Fetches from cache =
3 648
|
Simple join of tables with 12400 and 372000000 records,
no WHERE conditions. “Avg fetch time = 6.08 ms”
is for fetching the first row.
|
|
select w_id,
w_name, c_id, c_last
from WAREHOUSE,
customer
where c_w_id =
w_id and c_w_id = 10000
|
PLAN JOIN (WAREHOUSE INDEX
(WAREHOUSE_PK), CUSTOMER INDEX (FK_CUST_WARE))
------ Performance info
------
Prepare time = 16ms
Execute time = 78ms
Avg fetch time = 6.00 ms
Current memory =
272 266 148
Max memory =
272 514 048
Memory buffers = 16 384
Reads from disk to cache =
88
Writes from cache to disk =
0
Fetches from cache =
3 656
|
Join of the same tables with condition which forces
selection of recent records. “Avg fetch time =
6.00 ms” is for fetching the first row.
|
|
select count(*)
from WAREHOUSE,
customer
where c_w_id =
w_id and c_w_id = 10000
Result = 30000
|
PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK), CUSTOMER INDEX
(FK_CUST_WARE))
------ Performance info ------
Prepare time = 0ms
Execute time = 453ms
Avg fetch time = 453.00 ms
Current memory = 272 263 844
Max memory = 272 514 048
Memory buffers = 16 384
Reads from disk to cache = 1 048
Writes from cache to disk = 0
Fetches from cache = 60 024
|
Count records for previous query
|
|
|
|
|
|
SELECT * FROM
ORDER_LINE
WHERE OL_W_ID =
500
|
Plan
PLAN (ORDER_LINE
INDEX (ORDER_LINE_PK))
------ Performance
info ------
Prepare time = 0ms
Execute time =
94ms
Avg fetch time =
7.23 ms
Current memory =
136 445 536
Max memory =
136 592 176
Memory buffers =
8 192
Reads from disk to
cache = 150
Writes from cache
to disk = 0
Fetches from cache
= 2 402
|
Query to the biggest table (3.8B records). “Avg fetch time = 7.23 ms” is for
fetching the first row.
|
|
|
|
|
|
|
Plan PLAN (ORDER_LINE INDEX (ORDER_LINE_PK)) ------ Performance info ------ Prepare time = 0ms Execute time = 3s 438ms Avg fetch time = 0.01 ms Current memory = 136 445 496 Max memory = 136 592 176 Memory buffers = 8 192 Reads from disk to cache = 1 840 Writes from cache to disk = 0 Fetches from cache = 598 636
|
|
|
SELECT * FROM
ORDER_LINE
WHERE OL_W_ID =
500
|
The same query to the biggest table (3.8B records), but at this time we have fetched all
records (299245 records fetched).
|
|
|
|
|
select w_id,
w_name, c_id, c_last
from WAREHOUSE,
customer
where c_w_id =
w_id and (c_w_id > 8000) and (c_w_id < 10000)
|
Plan
PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK), CUSTOMER INDEX
(FK_CUST_WARE))
------ Performance info ------
Prepare time = 0ms
Execute time = 125ms
Avg fetch time = 9.62 ms
Current memory = 272 270 824
Max memory = 272 514 048
Memory buffers = 16 384
Reads from disk to cache = 91
Writes from cache to disk = 0
Fetches from cache = 3 659
|
Join tables with 1240 records and 372M records.
|
|
select count(*)
from WAREHOUSE,
customer
where c_w_id =
w_id and (c_w_id > 8000) and (c_w_id < 10000)
Result = 59 970 000
|
Plan
PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK), CUSTOMER INDEX
(FK_CUST_WARE))
------ Performance info ------
Prepare time = 0ms
Execute time = 13m 4s 718ms
Avg fetch time = 784 718.00 ms
Current memory = 272 268 532
Max memory = 272 514 048
Memory buffers = 16 384
Reads from disk to cache = 2 332 583
Writes from cache to disk = 0
Fetches from cache = 119 977 902
|
Count records for previous query
|
Summary
In this experiment Firebird shows the following results
- Undoubted ability to handle big databases. We are pretty
sure that it's possible to create and use 32 Tb database on appropriate
hardware, and Firebird will show the same high performance as it shows for
smaller databases (i.e., 1Tb and below).
- Good scalability and amazingly small footprint. 1Tb
database was created at usual desktop computer and, more important, it can
be used to perform general queries: if you don't fetch millions of records,
query’s speed is the same as it is for moderate size databases (10-15Gb).
This is not the end of this experiment: we intend to
run some queries, gather additional statistics and publish more detailed report
shortly. Please stay tuned.
Contacts
Send all your questions and enquiries to terabyte@ib-aid.com
|
|