Library

Detailed New Features Of Firebird 5, Part 5: Better compression, cache of compiled statements, and more

by D.Simonov, adapted and edited by A.Kovyazin
version 1.0 from 07.12.2023

This material is created with the sponsorship and support of IBSurgeon (www.ib-aid.com), vendor of HQbird (an advanced distribution of Firebird) and supplier of performance optimization, migration, and technical support services for Firebird.

1. New ODS and upgrade without backup-restore

The traditional way of upgrading the ODS (On-Disk Structure) is to perform a backup on the old version of Firebird and a restore on the new one. This is a rather lengthy process, especially for large databases.

However, when only the minor version of the ODS changes (the number after the dot), a backup/restore cycle is redundant: it is enough to add the missing system tables and fields, as well as some packages. An example of such an upgrade is from ODS 13.0 (Firebird 4.0) to ODS 13.1 (Firebird 5.0), since the major ODS version 13 remains the same.

Starting from Firebird 5.0, it became possible to update the minor version of ODS without the lengthy backup and restore operations. For this, the gfix utility is used with the -upgrade switch.

Key points:

  • The update must be performed manually using the command gfix -upgrade.

  • Exclusive access to the database is required, otherwise an error is issued.

  • The system privilege USE_GFIX_UTILITY is required.

  • The update is transactional, all changes are rolled back in case of an error.

  • After the update, Firebird 4.0 can no longer open the database.

Usage:

gfix -upgrade <database> -user <username> -pass <password>
Note
  • This is a one-way modification; there is no way back. Therefore, before upgrading, make a copy of the database (for example, with nbackup -b 0) to have a restore point in case something goes wrong during the process.

  • Updating ODS using gfix -upgrade does not change the data pages of user tables, so the records will not be repacked using the new RLE compression algorithm. But newly inserted records will be compressed using the improved RLE.

2. Improved data compression algorithm

As you know, in Firebird, table records are stored on data pages (DP) in compressed form. This is done so that as many records as possible can fit on one page, which in turn saves disk input-output. Until Firebird 5.0, the classic Run Length Encoding (RLE) algorithm was used to compress records.

The classic RLE algorithm works as follows. A sequence of repeated bytes is collapsed into a control byte that determines the number of repetitions, followed by the repeated byte itself. If the data cannot be compressed, the control byte indicates that "the next n bytes should be output unchanged".

The control byte is used as follows:

  • n > 0 [1 .. 127] - the next n bytes are stored as is;

  • n < 0 [-3 .. -128] - the next byte is repeated |n| times, but stored only once;

  • n = 0 - end of data.

RLE is mainly effective for compressing trailing zeros in fields of type VARCHAR(N) that are not fully filled or are NULL. It is fast and does not load the processor much, unlike dictionary-based algorithms such as LZH, ZIP and GZ.
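To make the control-byte scheme concrete, here is a small Python sketch of a classic RLE codec following the rules above. Firebird's real implementation is in C++; the run-detection and literal-chunking heuristics below are illustrative assumptions, not its actual code.

```python
# Classic RLE sketch: positive control byte = literal length,
# negative (two's complement) control byte = run length, 0 = end of data.
def rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # length of the run of identical bytes starting at i (capped at 128)
        j = i
        while j < n and data[j] == data[i] and j - i < 128:
            j += 1
        run = j - i
        if run >= 3:
            # compressible run: negative control byte, value stored once
            out += bytes([(-run) & 0xFF, data[i]])
            i = j
        else:
            # collect a literal chunk (up to 127 bytes) until a run of 3+ starts
            k = i
            while k < n and k - i < 127:
                if k + 2 < n and data[k] == data[k + 1] == data[k + 2]:
                    break
                k += 1
            out.append(k - i)
            out += data[i:k]
            i = k
    out.append(0)  # control byte 0: end of data
    return bytes(out)


def rle_decode(packed: bytes) -> bytes:
    out = bytearray()
    i = 0
    while True:
        c = packed[i]; i += 1
        if c == 0:                 # end of data
            break
        if c < 128:                # positive: next c bytes stored as is
            out += packed[i:i + c]; i += c
        else:                      # negative: repeat the next byte (256 - c) times
            out += packed[i:i + 1] * (256 - c); i += 1
    return bytes(out)
```

Note how a long run degrades: 32000 identical bytes need 250 separate runs of 128 bytes, i.e. 500 bytes of output, which is exactly the limitation discussed below.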

But the classic RLE algorithm has drawbacks:

  • the maximum compression ratio is 64x: a control byte can encode 128 repeating bytes, turning them into 2 bytes, so 32000 identical bytes still take up 500 bytes. This problem has worsened with the advent of the UTF8 encoding, where 4 bytes are allocated for each character.

  • in some cases, the compressed byte sequence may become longer than the uncompressed one, if the data is not compressible.

  • frequent alternation of short compressible and non-compressible sequences additionally loads the processor, thus offsetting the benefit of saving disk input-output.

Therefore, in Firebird 5.0, an improved RLE compression algorithm (with a variable-length counter) was developed. This algorithm is available only in databases with ODS 13.1 and higher.

Note

Updating ODS using gfix -upgrade does not change the data pages of user tables, so the records will not be repacked using the new RLE compression algorithm. But newly inserted records will be compressed using the improved RLE.

The improved RLE algorithm works as follows. Two previously unused control byte values, -1 and -2, are used as special markers for longer compressible sequences:

  • {-1, two-byte counter, byte value} - repeating sequences of length from 128 bytes to 64 KB;

  • {-2, four-byte counter, byte value} - repeating sequences of length more than 64 KB.

Compressing runs of 3 bytes makes no sense if they sit between two non-compressible runs. Runs of 4 to 8 bytes are a borderline case: they compress only slightly, but increase the total number of runs, which negatively affects unpacking speed. Therefore, starting from Firebird 5.0, runs shorter than 8 bytes are not compressed.

In addition, Firebird 5.0 (ODS 13.1) brings another improvement: if applying the RLE compression algorithm to a record would make the byte sequence longer (the data is incompressible), the record is written to the page as is and marked with a special flag as uncompressed.

Now I will show by examples how the new RLE algorithm increases the performance of queries.

First of all, let’s note that compressing records is not a free operation in terms of resources (CPU and memory). This can be easily verified by executing two queries:

SELECT COUNT(*) FROM BIG_TABLE;

SELECT COUNT(SOME_FIELD) FROM BIG_TABLE;

The first query does not use record unpacking, because we are not interested in their content (it is enough to just count the number). The second query has to unpack each record to make sure that the field SOME_FIELD is not NULL. First, let’s see how this is done in Firebird 4.0.

SELECT COUNT(*)
FROM WORD_DICTIONARY;
                COUNT
=====================
              4079052

Current memory = 2610594912
Delta memory = 0
Max memory = 2610680272
Elapsed time = 0.966 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 4318077
SELECT COUNT(CODE_DICTIONARY)
FROM WORD_DICTIONARY;
                COUNT
=====================
              4079052

Current memory = 2610596096
Delta memory = 1184
Max memory = 2610685616
Elapsed time = 1.770 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 4318083

1.770 - 0.966 = 0.804 seconds: most of this time is simply the cost of unpacking records.

Now let’s look at the same thing on Firebird 5.0.

SELECT COUNT(*)
FROM WORD_DICTIONARY;
                COUNT
=====================
              4079052

Current memory = 2577478608
Delta memory = 176
Max memory = 2577562528
Elapsed time = 0.877 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 4342385
SELECT COUNT(CODE_DICTIONARY)
FROM WORD_DICTIONARY;
                COUNT
=====================
              4079052

Current memory = 2577491280
Delta memory = 12672
Max memory = 2577577520
Elapsed time = 1.267 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 4342393

1.267 - 0.877 = 0.390 seconds. This is less than half of the 0.804 seconds spent in Firebird 4.0. Let's take a look at the statistics of this table in Firebird 4.0 and Firebird 5.0.

Statistics in Firebird 4.0
WORD_DICTIONARY (265)
    Primary pointer page: 855, Index root page: 856
    Total formats: 1, used formats: 1
    Average record length: 191.83, total records: 4079052
    Average version length: 0.00, total versions: 0, max versions: 0
    Average fragment length: 0.00, total fragments: 0, max fragments: 0
    Average unpacked length: 670.00, compression ratio: 3.49
    Pointer pages: 19, data page slots: 59752
    Data pages: 59752, average fill: 87%
    Primary pages: 59752, secondary pages: 0, swept pages: 0
    Empty pages: 1, full pages: 59750
    Fill distribution:
         0 - 19% = 1
        20 - 39% = 0
        40 - 59% = 0
        60 - 79% = 1
        80 - 99% = 59750
Statistics in Firebird 5.0
WORD_DICTIONARY (265)
    Primary pointer page: 849, Index root page: 850
    Total formats: 1, used formats: 1
    Average record length: 215.83, total records: 4079052
    Average version length: 0.00, total versions: 0, max versions: 0
    Average fragment length: 0.00, total fragments: 0, max fragments: 0
    Average unpacked length: 670.00, compression ratio: 3.10
    Pointer pages: 21, data page slots: 65832
    Data pages: 65832, average fill: 88%
    Primary pages: 65832, secondary pages: 0, swept pages: 0
    Empty pages: 4, full pages: 65824
    Fill distribution:
         0 - 19% = 5
        20 - 39% = 2
        40 - 59% = 0
        60 - 79% = 1
        80 - 99% = 65824

From the statistics, it can be seen that the compression ratio is even lower than in Firebird 4.0. So what accounts for such a colossal gain in performance? To understand this, we need to look at the structure of this table:

CREATE TABLE WORD_DICTIONARY (
    CODE_DICTIONARY      BIGINT NOT NULL,
    CODE_PART_OF_SPEECH  INTEGER NOT NULL,
    CODE_WORD_GENDER     INTEGER NOT NULL,
    CODE_WORD_AFFIXE     INTEGER NOT NULL,
    CODE_WORD_TENSE      INTEGER DEFAULT -1 NOT NULL,
    NAME                 VARCHAR(50) NOT NULL COLLATE UNICODE_CI,
    PARAMS               VARCHAR(80),
    ANIMATE              D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN('Да', 'Нет')) */,
    PLURAL               D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN('Да', 'Нет')) */,
    INVARIABLE           D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN('Да', 'Нет')) */,
    TRANSITIVE           D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN('Да', 'Нет')) */,
    IMPERATIVE           D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN('Да', 'Нет')) */,
    PERFECT              D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN('Да', 'Нет')) */,
    CONJUGATION          SMALLINT,
    REFLEXIVE            D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN('Да', 'Нет')) */,
    PROHIBITION          D_BOOL DEFAULT 'Нет' NOT NULL /* D_BOOL = VARCHAR(3) CHECK (VALUE IN('Да', 'Нет')) */
);

In this table, only the fields NAME and PARAMS compress well. Since the fields of type INTEGER have the NOT NULL modifier and an INTEGER takes up 4 bytes, such fields are not compressed in Firebird 5.0 (4 bytes is below the 8-byte threshold). Fields of the D_BOOL domain in the UTF8 encoding are compressed for the value 'Да' ('Yes'), which leaves 12 - 4 = 8 bytes of padding, but not for 'Нет' ('No'), which leaves only 12 - 6 = 6 bytes.

Since the table contains many short runs that were compressed in Firebird 4.0 but are left uncompressed in Firebird 5.0, unpacking has to process fewer runs in Firebird 5.0, which gives us the performance gain.
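The padding arithmetic above is easy to check. The sketch assumes, as described, that a UTF8 VARCHAR(3) occupies 3 x 4 = 12 bytes in the unpacked record image and that the unused trailing bytes form a single compressible run (a simplification of the actual record layout).

```python
# D_BOOL = VARCHAR(3) CHARACTER SET UTF8: the buffer reserves 4 bytes per character
DECLARED_BYTES = 3 * 4  # 12

for value in ('Да', 'Нет'):            # 'Yes' / 'No'
    used = len(value.encode('utf-8'))  # Cyrillic letters take 2 bytes each in UTF-8
    padding = DECLARED_BYTES - used
    # Firebird 5.0 compresses a run only if it is at least 8 bytes long
    print(value, used, padding, padding >= 8)
```

This prints 4 used bytes and 8 bytes of compressible padding for 'Да', versus 6 used bytes and only 6 bytes of padding (left uncompressed) for 'Нет'.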

Now I will show an example where the new RLE algorithm greatly wins in compression. For this, we will execute the following script:

CREATE TABLE GOOD_ZIP
(
  ID BIGINT NOT NULL,
  NAME VARCHAR(100),
  DESCRIPTION VARCHAR(1000),
  CONSTRAINT PK_GOOD_ZIP PRIMARY KEY(ID)
);

SET TERM ^;

EXECUTE BLOCK
AS
DECLARE I BIGINT = 0;
BEGIN
  WHILE (I < 100000) DO
  BEGIN
    I = I + 1;
    INSERT INTO GOOD_ZIP (
      ID,
      NAME,
      DESCRIPTION
    )
    VALUES (
      :I,
      'OBJECT_' || :I,
      'OBJECT_' || :I
    );
  END
END^


SET TERM ;^

COMMIT;

And now let’s look at the statistics of the table GOOD_ZIP in Firebird 4.0 and Firebird 5.0.

Statistics in Firebird 4.0
GOOD_ZIP (128)
    Primary pointer page: 222, Index root page: 223
    Total formats: 1, used formats: 1
    Average record length: 111.09, total records: 100000
    Average version length: 0.00, total versions: 0, max versions: 0
    Average fragment length: 0.00, total fragments: 0, max fragments: 0
    Average unpacked length: 4420.00, compression ratio: 39.79
    Pointer pages: 2, data page slots: 1936
    Data pages: 1936, average fill: 81%
    Primary pages: 1936, secondary pages: 0, swept pages: 0
    Empty pages: 0, full pages: 1935
    Fill distribution:
         0 - 19% = 0
        20 - 39% = 0
        40 - 59% = 1
        60 - 79% = 5
        80 - 99% = 1930
Statistics in Firebird 5.0
GOOD_ZIP (128)
    Primary pointer page: 225, Index root page: 226
    Total formats: 1, used formats: 1
    Average record length: 53.76, total records: 100000
    Average version length: 0.00, total versions: 0, max versions: 0
    Average fragment length: 0.00, total fragments: 0, max fragments: 0
    Average unpacked length: 4420.00, compression ratio: 82.22
    Pointer pages: 1, data page slots: 1232
    Data pages: 1232, average fill: 70%
    Primary pages: 1232, secondary pages: 0, swept pages: 0
    Empty pages: 2, full pages: 1229
    Fill distribution:
         0 - 19% = 3
        20 - 39% = 0
        40 - 59% = 0
        60 - 79% = 1229
        80 - 99% = 0

As you can see, in this case the compression ratio in Firebird 5.0 is twice as high!

And finally, let’s look at an example with non-compressible data. For this, we will execute the script:

CREATE TABLE NON_ZIP
(
  UID BINARY(16) NOT NULL,
  REF_UID_1 BINARY(16) NOT NULL,
  REF_UID_2 BINARY(16) NOT NULL
);

SET TERM ^;

EXECUTE BLOCK
AS
DECLARE I BIGINT = 0;
BEGIN
  WHILE (I < 100000) DO
  BEGIN
    I = I + 1;
    INSERT INTO NON_ZIP (
      UID,
      REF_UID_1,
      REF_UID_2
    )
    VALUES (
      GEN_UUID(),
      GEN_UUID(),
      GEN_UUID()
    );
  END
END^


SET TERM ;^

COMMIT;

Let’s look at the statistics of the table NON_ZIP in Firebird 4.0 and Firebird 5.0:

Statistics in Firebird 4.0
NON_ZIP (129)
    Primary pointer page: 2231, Index root page: 2312
    Total formats: 1, used formats: 1
    Average record length: 53.00, total records: 100000
    Average version length: 0.00, total versions: 0, max versions: 0
    Average fragment length: 0.00, total fragments: 0, max fragments: 0
    Average unpacked length: 52.00, compression ratio: 0.98
    Pointer pages: 1, data page slots: 1240
    Data pages: 1240, average fill: 69%
    Primary pages: 1240, secondary pages: 0, swept pages: 0
    Empty pages: 5, full pages: 1234
    Fill distribution:
         0 - 19% = 5
        20 - 39% = 1
        40 - 59% = 0
        60 - 79% = 1234
        80 - 99% = 0
Statistics in Firebird 5.0
NON_ZIP (129)
    Primary pointer page: 1587, Index root page: 1588
    Total formats: 1, used formats: 1
    Average record length: 52.00, total records: 100000
    Average version length: 0.00, total versions: 0, max versions: 0
    Average fragment length: 0.00, total fragments: 0, max fragments: 0
    Average unpacked length: 52.00, compression ratio: 1.00
    Pointer pages: 1, data page slots: 1240
    Data pages: 1240, average fill: 68%
    Primary pages: 1240, secondary pages: 0, swept pages: 0
    Empty pages: 5, full pages: 1234
    Fill distribution:
         0 - 19% = 5
        20 - 39% = 1
        40 - 59% = 0
        60 - 79% = 1234
        80 - 99% = 0

In Firebird 4.0, compression actually increased the record length. Firebird 5.0 detected that the records became longer after compression and saved them as is.

3. Cache of prepared (compiled) statements

The cache of prepared (compiled) statements is one of the most impressive new features of version 5.0. Merely adjusting one parameter in the configuration can, in some scenarios (with frequently repeated queries), speed up an application several times.

3.1. A little theory

Any SQL query goes through two mandatory stages: preparation (compilation) and execution.

During the preparation of the query, its syntactic analysis, allocation of buffers for input and output messages, construction of the query plan and its execution tree are performed.

If the application needs to execute the same query multiple times with different sets of input parameters, prepare is usually called once, the handle of the prepared query is kept in the application, and execute is then called on that handle. This avoids the cost of re-preparing the same query for each execution.

Starting from Firebird 5.0, each connection maintains a cache of compiled (prepared) queries. This reduces the cost of repeatedly preparing the same queries when your application does not explicitly cache prepared statement handles (which is not always easy to do at the application level).

By default, caching is enabled; the caching threshold is determined by the MaxStatementCacheSize parameter in firebird.conf. Setting MaxStatementCacheSize to zero disables the cache. The cache is maintained automatically: cached statements are invalidated when necessary (usually when a DDL statement is executed).
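For example, the threshold can be raised, or the cache disabled, in firebird.conf. The value below is only an illustration, not a recommendation:

```
# firebird.conf
MaxStatementCacheSize = 4M   # per-connection cap for cached compiled statements
# MaxStatementCacheSize = 0  # disables the compiled statements cache
```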

Note

Queries are considered the same only if their texts match exactly, character for character. Two semantically identical queries that differ only in a comment are different queries for the prepared statements cache.

In addition to top-level queries, stored procedures, functions and triggers also fall into the cache of prepared queries. The contents of the compiled queries cache can be viewed using the new monitoring table MON$COMPILED_STATEMENTS.

Table 1. Description of the columns of the table MON$COMPILED_STATEMENTS

Column name                Datatype   Description
-------------------------  ---------  -----------------------------------------
MON$COMPILED_STATEMENT_ID  BIGINT     Identifier of the compiled statement
MON$SQL_TEXT               BLOB TEXT  Text of the statement in SQL. The text of
                                      SQL statements inside PSQL objects is not
                                      displayed.
MON$EXPLAINED_PLAN         BLOB TEXT  Plan of the statement in the explained
                                      format
MON$OBJECT_NAME            CHAR(63)   Name of the PSQL object (trigger, stored
                                      function or stored procedure) in which
                                      this SQL statement was compiled
MON$OBJECT_TYPE            SMALLINT   Object type: 2 - trigger; 5 - stored
                                      procedure; 15 - stored function
MON$PACKAGE_NAME           CHAR(63)   Name of the PSQL package
MON$STAT_ID                INTEGER    Statistics identifier

A new column MON$COMPILED_STATEMENT_ID has been added to the tables MON$STATEMENTS and MON$CALL_STACK; it refers to the corresponding compiled statement in MON$COMPILED_STATEMENTS.

The monitoring table MON$COMPILED_STATEMENTS allows you to easily get the plans of internal queries in a stored procedure, for example like this:

SELECT CS.MON$EXPLAINED_PLAN
FROM MON$COMPILED_STATEMENTS CS
WHERE CS.MON$OBJECT_NAME = 'SP_PEDIGREE'
  AND CS.MON$OBJECT_TYPE = 5
ORDER BY CS.MON$COMPILED_STATEMENT_ID DESC
FETCH FIRST ROW ONLY
Note

Note that the same stored procedure can appear in MON$COMPILED_STATEMENTS multiple times, because the cache of prepared queries is currently maintained per connection. In future versions, it is planned to make the compiled statements cache and the metadata cache shared between all connections in the Super Server architecture.

4. Support for bidirectional cursors in the network protocol

A cursor in SQL is an object that allows you to move through the records of a result set and process them one at a time. There are unidirectional and bidirectional (scrollable) cursors.

A unidirectional cursor does not support scrolling: records can be retrieved from it only sequentially, from the beginning to the end of the cursor. This type of cursor has been available in Firebird from the earliest versions, both in PSQL (explicitly declared and implicit cursors) and through the API.

A scrollable, or bidirectional, cursor allows you to move through the result set in any direction, jump around, and even move to a specified position. Support for bidirectional (scrollable) cursors first appeared in Firebird 3.0. They are available both in PSQL and through the API.

However, before Firebird 5.0, scrollable cursors were not supported at the network protocol level. This meant you could use the bidirectional cursor API only if you connected in embedded mode. Starting from Firebird 5.0, you can use the scrollable cursor API even when connecting over the network protocol, provided the fbclient client library is version 5.0 or higher.

If your application does not use fbclient (for example, it is written in Java or .NET), the corresponding driver must support the Firebird 5.0 network protocol. For example, Jaybird 5 supports bidirectional cursors in the network protocol.

5. Tracing the COMPILE event

In Firebird 5.0, a new trace event became available: compilation (parsing) of stored modules. It allows you to track when stored PSQL modules are parsed, the time spent, and, most importantly, the query plans inside those modules. A plan can also be captured if the PSQL module was loaded before the trace session started; in this case, the plan is reported during the first execution observed by the trace session.

The following parameters have appeared in the trace configuration to track module parsing events:

  • log_procedure_compile - enables tracing of procedure parsing events;

  • log_function_compile - enables tracing of function parsing events;

  • log_trigger_compile - enables tracing of trigger parsing events.
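A minimal trace configuration enabling these events might look like the sketch below. The database filename is hypothetical (chosen to match the "horses" example database in the log), and the section layout follows the usual trace configuration format:

```
database = %[\\/]horses.fdb
{
  enabled = true
  log_procedure_compile = true
  log_function_compile = true
  log_trigger_compile = true
}
```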

Suppose we have the following query:

SELECT * FROM SP_PEDIGREE(7435, 8, 1);

To capture the plan of a stored procedure in a trace session, set the parameter log_procedure_compile = true. Then, when this query is prepared or executed, a procedure compilation event appears in the trace log, which looks like this:

2023-10-18T20:40:51.7620 (3920:00000000073A17C0) COMPILE_PROCEDURE
	horses (ATT_30, SYSDBA:NONE, UTF8, TCPv6:::1/54464)
	C:\Firebird\5.0\isql.exe:10960

Procedure SP_PEDIGREE:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Cursor "V" (scrollable) (line 19, column 3)
    -> Record Buffer (record length: 132)
        -> Nested Loop Join (inner)
            -> Window
                -> Window Partition
                    -> Record Buffer (record length: 82)
                        -> Sort (record length: 84, key length: 12)
                            -> Window Partition
                                -> Window Buffer
                                    -> Record Buffer (record length: 41)
                                        -> Procedure "SP_HORSE_INBRIDS" as "V H_INB SP_HORSE_INBRIDS" Scan
            -> Filter
                -> Table "HUE" as "V HUE" Access By ID
                    -> Bitmap
                        -> Index "HUE_IDX_ORDER" Range Scan (full match)
Select Expression (line 44, column 3)
    -> Recursion
        -> Filter
            -> Table "HORSE" as "PEDIGREE HORSE" Access By ID
                -> Bitmap
                    -> Index "PK_HORSE" Unique Scan
        -> Union
            -> Filter (preliminary)
                -> Filter
                    -> Table "HORSE" as "PEDIGREE HORSE" Access By ID
                        -> Bitmap
                            -> Index "PK_HORSE" Unique Scan
            -> Filter (preliminary)
                -> Filter
                    -> Table "HORSE" as "PEDIGREE HORSE" Access By ID
                        -> Bitmap
                            -> Index "PK_HORSE" Unique Scan
     28 ms

6. Per-table statistics in isql

Per-table statistics show how many records in each table were read by a full scan, how many via an index, how many were inserted, updated or deleted, and other counters. These counters have long been available through the API function isc_database_info and were used by many graphical tools, but not by the console tool isql. The same counters can also be obtained from the monitoring tables MON$RECORD_STATS and MON$TABLE_STATS, or via tracing. Starting from Firebird 5.0, this useful feature is available in isql as well.

By default, per-table statistics output is disabled.

To enable it, you need to type the command:

SET PER_TAB ON;

To disable:

SET PER_TAB OFF;

The command SET PER_TAB without the words ON or OFF toggles the state of statistics output.

The full syntax of this command can be obtained using the command HELP SET.

Example of per-table statistics output:

SQL> SET PER_TAB ON;

SQL> SELECT COUNT(*)
CON> FROM HORSE
CON> JOIN COLOR ON COLOR.CODE_COLOR = HORSE.CODE_COLOR
CON> JOIN BREED ON BREED.CODE_BREED = HORSE.CODE_BREED;

                COUNT
=====================
               519623

Per table statistics:
--------------+---------+---------+---------+---------+---------+---------+---------+---------+
 Table name   | Natural | Index   | Insert  | Update  | Delete  | Backout | Purge   | Expunge |
--------------+---------+---------+---------+---------+---------+---------+---------+---------+
BREED         |      282|         |         |         |         |         |         |         |
COLOR         |      239|         |         |         |         |         |         |         |
HORSE         |         |   519623|         |         |         |         |         |         |
--------------+---------+---------+---------+---------+---------+---------+---------+---------+

7. Summary

We’ve just explored some of the cool new things that Firebird 5 can do for you, like making your code faster, easier, and more secure. But wait, there’s more! We still have one more exciting topic to cover: the SQL profiler in 5.0. Don’t go anywhere!