Library

Detailed New Features of Firebird 5. Part 1: Improvements in Optimizer

11.11.2023, Denis Simonov, edited and adjusted by Alexey Kovyazin

This material is sponsored and created with the sponsorship and support of IBSurgeon, vendor of HQbird (advanced distribution of Firebird) and supplier of performance optimization, migration and technical support services for Firebird.

The material is licensed under Public Documentation License https://www.firebirdsql.org/file/documentation/html/en/licenses/pdl/public-documentation-license.html

All parts:

Detailed New Features of Firebird 5. Part 1: Improvements in Optimizer
Detailed New Features of Firebird 5. Part 2: SQLs
Detailed New Features of Firebird 5, Part 3: SKIP LOCK
Detailed New Features of Firebird 5, Part 4: Parallel Features: backup/restore/sweep, creation of indices
Detailed New Features of Firebird 5, Part 5: Better compression, cache of compiled statements, and more
Detailed New Features of Firebird 5, Part 6, SQL Profiling - coming soon
Parallel Reading Of Data in Firebird

Preface

Recently Firebird Project has released Release Candidate of Firebird 5.0, it means that Firebird 5.0 final release is very, very close, and now it is a good time to learn about new features of Firebird 5. Firebird 5 is the 8^th major release of Firebird, and it seems like it will be one of the best releases so far.

In Firebird 5.0 Firebird Project had focus on performance improvements in various areas:

Optimizer improvements – SQLs runs faster due to better execution plans
Scalability in multi-user environments – Firebird can serve more concurrent connections and queries in highly concurrent environments
Parallel backup, restore, sweep, index creation – up to 6-10x faster (depends on hardware)
Cache of compiled prepared statements – up to 25% of improvement for frequent queries
Improved compression of records – Firebird 5.0 works faster with large VARCHARs
Profiling plugin – identify bottlenecks and slow code inside complex stored procedures and triggers

There is also new SQLs and other features. Due to the large amount of the material, we will split it to several parts, and we start with the improvements in the optimizer of Firebird.

1. Improvements in Optimizer

The optimizer is a part of Firebird database engine which is responsible for decision: how to execute the SQL on the specific database in the fastest way. In v5.0 Firebird optimizer has received the biggest amount of changes since version 2.0. It is the most practical part which directly improves the performance of SQLs for databases of any size, from 1Gb to 1Tb. Let’s see in details what was improved, with examples and performance comparisons.

1.1. Cost estimation of HASH vs NESTED LOOP JOIN

HASH JOIN appeared in Firebird 3.0.

The [simplified] idea of HASH JOIN is to cache the small dataset (e.g., a small table) into the memory, calculate hashes of keys, and use the hash of the key to join with larger dataset (e.g., a large table). If there will be the same hash for the several records, they will be processed one by one, to find the exactly the same key. HASH JOIN works only with strict key equality (i.e., =), and allows expressions with keys.

Until version 5.0, Firebird optimizer uses HASH JOIN only in a case of absence of indices for the join condition. If there is index, optimizer of pre-v5 version will use NESTED LOOP JOIN (usually it is INNER JOIN) with index. However, it is not always the fastest way: for example, if very large table is joined with small table using the primary key using INNER JOIN, each record of small table will be read multiple times (data pages and appropriate index pages). With HASH JOIN, the small table will be read once, cached, and calculated hashes will be used to join with very large table.

The obvious question is when to use HASH JOIN and when NESTED LOOP JOIN (INNER JOIN in majority cases)? It is relatively easy to decide to use HASH when small table like CLIENT_COUNTRY is joining with large table like INVOICES, but not all situations are clear. Until Firebird 5, the decision was the sole responsibility of the developer, since it required change of the text of SQL query, now it is done by Firebird optimizer using cost estimation algorithm.

To better understand the effect of HASH JOIN, let’s consider how the same query will be executed in Firebird 4.0 and in Firebird 5.0. To analyze the difference, we have EXPLAIN PLAN, query execution statistics and extended statistics from the isql with set per-tab option (new in 5.0).

In the example below, we the large table named HORSE, where we have list of horses, and small tables with a few records: SEX, COLOR, BREED, FARM. The query selects all records from the large table.

We use COUNT(*) to read all records, and to exclude time to transfer records from the server to the client.

SELECT
  COUNT(*)
FROM
  HORSE
  JOIN SEX ON SEX.CODE_SEX = HORSE.CODE_SEX
  JOIN COLOR ON COLOR.CODE_COLOR = HORSE.CODE_COLOR
  JOIN BREED ON BREED.CODE_BREED = HORSE.CODE_BREED
  JOIN FARM ON FARM.CODE_FARM = HORSE.CODE_FARM

In Firebird 4.0:

Select Expression
    -> Aggregate
        -> Nested Loop Join (inner)
            -> Table "COLOR" Full Scan
            -> Filter
                -> Table "HORSE" Access By ID
                    -> Bitmap
                        -> Index "FK_HORSE_COLOR" Range Scan (full match)
            -> Filter
                -> Table "SEX" Access By ID
                    -> Bitmap
                        -> Index "PK_SEX" Unique Scan
            -> Filter
                -> Table "BREED" Access By ID
                    -> Bitmap
                        -> Index "PK_BREED" Unique Scan
            -> Filter
                -> Table "FARM" Access By ID
                    -> Bitmap
                        -> Index "PK_FARM" Unique Scan

                COUNT
=====================
               519623

Current memory = 2614108752
Delta memory = 438016
Max memory = 2614392048
Elapsed time = 2.642 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 5857109
Per table statistics:
--------------------------------+---------+---------+---------+---------+---------+
 Table name                     | Natural | Index   | Insert  | Update  | Delete  |
--------------------------------+---------+---------+---------+---------+---------+
BREED                           |         |   519623|         |         |         |
COLOR                           |      239|         |         |         |         |
FARM                            |         |   519623|         |         |         |
HORSE                           |         |   519623|         |         |         |
SEX                             |         |   519623|         |         |         |
--------------------------------+---------+---------+---------+---------+---------+

And in Firebird 5.0. The optimizer of v5 uses cardinality of table to decide when to use HASH JOIN.

Select Expression
    -> Aggregate
        -> Filter
            -> Hash Join (inner)
                -> Hash Join (inner)
                    -> Hash Join (inner)
                        -> Nested Loop Join (inner)
                            -> Table "COLOR" Full Scan
                            -> Filter
                                -> Table "HORSE" Access By ID
                                    -> Bitmap
                                        -> Index "FK_HORSE_COLOR" Range Scan (full match)
                        -> Record Buffer (record length: 25)
                            -> Table "SEX" Full Scan
                    -> Record Buffer (record length: 25)
                        -> Table "BREED" Full Scan
                -> Record Buffer (record length: 33)
                    -> Table "FARM" Full Scan

                COUNT
=====================
               519623

Current memory = 2579749376
Delta memory = 352
Max memory = 2582802608
Elapsed time = 0.702 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 645256
Per table statistics:
--------------------------------+---------+---------+---------+---------+---------+
 Table name                     | Natural | Index   | Insert  | Update  | Delete  |
--------------------------------+---------+---------+---------+---------+---------+
BREED                           |      282|         |         |         |         |
COLOR                           |      239|         |         |         |         |
FARM                            |    36805|         |         |         |         |
HORSE                           |         |   519623|         |         |         |
SEX                             |        4|         |         |         |         |
--------------------------------+---------+---------+---------+---------+---------+

As you can see, HASH JOIN in this situation is 3.5 times faster!

1.2. Cost estimation of HASH vs MERGE JOIN

In Firebird 3.0 the algorithm MERGE JOIN was temporary disabled in favor of HASH JOIN. Usually it was used when NESTED LOOP JOIN was non-optimal (when there were no indices for join condition, or when joining datasets were not related).

And, in the majority of situations, HASH JOIN is more effective than MERGE JOIN, due to the fact that it does not require to sort joining datasets with keys before the merge, however, there are a few cases when MERGE JOIN is better than HASH JOIN:

Merged datasets are already sorted with the join key, for example, merge of 2 subselects with keys specfied in GROUP BY:
```
select count(*)
from
(
    select code_father+0 as code_father, count(*) as cnt
    from horse group by 1
) h
join (
    select code_father+0 as code_father, count(*) as cnt
    from cover group by 1
) c on h.code_father = c.code_father
```
In this example, merged datasets are already sorted by the key code_father, and we don’t need to sort them again, in this case MERGE JOIN will be the most effective.

Unfortunately, Firebird 5.0 cannot recognize this situation, hopefully it will appear in the next version.
Merged datasets are very large. In this case hash-table will become very large and will not fit into memory. The optimizer of Firebird v5 checks cardinalities of merged datasest (tables, for exemple), and if the smallest is more than 1009000 records, v5 will choose MERGE JOIN instead of HASH JOIN. In the explain plan we will see it in the following manner:

SELECT
  *
FROM
  BIG_1
  JOIN BIG_2 ON BIG_2.F_2 = BIG_1.F_1
Select Expression
    -> Filter
        -> Merge Join (inner)
            -> Sort (record length: 44, key length: 12)
                -> Table "BIG_2" Full Scan
            -> Sort (record length: 44, key length: 12)
                -> Table "BIG_1" Full Scan

1.3. Transforming OUTER JOIN into INNER JOIN

In current versions of Firebird (including v5), OUTER JOIN (LEFT JOIN, as the most often example), can be executed only with algorithm NESTED LOOP JOIN, with index for the key of joining table (if possible). The biggest restriction of the LEFT JOIN is the strict order of joining, so optimizer cannot change it to keep the result set exactly the same as it was designed by the developer.

However, if there is non-NULL condition in the “right” (joining) table, the OUTER JOIN works as INNER, with the exception of join order – it is still locked. However, it is true only for previous versions: in v5 Firebird will transform the OUTER join to the INNER, and optimize it.

For example, we have the following query, where we join the large table HORSES with small table FARM, using LEFT JOIN. As you can see, the query has condition on the “right” table FARM, which effectively excludes NULLified records which could produce LEFT JOIN, it means that it is implicit INNER JOIN, but with forced order of join, which prevents optimizer of pre-v5 to filter records on FARM first.

SELECT
  COUNT(*)
FROM
  HORSE
  LEFT JOIN FARM ON FARM.CODE_FARM = HORSE.CODE_FARM
WHERE FARM.CODE_COUNTRY = 1

The result in Firebird 4.0:

Select Expression
    -> Aggregate
        -> Filter
            -> Nested Loop Join (outer)
                -> Table "HORSE" Full Scan
                -> Filter
                    -> Table "FARM" Access By ID
                        -> Bitmap
                            -> Index "PK_FARM" Unique Scan

                COUNT
=====================
               345525

Current memory = 2612613792
Delta memory = 0
Max memory = 2614392048
Elapsed time = 1.524 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 2671475
Per table statistics:
--------------------------------+---------+---------+---------+---------+---------+
 Table name                     | Natural | Index   | Insert  | Update  | Delete  |
--------------------------------+---------+---------+---------+---------+---------+
FARM                            |         |   519623|         |         |         |
HORSE                           |   519623|         |         |         |         |
--------------------------------+---------+---------+---------+---------+---------+

As you can see, Firebird v4 follows the specified order of joining – first, it reads the whole table HORSE without an index (there is not condition on HORSE), then join FARM using the join condition index.

In Firebird 5.0 the plan is different:

Select Expression
    -> Aggregate
        -> Nested Loop Join (inner)
            -> Filter
                -> Table "FARM" Access By ID
                    -> Bitmap
                        -> Index "FK_FARM_COUNTRY" Range Scan (full match)
            -> Filter
                -> Table "HORSE" Access By ID
                    -> Bitmap
                        -> Index "FK_HORSE_FARMBORN" Range Scan (full match)

                COUNT
=====================
               345525

Current memory = 2580089760
Delta memory = 240
Max memory = 2582802608
Elapsed time = 0.294 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 563801
Per table statistics:
--------------------------------+---------+---------+---------+---------+---------+
 Table name                     | Natural | Index   | Insert  | Update  | Delete  |
--------------------------------+---------+---------+---------+---------+---------+
FARM                            |         |    32787|         |         |         |
HORSE                           |         |   345525|         |         |         |
--------------------------------+---------+---------+---------+---------+---------+

As you can see, Firebird 5.0 recognized that LEFT JOIN can be transformed to INNER due to the condition on the right side, and applied the index for FARM.

As a result, the same query in Firebird 5.0 is executed 4x times faster.

This transformation is very important for dynamically built queries, with custom conditions (various reports, or ORM-generated queries).

1.4. Preliminary evaluation of invariant predicates

In Firebird 5.0, if the predicate in WHERE is invariant (i.e., it does not depend on the fields of datasets/tables), and it is FALSE, the optimizer will not read data from the dataset.

The simplest example is the always false condition 1=0. The idea of such condition is usually to return 0 records from the query.

SELECT COUNT(*) FROM HORSE
WHERE 1=0;

In Firebird 4.0

Select Expression
    -> Aggregate
        -> Filter
            -> Table "HORSE" Full Scan

                COUNT
=====================
                    0

Current memory = 2612572768
Delta memory = 0
Max memory = 2614392048
Elapsed time = 0.137 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 552573
Per table statistics:
--------------------------------+---------+---------+---------+---------+---------+
 Table name                     | Natural | Index   | Insert  | Update  | Delete  |
--------------------------------+---------+---------+---------+---------+---------+
HORSE                           |   519623|         |         |         |         |
--------------------------------+---------+---------+---------+---------+---------+

As you can see, Firebrd 4.0 has read all records in table HORSE, to check that there is no record which corresponds to the condition 1=0. Not very intelligent, right?

In Firebird 5.0

Select Expression
    -> Aggregate
        -> Filter (preliminary)
            -> Table "HORSE" Full Scan

                COUNT
=====================
                    0

Current memory = 2580339248
Delta memory = 176
Max memory = 2582802608
Elapsed time = 0.005 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 0

As you can see, the statistics of the query shows 0 reads and 0 fetches, it means that Firebird 5.0 did not read anything from disk and from cache – the optimizer of v5 has calculated the value of the invariant predicate (1=0) before accessing table HORSE and excluded it.

The preliminary evaluation of invariant predicates is shown in the explain plan as Filter (preliminary).

Practically, this feature is very useful for dynamically built queries. For example, we have the following query with parameter :A.

SELECT * FROM HORSE
WHERE :A=1;

The parameter :A does not depend on the fields of dataset (table HORSE), so it can be preliminary calculated, so we can “turn on” and “turn off” this query with this parameter.

Let’s consider more practical example: we need CTE to recursively find еру pedigree of the horse up to the 5th generation.

WITH RECURSIVE
  R AS (
    SELECT
      CODE_HORSE,
      CODE_FATHER,
      CODE_MOTHER,
      0 AS DEPTH
    FROM HORSE
    WHERE CODE_HORSE = ?
    UNION ALL
    SELECT
      HORSE.CODE_HORSE,
      HORSE.CODE_MOTHER,
      HORSE.CODE_FATHER,
      R.DEPTH + 1
    FROM R
      JOIN HORSE ON HORSE.CODE_HORSE = R.CODE_FATHER
    WHERE R.DEPTH < 5
    UNION ALL
    SELECT
      HORSE.CODE_HORSE,
      HORSE.CODE_MOTHER,
      HORSE.CODE_FATHER,
      R.DEPTH + 1
    FROM R
      JOIN HORSE ON HORSE.CODE_HORSE = R.CODE_MOTHER
    WHERE R.DEPTH < 5
  )
SELECT *
FROM R

Query’s execution statistics in Firebird 4.0 looks as below (plan is not shown purposely):

Current memory = 2612639872
Delta memory = 0
Max memory = 2614392048
Elapsed time = 0.027 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 610
Per table statistics:
--------------------------------+---------+---------+---------+---------+---------+
 Table name                     | Natural | Index   | Insert  | Update  | Delete  |
--------------------------------+---------+---------+---------+---------+---------+
HORSE                           |         |      127|         |         |         |
--------------------------------+---------+---------+---------+---------+---------+

Let’s compare with Firebird 5.0:

Select Expression
    -> Recursion
        -> Filter
            -> Table "HORSE" as "R HORSE" Access By ID
                -> Bitmap
                    -> Index "PK_HORSE" Unique Scan
        -> Union
            -> Filter (preliminary)
                -> Filter
                    -> Table "HORSE" as "R HORSE" Access By ID
                        -> Bitmap
                            -> Index "PK_HORSE" Unique Scan
            -> Filter (preliminary)
                -> Filter
                    -> Table "HORSE" as "R HORSE" Access By ID
                        -> Bitmap
                            -> Index "PK_HORSE" Unique Scan

Current memory = 2580444832
Delta memory = 768
Max memory = 2582802608
Elapsed time = 0.024 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 252
Per table statistics:
--------------------------------+---------+---------+---------+---------+---------+
 Table name                     | Natural | Index   | Insert  | Update  | Delete  |
--------------------------------+---------+---------+---------+---------+---------+
HORSE                           |         |       63|         |         |         |
--------------------------------+---------+---------+---------+---------+---------+

As you can see, Firebird used 2x less reads of table HORSE, die to the fact that condition R.DEPTH < 5 is invariant for the each step of recursive query.

1.5. Faster IN with list of constants

Until Firebird 5.0, the predicate IN with a list of constants is limited by 1500 elements, and it was processed recursively, with transformation to the list of OR conditions.

It means that in pre-v5,

F IN (V1, V2, ... VN)

is actually transformed to

(F = V1) OR (F = V2) OR .... (F = VN)

Starting with Firebird 5.0 the processing of IN is linear, the of 1500 elements is increased to 65535.

In Firebird 5.0, list of constants in IN is cached as binary search tree to speed up the comparison

Let’s see the following example:

SELECT
  COUNT(*)
FROM COVER
WHERE CODE_COVERRESULT+0 IN (151, 152, 156, 158, 159, 168, 170, 200, 202)

In this case we added CODE_COVERRESULT+0 purposely to disable usage of index.

In Firebird 4.0

Select Expression
    -> Aggregate
        -> Filter
            -> Table "COVER" Full Scan

                COUNT
=====================
                45231

Current memory = 2612795072
Delta memory = -288
Max memory = 2614392048
Elapsed time = 0.877 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 738452
Per table statistics:
--------------------------------+---------+---------+---------+---------+---------+
 Table name                     | Natural | Index   | Insert  | Update  | Delete  |
--------------------------------+---------+---------+---------+---------+---------+
COVER                           |   713407|         |         |         |         |
--------------------------------+---------+---------+---------+---------+---------+

In Firebird 5.0

Select Expression
    -> Aggregate
        -> Filter
            -> Table "COVER" Full Scan

                COUNT
=====================
                45231

Current memory = 2580573216
Delta memory = 224
Max memory = 2582802608
Elapsed time = 0.332 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 743126
Per table statistics:
--------------------------------+---------+---------+---------+---------+---------+
 Table name                     | Natural | Index   | Insert  | Update  | Delete  |
--------------------------------+---------+---------+---------+---------+---------+
COVER                           |   713407|         |         |         |         |
--------------------------------+---------+---------+---------+---------+---------+

As you can see, even for small list in this example, and despite the fact that number of reads of table COVER did not change, the query is 2.5x faster.

If list is very long, or if predicate IN is not selective, index scanning will use search of groups with the pointer of the same level (i.e., horisontal), and not the seach of each group from the root (i.e., vertical) – it means that it will use the single index scan for all values in the IN list.

See the following example:

SELECT
  COUNT(*)
FROM LAB_LINE
WHERE CODE_LABTYPE IN (4, 5)

The result in Firebird 4.0:

Select Expression
    -> Aggregate
        -> Filter
            -> Table "LAB_LINE" Access By ID
                -> Bitmap Or
                    -> Bitmap
                        -> Index "FK_LAB_LINE_LABTYPE" Range Scan (full match)
                    -> Bitmap
                        -> Index "FK_LAB_LINE_LABTYPE" Range Scan (full match)

                COUNT
=====================
               985594

Current memory = 2614023968
Delta memory = 0
Max memory = 2614392048
Elapsed time = 0.361 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 992519
Per table statistics:
--------------------------------+---------+---------+---------+---------+---------+
 Table name                     | Natural | Index   | Insert  | Update  | Delete  |
--------------------------------+---------+---------+---------+---------+---------+
LAB_LINE                        |         |   985594|         |         |         |
--------------------------------+---------+---------+---------+---------+---------+

In Firebird 5.0:

Select Expression
    -> Aggregate
        -> Filter
            -> Table "LAB_LINE" Access By ID
                -> Bitmap
                    -> Index "FK_LAB_LINE_LABTYPE" List Scan (full match)

                COUNT
=====================
               985594

Current memory = 2582983152
Delta memory = 176
Max memory = 2583119072
Elapsed time = 0.306 sec
Buffers = 153600
Reads = 0
Writes = 0
Fetches = 993103
Per table statistics:
--------------------------------+---------+---------+---------+---------+---------+
 Table name                     | Natural | Index   | Insert  | Update  | Delete  |
--------------------------------+---------+---------+---------+---------+---------+
LAB_LINE                        |         |   985594|         |         |         |
--------------------------------+---------+---------+---------+---------+---------+

Though the time of query is not changed, the plan is different: instead of 2 Range Scan and bitmap join through OR, Firebird 5 uses the new access method with one-step index list scan, namely LIST SCAN in the explained plan.

1.6. Optimizer strategy ALL ROWS vs FIRST ROWS

There are 2 strategies to optimize queries:

FIRST ROWS — when optimizer builds query plan to return as soon as possible the first rows of the resulting dataset;
ALL ROWS — when optimizer builds query plan to return all rows of the resulting dataset as soon as possible.

Until Firebird 5.0 these strategies also existed, but there was no way to control which one to use.

The default strategy was ALL ROWS, however, in case of clause FIRST …, ROWS … or FETCH FIRST n ROWS, the optimizer had used the strategy to FIRST ROWS. Also, for subselects in IN and in EXISTS it also used strategy FIRST ROWS.

Starting with Firebird 5.0, by default will be used the optimization strategy specified in the parameter OptimizeForFirstRows of firebird.conf or database.conf.

OptimizeForFirstRows = false means strategy ALL ROWS, OptimizeForFirstRows = true means FIRST ROWS.

It is possible to change the optimization strategy for the current session (connection) with the following command:

SET OPTIMIZE FOR {FIRST | ALL} ROWS

It can be useful for reporting and BI applications.

Also, the strategy can be set on the level of the SQL command with clause OPTIMIZE FOR.

SELECT query with clause OPTIMIZE FOR has the following syntax:

SELECT ...
FROM [...]
[WHERE ...]
[...]
[OPTIMIZE FOR {FIRST | ALL} ROWS]

Clause OPTIMIZE FOR should be specified in the end of SELECT query. In PSQL it should be set right before the clause INTO.

A bit of internals behind optimization strategies

Datasets in the query can be conveyors or buffered:

Conveyor dataset returns records during the reading process of its input,
Buffered dataset need to read all records from input, and only after competition it can return the first row.

If active strategy is FIRST ROWS, the optimizer will try to avoid buffered datasets (i.e., SORT or HASH JOIN).

Let’s see how the choice of the optimization strategy changes the query plan – for this let’s use clause OPTIMIZE FOR.

The example of query and plan with optimizer strategy ALL ROWS:

SELECT
  HORSE.NAME AS HORSENAME,
  SEX.NAME AS SEXNAME,
  COLOR.NAME AS COLORNAME
FROM
  HORSE
  JOIN SEX ON SEX.CODE_SEX = HORSE.CODE_SEX
  JOIN COLOR ON COLOR.CODE_COLOR = HORSE.CODE_COLOR
ORDER BY HORSE.NAME
OPTIMIZE FOR ALL ROWS
Select Expression
    -> Sort (record length: 876, key length: 304)
        -> Filter
            -> Hash Join (inner)
                -> Nested Loop Join (inner)
                    -> Table "COLOR" Full Scan
                    -> Filter
                        -> Table "HORSE" Access By ID
                            -> Bitmap
                                -> Index "FK_HORSE_COLOR" Range Scan (full match)
                -> Record Buffer (record length: 113)
                    -> Table "SEX" Full Scan

The example of query and plan with optimizer strategy FIRST ROWS:

SELECT
  HORSE.NAME AS HORSENAME,
  SEX.NAME AS SEXNAME,
  COLOR.NAME AS COLORNAME
FROM
  HORSE
  JOIN SEX ON SEX.CODE_SEX = HORSE.CODE_SEX
  JOIN COLOR ON COLOR.CODE_COLOR = HORSE.CODE_COLOR
ORDER BY HORSE.NAME
OPTIMIZE FOR FIRST ROWS
Select Expression
    -> Nested Loop Join (inner)
        -> Table "HORSE" Access By ID
            -> Index "HORSE_IDX_NAME" Full Scan
        -> Filter
            -> Table "SEX" Access By ID
                -> Bitmap
                    -> Index "PK_SEX" Unique Scan
        -> Filter
            -> Table "COLOR" Access By ID
                -> Bitmap
                    -> Index "PK_COLOR" Unique Scan

As you can see, in the first case, the optimizer has chosen the HASH JOIN and SORT to return all records of the query as soon as possible.

In the second case, the optimizer has chosen the index (ORDER index) and join with NESTED LOOP, because this plan will return the first rows as fast as possible.

1.7. Improved plan output

Queries plans are important for the understanding of the performance of queries, and in Firebird 5 we have better representation of explain plan’s elements.

As you know, there are 2 types of plans: legacy and explained.

Now in the output of explain plan we can see user’s SELECTs (shown as “select expressions”), declared PSQL cursors and sub-queries.

Both legacy and explain plans now includes the information of cursor position (line/column) inside PSQL module.

To see the difference, let’s compare plan’s outputs for several Firebird 4.0 and Firebird 5.0.

Let's start with the query with the subquery:

SELECT *
FROM HORSE
WHERE EXISTS(SELECT * FROM COVER
             WHERE COVER.CODE_FATHER = HORSE.CODE_HORSE)

The explained plan in Firebird 4.0 will look like this:

Select Expression
    -> Filter
        -> Table "COVER" Access By ID
            -> Bitmap
                -> Index "FK_COVER_FATHER" Range Scan (full match)
Select Expression
    -> Filter
        -> Table "HORSE" Full Scan

In Firebird the plan will look like the following:

Sub-query
    -> Filter
        -> Table "COVER" Access By ID
            -> Bitmap
                -> Index "FK_COVER_FATHER" Range Scan (full match)
Select Expression
    -> Filter
        -> Table "HORSE" Full Scan

Now in the plan you can clearly see where the main query is and where the subquery is.

Now let's compare how the plan is displayed for PSQL, for example in the statement EXECUTE BLOCK:

EXECUTE BLOCK
RETURNS (
   CODE_COLOR INT,
   CODE_BREED INT
)
AS
BEGIN
  FOR
    SELECT CODE_COLOR
    FROM COLOR
    INTO CODE_COLOR
  DO
    SUSPEND;

  FOR
    SELECT CODE_BREED
    FROM BREED
    INTO CODE_BREED
  DO
    SUSPEND;
END

In Firebird 4.0, both legacy and explain plans will be printed for each cursor within a block, without additional details, just one after the other.

PLAN (COLOR NATURAL)
PLAN (BREED NATURAL)
Select Expression
    -> Table "COLOR" Full Scan
Select Expression
    -> Table "BREED" Full Scan

In Firebird 5.0, each cursor plan will be preceded by the number of the column and row where the cursor is declared.

-- line 8, column 3
PLAN (COLOR NATURAL)
-- line 15, column 3
PLAN (BREED NATURAL)
Select Expression (line 8, column 3)
    -> Table "COLOR" Full Scan
Select Expression (line 15, column 3)
    -> Table "BREED" Full Scan

Now let's compare the output of explain plans if the cursor is declared explicitly.

EXECUTE BLOCK
RETURNS (
   CODE_COLOR INT
)
AS
  DECLARE C1 CURSOR FOR (
    SELECT CODE_COLOR
    FROM COLOR
  );

  DECLARE C2 SCROLL CURSOR FOR (
    SELECT CODE_COLOR
    FROM COLOR
  );
BEGIN
  SUSPEND;
END

For Firebird 4.0 the plan will be like this:

Select Expression
    -> Table "COLOR" as "C1 COLOR" Full Scan
Select Expression
    -> Record Buffer (record length: 25)
        -> Table "COLOR" as "C2 COLOR" Full Scan

The plan gives the impression that the COLOR table has an alias of C1, although this is not the case.

In Firebird 5.0 the plan will be much clearer:

Cursor "C1" (line 6, column 3)
    -> Table "COLOR" as "C1 COLOR" Full Scan
Cursor "C2" (scrollable) (line 11, column 3)
    -> Record Buffer (record length: 25)
        -> Table "COLOR" as "C2 COLOR" Full Scan

Firstly, it is clear that we have cursors C1 and C2 declared in the block.
Secondly, an additional “scrollable” attribute has been introduced for a bidirectional cursor.

1.8. How to get stored procedure plans

Until Firebird 3.0, the engine had shown plans for stored procedures as a compilation of plans of internal queries, and it was often misleading. In Firebird 3.0, the plan for stored procedures is always shown as NATURAL, and it is also not the best solution.

In Firebird 5 there is better option.

If we will try to see the plan for the stored procedure for the following query, it will not return desired details:

SELECT *
FROM SP_PEDIGREE(?, 5, 1)
Select Expression
    -> Procedure "SP_PEDIGREE" Scan

As expected, the top-level query plan is displayed without the details of the cursor plans inside the stored procedure, like in Firebird 3-4.

Firebird 5.0 has cache of the compiled queries, and monitoring table MON$COMPILED_STATEMENTS can show the plan for compiled queries, including stored procedures.

Once we have prepared query with stored procedure, this procedure will be cached, and its plan can be viewed with the following query:

SELECT CS.MON$EXPLAINED_PLAN
FROM MON$COMPILED_STATEMENTS CS
WHERE CS.MON$OBJECT_NAME = 'SP_PEDIGREE'
  AND CS.MON$OBJECT_TYPE = 5
ORDER BY CS.MON$COMPILED_STATEMENT_ID DESC
FETCH FIRST ROW ONLY
Cursor "V" (scrollable) (line 19, column 3)
    -> Record Buffer (record length: 132)
        -> Nested Loop Join (inner)
            -> Window
                -> Window Partition
                    -> Record Buffer (record length: 82)
                        -> Sort (record length: 84, key length: 12)
                            -> Window Partition
                                -> Window Buffer
                                    -> Record Buffer (record length: 41)
                                        -> Procedure "SP_HORSE_INBRIDS" as "V H_INB SP_HORSE_INBRIDS" Scan
            -> Filter
                -> Table "HUE" as "V HUE" Access By ID
                    -> Bitmap
                        -> Index "HUE_IDX_ORDER" Range Scan (full match)
Select Expression (line 44, column 3)
    -> Recursion
        -> Filter
            -> Table "HORSE" as "PEDIGREE HORSE" Access By ID
                -> Bitmap
                    -> Index "PK_HORSE" Unique Scan
        -> Union
            -> Filter (preliminary)
                -> Filter
                    -> Table "HORSE" as "PEDIGREE HORSE" Access By ID
                        -> Bitmap
                            -> Index "PK_HORSE" Unique Scan
            -> Filter (preliminary)
                -> Filter
                    -> Table "HORSE" as "PEDIGREE HORSE" Access By ID
                        -> Bitmap
                            -> Index "PK_HORSE" Unique Scan

Also, plans for stored procedures will be shown in the trace if configuration contains the line log_procedure_compile = true.

1.9. Summary for part 1

As you can see, there are many improvements for optimizer in Firebird 5.0, and majority of them do not require change of the SQLs, they will be applied automatically immediately after migration, without efforts from the developers of end-user applications.

The version 5 of Firebird is not only focused on performance, but also very convenient in terms of migration: there are compatibility parameters to migrate directly from 2.5 and 3.0, and from 4.0 it even does not require backup/restore.

It is time to go to 5.0!

In the next part of our detailed overview we will review all changes and new features in SQLs of v5.0.