CONNECT BY in Oracle

In this post, we’re going to discuss a very useful SQL extension to work with hierarchal queries. This is sadly only implemented by Oracle and although a few other databases support it as well, it won’t work in PostgreSQL, MySql nor SQL Server.

The dataset Link to heading

Let’s create a very simple hierarchal dataset to work with.

CREATE TABLE GeoArea
(
    parentName  VARCHAR2(100),
    code        VARCHAR2(100)
);

INSERT INTO GeoArea VALUES (NULL, 'World');
INSERT INTO GeoArea VALUES ('World', 'Europe');
INSERT INTO GeoArea VALUES ('Europe', 'Germany');
INSERT INTO GeoArea VALUES ('Germany', 'Berlin');
INSERT INTO GeoArea VALUES ('Germany', 'Stuttgart');
INSERT INTO GeoArea VALUES ('Germany', 'Hamburg');
INSERT INTO GeoArea VALUES ('Europe', 'Italy');
INSERT INTO GeoArea VALUES ('Italy', 'Rome');
INSERT INTO GeoArea VALUES ('Italy', 'Turin');
INSERT INTO GeoArea VALUES ('Italy', 'Milan');
INSERT INTO GeoArea VALUES ('World', 'Asia');
INSERT INTO GeoArea VALUES ('Asia', 'China');
INSERT INTO GeoArea VALUES ('China', 'Beijing');

CONNECT BY Link to heading

Let’s now see an example of a query that uses the tree structure:

SELECT code, LEVEL
  FROM GeoArea
 CONNECT BY PRIOR code = parentName
 START WITH parentName IS NULL;

The result set is:

World       1
Asia        2
China       3
Beijing     4
Europe      2
Germany     3
Berlin      4
Hamburg     4
Stuttgart   4
Italy       3
Milan       4
Rome        4
Turin       4

The CONNECT BY clause is used to express the relationship between a node and the previous one (either the parent or the child); in this case we are matching the parentName of the current record with the code of the PRIOR record. This means that we are going down in the tree; the PRIOR record represents the parent and the current record the child.

The START WITH clause can be used to filter the starting nodes of the tree. Note that this is quite different from putting the filter in the WHERE clause: the WHERE is applied first and is used to filter the complete dataset. Afterwards the START WITH expression is used to select the starting nodes of the trees, that will be built from the records filtered by the WHERE expression. Thus the records that match the WHERE expression should be a complete subtree of the initial table.

In this query we have used the special column LEVEL. This represent the 1-based depth of the record in the tree.

CONNECT_BY_ROOT and SYS_CONNECT_BY_PATH Link to heading

Let’s another sample query:

SELECT code, CONNECT_BY_ROOT code, SYS_CONNECT_BY_PATH(code, '/')
  FROM GeoArea
 CONNECT BY PRIOR code = parentName
 START WITH parentName IS NULL
 ORDER BY LEVEL;

This query returns:

World       World   /World
Europe      World   /World/Europe
Asia        World   /World/Asia
Germany     World   /World/Europe/Germany
Italy       World   /World/Europe/Italy
China       World   /World/Asia/China
Rome        World   /World/Europe/Italy/Rome
Milan       World   /World/Europe/Italy/Milan
Stuttgart   World   /World/Europe/Germany/Stuttgart
Hamburg     World   /World/Europe/Germany/Hamburg
Turin       World   /World/Europe/Italy/Turin
Beijing     World   /World/Asia/China/Beijing
Berlin      World   /World/Europe/Germany/Berlin

The CONNECT_BY_ROOT expression returns the value of the following expression in the root node of the subtree containing the current record.

The SYS_CONNECT_BY_PATH is used to get the full path of the given node, starting from the root, and using the given string to join nodes. In this case we are building the path given by the code of the records, joining them with a slash.

It shouldn’t be a surprise that we can use LEVEL in the ORDER BY clause, as shown.

ORDER SIBLINGS BY Link to heading

It’s often useful to get a tree sorted so that we get the nodes in order relative to their siblings. For instance, in this case we might want to get all the nodes under Italy together, sorted by code. This can be achieved with the useful ORDER SIBLINGS BY clause:

SELECT code, SYS_CONNECT_BY_PATH(code, '/')
  FROM GeoArea
 CONNECT BY PRIOR code = parentName
 START WITH parentName IS NULL
 ORDER SIBLINGS BY code;

which returns:

World       /World
Asia        /World/Asia
China       /World/Asia/China
Beijing     /World/Asia/China/Beijing
Europe      /World/Europe
Germany     /World/Europe/Germany
Berlin      /World/Europe/Germany/Berlin
Hamburg     /World/Europe/Germany/Hamburg
Stuttgart   /World/Europe/Germany/Stuttgart
Italy       /World/Europe/Italy
Milan       /World/Europe/Italy/Milan
Rome        /World/Europe/Italy/Rome
Turin       /World/Europe/Italy/Turin

A bottom-up example Link to heading

For completeness, let’s see an example of a CONNECT BY query that starts from the leaves and goes up in the tree:

CREATE TABLE Employees
(
    name        VARCHAR2(100),
    manager     VARCHAR2(100)
);

INSERT INTO Employees VALUES ('Mark', 'Alex');
INSERT INTO Employees VALUES ('Alex', 'Eric');
INSERT INTO Employees VALUES ('Tony', 'Alex');
INSERT INTO Employees VALUES ('Eric', NULL);

SELECT name, LEVEL
  FROM Employees
  CONNECT BY PRIOR manager = name
  START WITH name = 'Mark';

The SELECT query starts with Mark and returns all the employees that are its manager, recursively.

Mark    1
Alex    2
Eric    3

Conclusions Link to heading

The CONNECT BY construct, while Oracle-specific, can be an excellent way to simplify complex queries. Furthermore, it tends to have excellent performances (similarly to window functions or GROUP BY CUBE), given that it is a first-class tool.