One of the major hurdles faced when converting Oracle’s PL/SQL code to any other database is to find the equivalent of START WITH and CONNECT BY PRIOR. This SQL in Oracle provides a very powerful way of writing hierarchical queries, something that comes in really handy when dealing with hierarchical data – this is fairly common in any enterprise.

Well, there is good news. There is an equivalent available in PostgreSQL … and it is not too hard!

Basically, in order to write the equivalent query in PostgreSQL, we use the WITH RECURSIVE syntax, essentially referring the query to its own output and hence recursively iterating through the result-set.

Let me explain with the help of a simple example.

Consider a simple table ‘places’ that lists out the name of a place under column ‘place’ and where it is located under the column ‘is_in’.

example=# \d places
   Table "public.places"
 Column | Type | Modifiers
--------+------+-----------
  place | text |
  is_in | text |

Consider the following data in this table:

example=# select * from places;
     place      |     is_in
----------------+----------------
 U.S.A          |
 California     | U.S.A
 Florida        | U.S.A
 Massachusetts  | U.S.A
 San Francisco  | California
 Los Angeles    | California
 San Diego      | California
 Miami          | Florida
 Orlando        | Florida
 Tampa          | Florida
 United Kingdom |
 London         | United Kingdom
 Leeds          | United Kingdom
(13 rows)

As you can see, the data is arranged in a hierarchy represented by the chart below:

An example resultset someone would want to retrieve from such data is ‘give me all the places that are in U.S.A.’. In Oracle, the query would like the following:

SELECT op.places, op.is_in
   FROM places op
  START WITH op.is_in = 'U.S.A'
CONNECT BY PRIOR op.place = op.is_in

The PostgreSQL equivalent would be:

WITH RECURSIVE q AS (SELECT place, is_in
                       FROM places
                      WHERE place = 'U.S.A'
                      UNION ALL
                     SELECT m.place, m.is_in
                       FROM places m
                       JOIN q ON q.place = m.is_in)
        SELECT place, is_in FROM q

This gives the following result:

place     |   is_in
---------------+------------
 U.S.A         |
 California    | U.S.A
 Florida       | U.S.A
 Massachusetts | U.S.A
 San Francisco | California
 Los Angeles   | California
 San Diego     | California
 Miami         | Florida
 Orlando       | Florida
 Tampa         | Florida
(10 rows)

Let’s break this query down to understand what’s going on.

The general structure of a WITH RECURSIVE query is:

a)    The non-recursive term

b)    UNION or UNION ALL, depending on whether or not you want to have duplicates

c)    Recursive term containing a reference to the queries own output

Query evaluation is done as follows (taken from PostgreSQL docs):

  1. Evaluate the non-recursive term. For UNION* **(but not *UNION ALL*), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.*
  2. So long as the working table is not empty, repeat these steps:
    1. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION* **(but not *UNION ALL*), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.*
    2. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.

An important thing to ensure while writing hierarchical queries like this is the end state, i.e. making sure that eventually, the recursive portion of the query will not return any rows. Without this end state, the query will go into an infinite loop. To illustrate, let’s suppose in the above example, we have another rows in the ‘places’ table that says ‘California’ is in ‘San Francisco’.

example=# select * from places;
      place     |     is_in
----------------+----------------
 U.S.A          |
 California     | U.S.A
 Florida        | U.S.A
 Massachusetts  | U.S.A
 San Francisco  | California
 Los Angeles    | California
 San Diego      | California
 Miami          | Florida
 Orlando        | Florida
 Tampa          | Florida
 United Kingdom |
 London         | United Kingdom
 Leeds          | United Kingdom
 California     | San Francisco

As you can see, the data essentially says that ‘San Francisco’ is in ‘California’ and ‘California’ is in ‘San Francisco’. This data will throw our query above in an infinite loop. The simplest way to avoid such a situation is to use UNION instead of UNION ALL to join the recursive and non-recursive portions of the hierarchical query. With that change, the results are as follows:

example=# WITH RECURSIVE q AS (SELECT place, is_in
                                 FROM places
                                WHERE place = 'U.S.A'
                                UNION
                               SELECT m.place, m.is_in
                                 FROM places m
                                 JOIN q ON q.place = m.is_in)
                  SELECT place, is_in FROM q;
place     |     is_in
---------------+---------------
 U.S.A         |
 California    | U.S.A
 Florida       | U.S.A
 Massachusetts | U.S.A
 San Francisco | California
 Los Angeles   | California
 San Diego     | California
 Miami         | Florida
 Orlando       | Florida
 Tampa         | Florida
 California    | San Francisco
(11 rows)

I hope you find this blog helpful!