You say Optimizer, I say Planner…

But, oh, if we call the whole thing off
Then we must part
And, oh, if we ever part, then that might break my heart…

Are you an Oracle DBA making the transition to PostgreSQL? Are you longing for the good old days when you could override decisions made by the Optimizer?

Or are you a PostgreSQL DBA wondering, “Wait a minute, you can do that?”

Uhhh.. Optimizer, Planner, what?

What is referred to as the Optimizer in Oracle is called the Planner in the land of PostgreSQL. Regardless of the terminology, they do the same thing: modify a SQL statement to increase speed in the form of an execution plan.

Like Oracle, PostgreSQL uses a Cost Based Optimizer (CBO) generating many different execution plans, applying a cost model to all of them and then selecting the one with the best cost value for execution.

Thou shalt not… (hint, hint)

So, what if you don’t always want to follow the exact execution plan generated by your PostgreSQL Planner?

Oracle created the concept of Hints to give DBAs the power to embed a directive into an SQL statement and override the Optimizer’s plan in situations where a DBA might:

  • have knowledge about the database that the Optimizer doesn’t know about;
  • want to test different scenarios (that the Optimizer won’t consider) to benchmark outcomes.

A better name would probably be “commandment” since you really aren’t giving the Optimizer a polite suggestion, but rather an absolute decree.

The PostgreSQL community does not support this functionality in PGDG out of the box, and for good reasons. So, some good community members have stepped in and developed the pg_hint_plan extension to fill the need. And now it is available with the BigSQL Linux distribution!

Warning: Proceed with Caution!

The PGDG communities fear and loathing toward supporting hint functionality is warranted. Using hints comes with a cost (and risk). Here are some of the drawbacks listed in the PostgreSQL wiki regarding the pitfalls:

  • Poor application code maintainability: hints in queries require massive refactoring.
  • Interference with upgrades: today’s helpful hints become anti-performance after an upgrade.
  • Encouraging bad DBA habits slap a hint on instead of figuring out the real issue.
  • Does not scale with data size: the hint that’s right when a table is small is likely to be wrong when it gets larger.
  • Failure to actually improve query performance: most of the time, the optimizer is actually right.
  • Interfering with improving the query planner: people who use hints seldom report the query problem to the project.

How it works:

Read the full documentation (updated from the original source by BigSQL).

Installing pg_hint_plan with BigSQL on Linux

First, intall the BigSQL PostgreSQL distribution (pg96 in this example):

python -c "$(curl -fsSL http://s3.amazonaws.com/pgcentral/install.py)"
./pgc update
./pgc install pg96
./pgc start

Next, install the pg/_hint/_plan extension:

./pgc install hintplan-pg96

Modify postgresql.conf

cd /bigsql/data/pg96
vi postgresql.conf

Add this line at the end of the file:

#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------

# Add settings for extensions here

shared_preload_libraries='pg_hint_plan'  

Restart PostgreSQL:

cd <location of bigsql>
./pgc restart pg96

Exercise: Create a dummy table and

This example comes from: DBMS Tuning & My Life

First, open psql and create the test data with the following SQL commands:

create table t1 (c1 int, c2 int, c3 int, dummy char(100));
create index t1_idx1 on t1 (c1, c2, c3);
create index t1_idx2 on t1 (c2, c3);
create index t1_idx3 on t1 (c3);
create index t1_idx4 on t1 (c1); 

create table t2 (c1 int, c2 int, c3 int, dummy char(100));
create index t2_idx1 on t2 (c1, c2, c3);
create index t2_idx2 on t2 (c2, c3);
create index t2_idx3 on t2 (c3);

create table t3 (c1 int, c2 int, c3 int, dummy char(100));
create index t3_idx1 on t3 (c1, c2, c3);
create index t3_idx2 on t3 (c2, c3);
create index t3_idx3 on t3 (c3); 

insert into t1 select 1, mod(c1,100), mod(c1,1000), 'dummy' from generate_series(1,100000) c1;
insert into t2 select 1, mod(c1,100), mod(c1,1000), 'dummy' from generate_series(1,10000) c1;
insert into t3 select 1, mod(c1,100), mod(c1,1000), 'dummy' from generate_series(1,100) c1;

Review data you just created:

\dt
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | t1   | table | postgres
 public | t2   | table | postgres
 public | t3   | table | postgres
(3 rows)

select * from t1;
 c1 | c2 | c3  | dummy  
----+----+-----+------
  1 |  1 |   1 | dummy

  1 |  2 |   2 | dummy

  1 |  3 |   3 | dummy

  1 |  4 |   4 | dummy

  etc...

Test of IndexScan and SeqScan hints

Open psql and run this query with the following IndexScan hint:

/*+ IndexScan(t1 t1_idx1) */
explain analyze
select *
from   t1
where  c1=1
and    c2=10
and    c3=100;

You should get the something like:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
     Index Scan using t1_idx1 on t1 (cost=0.42..8.44 rows=1 width=113) (actual time =0.031..0.031 rows=0 loops=1)
       Index Cond: ((c1 = 1) AND (c2 = 10) AND (c3 = 100))
     Planning time: 0.818 ms
     Execution time: 0.116 ms
    (4 rows)

Next, run the query with the following SeqScan hint:

/*+ SeqScan(t1) */
explain analyze
select *
from   t1
where  c1=1
and    c2=10
and    c3=100;

You should get something similar to the following feedback:

QUERY PLAN
--------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..3569.00 rows=1 width=113) (actual time=13.496..13.4 96 rows=0 loops=1)
   Filter: ((c1 = 1) AND (c2 = 10) AND (c3 = 100))
   Rows Removed by Filter: 100000
 Planning time: 0.140 ms
 Execution time: 13.516 ms
(5 rows)

Test of Leading, NestLoop, HashJoin hints

/*+ Leading(c b a) NestLoop(c b) HashJoin(c b a) */
explain analyze
select a.*, b.*, c.*
from   t1 a, t2 b, t3 c
where  a.c1=b.c1
and    a.c2=b.c2
and    a.c3=b.c3
and    b.c1=c.c1
and    b.c2=c.c2
and    b.c3=c.c3;

You should get something similar to the following feedback:

QUERY PLAN
--------------------------------------------------------------------------------------------------
Hash Join  (cost=506.93..4451.02 rows=10 width=339) (actual time=2.463..63.442 rows=100000 loops=1)
   Hash Cond: ((a.c1 = b.c1) AND (a.c2 = b.c2) AND (a.c3 = b.c3))
   ->  Seq Scan on t1 a  (cost=0.00..2819.00 rows=100000 width=113) (actual time=0.011..13.456 rows=100000 loops=1)
   ->  Hash  (cost=506.75..506.75 rows=10 width=226) (actual time=2.433..2.433 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 263kB
         ->  Nested Loop  (cost=0.29..506.75 rows=10 width=226) (actual time=0.025..1.816 rows=1000 loops=1)
               ->  Seq Scan on t3 c  (cost=0.00..3.00 rows=100 width=113) (actual time=0.006..0.035 rows=100 loops=1)
               ->  Index Scan using t2_idx2 on t2 b  (cost=0.29..5.03 rows=1 width=113) (actual time=0.005..0.014 rows=10 loops=100)
                     Index Cond: ((c2 = c.c2) AND (c3 = c.c3))
                     Filter: (c.c1 = c1)
 Planning time: 0.849 ms
 Execution time: 70.921 ms
(12 rows)

Now that you know how to control the Optimizer’s outcomes, just remember what Spiderman says:

With great power comes great responsibility.