The Power of Enums

Recently, I was helping out a company that handles financial data so the quality of there data was vital to their business. With that, they dutifully used foreign keys on most tables to ensure that all data entered into PostgreSQL was as expected. They also had a fondness for natural keys. Their reasoning, which makes perfect sense, is that it makes the data more readable. I’d much rather see a row of data that has a state code with a value of NJ instead of the number 30, especially when you’re debugging your application. So, I went ahead and did a little benchmarking to see how big of an impact the foreign keys imposed.

I constructed a simple account table that referenced 4 small tables that contained pretty static data. The largest table was the county table with only 249 rows.

CREATE TABLE account (
 account_id bigserial PRIMARY KEY,
 account_type character(3) REFERENCES account_type,
 account_sub_type character(4) REFERENCES account_sub_type,
 account_currency character(3) REFERENCES currency,
 account_country character(2) REFERENCES country,
 account_number varchar NOT NULL
);

The first thing I did was to set a baseline of loading data with no foreign key constraints. I created a script that would insert 100,000 rows of data a single insert at a time. This would simulate how the table would be used by the application instead of just bulk loading the data. I then reran the same load, but with the foreign keys in place and I was pretty surprised to see how big the impact was.

Baseline: 19.1 seconds
With foreign keys: 42.2 seconds

I figured that the foreign keys would have some effect, but not a 121% decrease in performance. The was a much larger impact than I was expecting so I went ahead and tried the same test, but gradually adding the number of foreign keys.

Baseline: 19.1 seconds
1 foreign key: 26.2 seconds
2 foreign keys: 29.7 seconds
3 foreign keys: 36.4 seconds
4 foreign keys: 42.2 seconds

So for even a single foreign key constraint to the account_type table which has only 5 rows, there was a 39% decrease in performance. Some of the tables are under pretty high load and the impact of the foreign keys would end up being very significant, but the integrity of the data was of utmost importance. All of the tables that are being referred to contain pretty much static data and if anything changes, the new codes would just be added so I figured I would try replacing all of the codes with enums. PostgreSQL will throw an error if you try to insert a value that does not exist in the enum so it has the same effect of a foreign key check on the codes.

So now rerunning the test with our redefined table with the 4 enums

CREATE TABLE account (
 account_id bigserial PRIMARY KEY,
 account_type acct_type,
 account_sub_type sub_type,
 account_currency currency_code,
 account_country country_code,
 account_number varchar NOT NULL
);

I saw something else I didn’t expect:

Baseline: 19.1 seconds
With Enums: 17.5 seconds

There was a 8% increase in performance. I was expecting the test with the enums to be close to the baseline, but I wasn’t expecting it to be faster. Thinking about it, it makes sense. Enums values are just numbers so we’re effectively using surrogate keys under the covers, but the users would still the the enum labels when they are looking at the data. It ended up being a no brainer to use enums for these static tables. There was a increase in performance while still maintaining the integrity of the data.

  1. KeithKeith09-26-2013

    Enums seem nice at first, but if you ever need to change the values in the enum you’ll come across their shortcommings. Adding a value to an enum isn’t too bad, but when it comes time to remove a value, you’re in for a challenge. You can try to make the assumption that the data is static and you won’t need to do that. But in the long run that’s rarely the case.I’ve gone with check constraints instead of enums myself where a foreign key didn’t work well. Might want to test those out as well.

    • Jim MlodgenskiJim Mlodgenski09-26-2013

      You wouldn’t want to use this for data that has a chance where you would want to go change a value of a code, but in this case we will never want to remove a value. Consider a currency code of USD. Now if the US went and changed its currency code to USC for some reason, all of the historical records would still need to refer to the old currency code of USD so the old value would still need to remain even if it is inactive. For the currency code table, values will always be added, never changed or removed.

      Check constraints could work as well, but the downside is that you need to maintain that constraint on all tables that refer to the codes. With an enum, you create it once and where ever you need to use it, you just use the type.

  2. VadimVadim09-26-2013

    have u created indexes on foreign keys?

    • Jim MlodgenskiJim Mlodgenski09-26-2013

      Yes, indexes were created since its the primary key on the look up tables, but for very small tables that all fit into a single page, PostgreSQL will never use that index to look up the row.

  3. DhananjayDhananjay09-26-2013

    Great Jim, For me its came down from 3215.989 ms to 1054.348 ms. But the only problem is; its bit inconvenient to maintain if data is not 100% static.

  4. Andrew DunstanAndrew Dunstan09-26-2013

    enums beat the baseline case. This is NOT surprising, since they are just oid values. The cases where you actually want to remove a value from the enum set can be painful, but there are many cases where that doesn\\’t happen. It can be done using a cascading delete if you use the lookup table method, but that will be just as painful anyway. Changing enum labels isn\\’t hard. And in fact it\\’s WAY faster than where you store the label directly as in Jim\\’s baseline case, where you would again need to check and possibly update every row.

  5. Andrew DunstanAndrew Dunstan09-26-2013

    You should use enums for cases where you might want to add to the enum set but not delete from it, and where you don\’t want to be adding to it in high volume. Jim\’s use cases above are pretty good poster children for enum use.

  6. DennisDennis09-26-2013

    What about something like (untested):

    CREATE DOMAIN country_code AS CHAR(2)
    CHECK(
    VALUE IN (‘aa’, ‘ab’, ‘ac’, …. , ‘zx’, ‘zy’, ‘zz’)
    );

    Then “the downside is that you need to maintain that constraint on all tables that refer to the codes” is not an issue, simply use a column of type “country_code” wherever needed. A check constraint is easy to change or extend, and this also makes it easy if later on you need to change the country code to 3 letters, or the currency code to 4.

    I would be interested to know how this performs relative to the other methods.

  7. Jim MlodgenskiJim Mlodgenski09-26-2013

    So I went ahead and reran the test, but this time with a a domain enforcing a CHECK constraint, and it ended up with even worse performance than the foreign key checks. I created the domains containing all of the possible values, so the country code looks like:
    CREATE DOMAIN country_code_domain AS VARCHAR
    CHECK(VALUE IN (‘TV’, ‘SE’, ‘AI’, … ‘EU’));

    The table then became:
    CREATE TABLE account (
    account_id bigserial PRIMARY KEY,
    account_type acct_type_domain,
    account_sub_type sub_type_domain,
    account_currency currency_code_domain,
    account_country country_code_domain,
    account_number varchar NOT NULL
    );

    I also reran the baseline to make sure nothing else changed and I got:
    Baseline: 19.4 seconds
    Domains: 115.3 seconds

    Nearly 2 minutes for the check constraints.

    Can someone confirm similar behavior? That’s much slower than I was expecting.

  8. DhananjayDhananjay09-26-2013

    Tested for one column only and here is my findings with timing in milliseconds. For me domain with char(X) is better.

    Using FK – 3079.945 ms

    Using Domain with Varchar – 917.135 ms

    CREATE DOMAIN ord_type AS VARCHAR
    CHECK( VALUE IN(‘AAA’, ‘ABA’, ‘ACA’,’ADA’, ‘AEA’, ‘AFA’));

    Enum – 529.935 ms

    CREATE TYPE emp_type AS ENUM (‘AAA’, ‘ABA’, ‘ACA’,’ADA’, ‘AEA’, ‘AFA’);

    Using Domain with char(3)- 0.465 ms

    CREATE DOMAIN ord_type AS CHAR(3)
    CHECK( VALUE IN(‘AAA’, ‘ABA’, ‘ACA’,’ADA’, ‘AEA’, ‘AFA’));

    • Jim MlodgenskiJim Mlodgenski09-27-2013

      Digging a little more, its looks like the performance hit to a CHECK constraint is related to the number of values contained in the IN clause. The timings of just a single CHECK constraint on country_code which has 249 values is much longer than just account_type which has 5 values.

  9. VadimVadim09-27-2013

    enums in depth are the same linked tables with keys – but they hidden from the end users

  10. MishaMisha09-27-2013

    what about to try something around DEFERRABLE options on fk ?

Leave a Reply

Please type the characters of this captcha image in the input box

Please write the answer to the math question of this captcha image in the input box

/* ]]> */