In a previous blog post we talked about the technical details behind the
new CassandraFDW that enables PostgreSQL to connect to a Cassandra
cluster. This provides PostgreSQL users the ability to easily read and
write data to a very popular open source multi-master database that can
support writes in multiple data centers. This means if you are storing
sensor data or large files in an easily scalable Cassandra cluster you
can join those NoSQL tables to your local PostgreSQL data as well!

Today my colleague Balakrishna Gatla will show you how to connect,
read data, write data, and join to a remote Cassandra table.

First – Configure Cassandra

If you don’t have a Cassandra cluster yet, you can use the BigSQL
package manager to install a local Cassandra instance for development
purposes.

Connect to Cassandra using CQLSH

$ cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.17 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh>

CREATE KEYSPACE

In Cassandra, a keyspace is a container for your application data. It is
similar to the schema in a relational database.

cqlsh> CREATE KEYSPACE postgres
   ... WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

Connect to KEYSPACE and CREATE TABLEs

cqlsh> USE postgres;

cqlsh:postgres> CREATE TABLE t1 (id int PRIMARY KEY, name text);
cqlsh:postgres> CREATE TABLE t2 (id int PRIMARY KEY, name text);

Insert some Data into the TABLEs

cqlsh:postgres> INSERT INTO t1 (id, name) VALUES (1, 'abc');
cqlsh:postgres> INSERT INTO t1 (id, name) VALUES (2, 'def');
cqlsh:postgres> INSERT INTO t1 (id, name) VALUES (3, 'ghi');

cqlsh:postgres> INSERT INTO t2 (id, name) VALUES (1, 'mno');
cqlsh:postgres> INSERT INTO t2 (id, name) VALUES (2, 'pqr');
cqlsh:postgres> INSERT INTO t2 (id, name) VALUES (3, 'stu');

Second – Set up Postgres Database

Install Cassandra_FDW in Postgres

It is very easy to install extensions using the BigSQL command-line
utility pgc:

(pgc) Install Cassandra_FDW

Go to the BigSQL directory and invoke the bigsql command-line tool to
install Cassandra FDW:

$ ./pgc install cassandra_fdw3-pg95
['cassandra_fdw3-pg95']
Get:1 http://s3.amazonaws.com/pgcentral cassandra_fdw3-pg95-3.0.0-1-linux64
 Unpacking cassandra_fdw3-pg95-3.0.0-1-linux64.tar.bz2

If you have not already, you can use pgc to also initialize a new
postgres cluster:

$ ./pgc init pg95

Connect via psql and create the extension in the Postgres Database:

postgres=# CREATE EXTENSION cassandra_fdw;

Like any FDW, you need to configure the remote server, the user
credentials to connect, and then the remote tables (think of as “views”
against the remote server). First, create the Foreign Server to provide
connectivity details for Cassandra:

postgres=# CREATE SERVER cass_serv FOREIGN DATA WRAPPER cassandra_fdw
             OPTIONS (host '127.0.0.1', port '9042');

Next, we create the user mapping for authenticating with Cassandra:

postgres=# CREATE USER MAPPING FOR public SERVER cass_serv
             OPTIONS (username 'test', password 'test');

We can now use the Import Foreign Schema feature to bring in a
definition for a specific remote table without specifying its DDL
manually.

postgres=# IMPORT FOREIGN SCHEMA postgres LIMIT TO ("t1")
             FROM SERVER cass_serv INTO public;

We can now query the remote table t1 in Postgres like any local table
(again, think a “view” pointing to remote server).

postgres=# SELECT * FROM public.t1;
 id | name
----+------
  1 | abc
  2 | def
  3 | ghi
(3 rows)

Third – Write Data to Foreign Table in Cassandra

Create Foreign Table in Postgres using explicit DDL allowing us to
specify the primary_key option which is required to be able to write
to a Foreign Table:

postgres=# CREATE FOREIGN TABLE t2 (id int, name text) SERVER cass_serv
             OPTIONS (schema_name 'postgres', table_name 't2', primary_key 'id');

Let’s run a simple query against the Foreign Table:

postgres=# SELECT * FROM public.t2;
 id | name
----+------
  1 | mno
  2 | pqr
  3 | stu
(3 rows)

Now let us try writing to the table:

postgres=# INSERT INTO public.t2 (id, name) VALUES (4, 'vwx');
INSERT 0 1
postgres=# INSERT INTO public.t2 (id, name) VALUES (5, 'asd');
INSERT 0 1
postgres=# SELECT * FROM public.t2;
 id | name
----+------
  5 | asd
  1 | mno
  2 | pqr
  4 | vwx
  3 | stu
(5 rows)

postgres=# UPDATE public.t2 SET name = 'aaa' WHERE id = 5;
UPDATE 1
postgres=# SELECT * FROM public.t2;
 id | name
----+------
  5 | aaa
  1 | mno
  2 | pqr
  4 | vwx
  3 | stu
(5 rows)

postgres=# DELETE FROM public.t2 WHERE id = 5;
DELETE 1
postgres=# SELECT * FROM public.t2;
 id | name
----+------
  1 | mno
  2 | pqr
  4 | vwx
  3 | stu
(4 rows)

Let us see the changes reflected from the Cassandra side:

cqlsh:postgres> SELECT * FROM t2;

 id | name
----+------
  1 |  mno
  2 |  pqr
  4 |  vwx
  3 |  stu

(4 rows)

Fourth – JOIN Postgres and Cassandra Tables

Finally, to make it more interesting, here’s an example of joining a
Postgres table to a remote Cassandra table:

Create a Postgres table:

postgres=# CREATE TABLE t3 (id int, city varchar(20));

Insert some data into it:

postgres=# INSERT INTO t3 VALUES (1, 'New York');
postgres=# INSERT INTO t3 VALUES (2, 'London');
postgres=# INSERT INTO t3 VALUES (3, 'Moscow');
postgres=# INSERT INTO t3 VALUES (4, 'New Delhi');
postgres=# INSERT INTO t3 VALUES (5, 'Dallas');
postgres=# SELECT * FROM t3;
 id |   city
----+-----------
  1 | New York
  2 | London
  3 | Moscow
  4 | New Delhi
  5 | Dallas
(5 rows)

Let us start with an equi-join between the Postgres and Cassandra
tables:

postgres=# SELECT a.id, a.name, b.city FROM t1 a JOIN t3 b ON (a.id = b.id);
 id | name |   city
----+------+----------
  1 | abc  | New York
  2 | def  | London
  3 | ghi  | Moscow
(3 rows)

Next, LEFT OUTER JOIN the Postgres table to the Cassandra table:

postgres=# SELECT a.id, a.name, b.city FROM t1 a LEFT OUTER JOIN t3 b ON (a.id = b.id);
 id | name |   city
----+------+----------
  1 | abc  | New York
  2 | def  | London
  3 | ghi  | Moscow
(3 rows)

Next, RIGHT OUTER JOIN the Postgres table to the Cassandra table:

postgres=# SELECT a.id, a.name, b.city FROM t1 a RIGHT OUTER JOIN t3 b ON (a.id = b.id);
 id | name |   city
----+------+-----------
  1 | abc  | New York
  2 | def  | London
  3 | ghi  | Moscow
    |      | New Delhi
    |      | Dallas
(5 rows)

And finally, let us FULL OUTER JOIN the Postgres table to the Cassandra table:

postgres=# SELECT a.id, a.name, b.city FROM t1 a FULL OUTER JOIN t3 b ON (a.id = b.id);
 id | name |   city
----+------+-----------
  1 | abc  | New York
  2 | def  | London
  3 | ghi  | Moscow
    |      | New Delhi
    |      | Dallas
(5 rows)

We hope this helps you get started with the Cassandra FDW. For more
information, please visit our project on BitBucket.