“Why migrate to Postgres?” is a question we used to hear a lot. There used to be a number of reasons to migrate, including taking advantage of new technology, a change in staff skills, friction caused by vendor consolidation and, of course, cost. Nowadays, the main reason we hear for migrating [...]
We talk to a lot of clients who want to perform database migrations, but just about all of them are skeptical that the project will be a success. There are a number of reasons for this, including: fear; lack of time to take on the project; belief that their usage isn’t easily convertible to a new [...]
I've been kicking around the idea of founding a Columbus-based PostgreSQL User Group for a while now. I even went so far as to float the idea to people at OLF in '14. After much hemming and hawing (and no one else stepping up in the interim), I've finally gone and done it.
pgCMH is the name of my newly formed group, and we're good to go. We've got our own Twitter (@pgCMH) as well as our own MeetUp page. We've got a sponsor providing food, and another providing the meeting location. Our first meeting will be in January, thanks to all the scheduling conflicts the upcoming holidays create.
Watch this space for updates, follow our Twitter, and join the mailing list on MeetUp. I'd love to get your participation and input. Let's make this group as wildly successful as we can!
You're probably already running pgBadger to monitor your PostgreSQL logs. However, you're probably not running it incrementally throughout the day. Most likely, you've set up a cron.daily job that runs pgBadger against yesterday's log(s). And that's great. Except when you get the dreaded "what just happened on the db?" email. Are you going to wait until tonight's normal run of pgBadger to see what happened? Are you going to run a one-off pgBadger against today's logfile and wait for it to process the entire log? Or are you going to copy the log off somewhere, edit it down, and then run pgBadger against this cut-down version (hoping you left enough in the log to see proper trending)?
No, most likely you're going to look at your actual monitoring tool that does real-time monitoring of your db and try to figure things out from there. You are running some kind of db monitoring tool, right?
However, let's say that for, uh, reasons, you only have pgBadger at your disposal right this instant. Well, if you were making use of pgBadger's incremental mode, you could simply fire off the next scheduled run and it would only process those log entries that are new since the last run. So, for example, a cron.hourly run of pgBadger would only process the last hour's worth of entries to update today's report. No waiting to process hours of info you don't need, no editing the logfile to remove things outside the window you care about; just run it and done.
Sounds nice, right? So let's set this up, shall we? I'm assuming you've already set up postgresql.conf appropriately, but if you haven't, please do that first; the pgBadger website has good documentation on how to do so. According to the docs, the --incremental flag is how we turn on incremental mode. You'll note that we also need to specify an output directory via --outdir.
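Concretely, a minimal incremental run looks something like the sketch below. The report and log paths are hypothetical, and the invocation is guarded so it only actually executes on a box where pgbadger is installed:

```shell
# Sketch: turn on incremental mode and point reports at an output directory.
# Both paths are hypothetical -- substitute your own.
OUTDIR=/path/to/reports
LOG=/path/to/pg_log/postgresql-Mon.log

# pgBadger remembers its position in each file under OUTDIR between runs.
echo "pgbadger --incremental --outdir $OUTDIR $LOG"
if command -v pgbadger >/dev/null 2>&1; then
    pgbadger --incremental --outdir "$OUTDIR" "$LOG"
fi
```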
I usually stick the pgBadger output into the pg_log directory. In my mind, having the logs and the report on the logs next to each other makes sense, but feel free to stick yours wherever.
Finally, we probably don't need pgBadger reports that are too old, and the docs say the --retention option can cull the cruft automatically. (Ignore the typo; it's that way in the code.)
On my servers, I have PostgreSQL setup to log into a different file for each day of the week, with automatic rotation and truncation:
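That scheme looks something like the following in postgresql.conf. This is a sketch based on the standard day-of-week rotation recipe from the PostgreSQL docs; the values shown are assumptions, not necessarily the exact ones from my servers:

```
# Day-of-week log rotation: seven files, overwritten weekly (sketch)
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'   # %a = abbreviated weekday: postgresql-Mon.log, ...
log_rotation_age = 1d                # switch files daily
log_rotation_size = 0                # never rotate on size
log_truncate_on_rotation = on        # truncate (not append) when a name is reused
```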
My cron.hourly pgBadger run looks like:
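For reference, such a wrapper might look like the sketch below. The paths, the retention value, and the postgresql-%a.log naming are assumptions matching the setup described above, and the pgbadger call is guarded so the sketch is harmless on a box without pgBadger:

```shell
# Sketch of an /etc/cron.hourly pgBadger wrapper (paths are hypothetical).
PGLOG=/path/to/pg_log
OUTDIR=$PGLOG                        # reports live next to the logs

# Day-of-week log names; GNU and BSD date spell "yesterday" differently.
today=$(date +%a)
yesterday=$(date -v-1d +%a 2>/dev/null || date -d yesterday +%a)

# Feed yesterday's and today's logs; incremental mode skips what's been seen.
# --retention 4 culls report data older than four weeks (value is an example).
if command -v pgbadger >/dev/null 2>&1; then
    pgbadger --incremental --outdir "$OUTDIR" --retention 4 \
        "$PGLOG/postgresql-$yesterday.log" "$PGLOG/postgresql-$today.log"
fi
```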
which, as you can see, always feeds both yesterday's and today's logs into pgBadger (since the cron runs at 23:00 and then again at 00:00, we need yesterday's log to catch that last hour). Since we're running in incremental mode, pgBadger knows at every run where it left off in each file last time and does a seek to skip over that data. This cuts the run time down significantly, even with PostgreSQL logging cranked up. You can see it here:
As you can see, it jumps right in at 95% of the file and only processes the newest 5%. In fact, this takes a mere 20 seconds on my overloaded MacBook!
So there you have it. Not counting the time it takes you to ssh to your server, it would have taken all of 20 seconds to have an updated report of what just happened on your database!
Keep in mind, this is also with a single thread. pgBadger has the ability to run multi-threaded; see the --help for details.
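For instance, a parallel run might look like this sketch; the worker count is arbitrary, the paths are hypothetical, and the call is guarded as before:

```shell
# Sketch: parse the log with 4 parallel workers via --jobs (alias -j).
JOBS=4
echo "pgbadger --jobs $JOBS --incremental --outdir /path/to/reports /path/to/pg_log/postgresql-Mon.log"
if command -v pgbadger >/dev/null 2>&1; then
    pgbadger --jobs "$JOBS" --incremental --outdir /path/to/reports \
        /path/to/pg_log/postgresql-Mon.log
fi
```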
Recently we had a customer request to build a custom extension against the Postgres by BigSQL distribution. Even though BigSQL ships with a large set of commonly used extensions and a good collection of FDWs, these kinds of user build requirements always crop up, given how powerful the Postgres extension model is. BigSQL makes it easy [...]
Someone at work thought it would be a good idea to give me access to the corporate blog so that I might post PostgreSQL-related things there and have them syndicated to Planet PostgreSQL. So my PostgreSQL ramblings will show up there now instead of here...
From the PostgreSQL docs:
Tablespaces in PostgreSQL allow database administrators to define locations in the file system where the files representing database objects can be stored. Once created, a tablespace can be referred to by name when creating database objects.
By using tablespaces, an administrator can control the disk layout of a PostgreSQL installation. This is useful in at least two ways. First, if the partition or volume on which the cluster was initialized runs out of space and cannot be extended, a tablespace can be created on a different partition and used until the system can be reconfigured.
Second, tablespaces allow an administrator to use knowledge of the usage pattern of database objects to optimize performance. For example, an index which is very heavily used can be placed on a very fast, highly available disk, such as an expensive solid state device. At the same time a table storing archived data which is rarely used or not performance critical could be stored on a less expensive, slower disk system.
As you can see, while not as powerful as tablespaces in, say, Oracle, they still have their uses in PostgreSQL. You can use them to take advantage of different filesystems, different mount options, or different disk types and, in doing so, intelligently apply performance characteristics to subsets of your data. For example, you could put your highest-volume tables in a tablespace that is mounted from SSDs while the rest of your db is mounted from spinning rust.
Sounds decent, right? Now, before you go off and be "clever" and create an SSD-backed mountpoint for your new tablespace, understand that there are places you should not create a tablespace. You shouldn't create tablespaces on any kind of ephemeral storage, for example on a tmpfs or a ramfs or similar. You also should not create your new tablespaces under $PGDATA. Yes, I'm aware that $PGDATA/pg_tblspc exists, but that directory is not for you; the system auto-populates it with pointers to the real location of your tablespaces!
So what happens when you create a tablespace inside $PGDATA? Let's find out. First, we'll create the directory for the tablespace:
```bash
doug.hunley ~ $ mkdir $PGDATA/tablespaces
doug.hunley ~ $ cd $PGDATA/tablespaces
doug.hunley ~/pgdata/tablespaces $ pwd
/Users/doug.hunley/pgdata/tablespaces
```
And we see that nothing bad has happened yet. So, let's pop over into psql and actually create the tablespace:
```sql
(doug.hunley@[local]:5432/doug.hunley) # CREATE TABLESPACE ts1 LOCATION '/Users/doug.hunley/pgdata/tablespaces';
WARNING:  42P17: tablespace location should not be inside the data directory
LOCATION:  CreateTableSpace, tablespace.c:295
CREATE TABLESPACE
Time: 7.797 ms
(doug.hunley@[local]:5432/doug.hunley) #
```
We get a warning (not an error, for some reason) but it works and all appears fine. Now you can spend minutes/days/months/years using your new tablespace and never notice that you've got a problem. So where does the problem come in?
Let's try to make a backup of our cluster:
```bash
doug.hunley ~ $ pg_basebackup -D pgdata2 -Fp -R -Xs -c fast -l 'clone for slave' -P -v
transaction log start point: 2/17000028 on timeline 1
pg_basebackup: directory "/Users/doug.hunley/pgdata/tablespaces" exists but is not empty
doug.hunley ~ $
```
There it is.
When creating the backup, pg_basebackup tries to keep the tablespace location the same, but it won't write to a non-empty directory. My example uses two different $PGDATA locations on the same box, but the issue is the same across different machines, because pg_basebackup backs up everything in $PGDATA. That means your tablespace directory gets cloned before the backup gets to the actual cloning of the data in the tablespace, so you end up with "stuff" in the dir, making it non-empty, which gives you the same error.
OK, so it breaks backups. I can work around that by using another backup method. What else?
How about using pg_upgrade to do an upgrade? Whether or not you run in link mode, pg_upgrade will not move your tablespace location. So you may have ~/pgdata96 after the upgrade, but your tablespaces are still in ~/pgdata95/tablespaces. So, as per the docs:
Once you are satisfied with the upgrade, you can delete the old cluster's data directories by running the script mentioned when pg_upgrade completes.
And boom, you've just deleted your tablespaces off disk. Congratulations!
So there you have it. Two very good reasons to not create tablespaces inside $PGDATA. Please, don't do this. Everyone who admins that cluster going forward will thank you.
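Doing it right is just as easy: create the directory on a separate mountpoint, outside $PGDATA, and point CREATE TABLESPACE there. A sketch follows; the path and tablespace name are hypothetical, and the psql call is guarded so the sketch only runs where a server is reachable:

```shell
# Sketch: a tablespace location OUTSIDE $PGDATA (path is hypothetical).
TS_DIR="$HOME/pg_tablespaces/ts_fast"   # in real life: e.g. an SSD-backed mountpoint
mkdir -p "$TS_DIR"                      # must exist, be empty, and be writable by postgres

if command -v psql >/dev/null 2>&1; then
    psql -c "CREATE TABLESPACE ts_fast LOCATION '$TS_DIR';" || true
fi
```

With that layout, pg_basebackup can map the tablespace cleanly and pg_upgrade's cleanup script never touches your data.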
It is great to confirm that the latest version of Postgres builds cleanly on the latest Ubuntu without any errors or warnings. Hmmm, I wonder now if GCC 5 makes anything measurably run faster? Pre-requisites: [code] $ sudo apt-get install build-essential libreadline-dev zlib1g-dev flex bison libxml2-dev libxslt-dev libssl-dev [/code] With this complete, you can [...]
Data centers are no longer dominated by a single DBMS. Many companies have heterogeneous environments and may want their Postgres database to talk to other database systems. Foreign Data Wrappers can be the right solution for many such scenarios. The BigSQL Project provides a well-tested, ready-to-use MySQL FDW with Postgres. This makes life easy for [...]
Affan Salman joins the team East Brunswick, NJ, March 1, 2016 OpenSCG, a leading provider of subscriptions and services for PostgreSQL, announced today that Affan Salman has joined its team of top-tier PostgreSQL talent as Senior Database Architect. Affan is the primary original author of EnterpriseDB's Oracle compatibility from 10 years ago and has spent [...]