HDFS for PostgreSQL Backups

On several occasions, I’ve been talking with groups of PostgreSQL users and the question comes up, “If I use PostgreSQL, why would I want to use Hadoop?” There are many answers, and the question usually comes from people who don’t yet know the details of Hadoop, but let’s just focus on a single use case: backups.

For most larger databases, online backups are used with point-in-time recovery. This lets administrators back up, or more importantly restore, their databases quickly. The trade-off is that you’re making a physical copy of the database files, so if your database is a terabyte, your backup will be a terabyte before you compress it. If you’re keeping weekly backups and you have a company policy to retain your backups for months, it’ll require a lot of storage. That’s where Hadoop comes in.

At the core of Hadoop is the Hadoop Distributed File System (HDFS), which isn’t a POSIX-compliant file system, but it does have some pretty great properties. It’s designed to run on inexpensive hardware while still being fault tolerant. This means that you can go out and buy some inexpensive drives, put them in some older desktops you have lying around the office, and you’ll have a highly redundant storage cluster. No need to buy an expensive SAN or NAS device or ship your data to a cloud service like Amazon S3.
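To put rough numbers on the storage question, here is a back-of-the-envelope sketch. The one terabyte figure comes from the example above; the 26-week retention period is only an assumption standing in for "months", and the factor of 3 is the HDFS default replication setting (dfs.replication), which is what buys the redundancy.

```shell
#!/bin/sh
# Rough capacity estimate for keeping weekly base backups in HDFS.
db_size_tb=1          # one uncompressed base backup, as in the example above
retention_weeks=26    # assumption: about six months of weekly backups
replication=3         # HDFS default replication factor (dfs.replication)

# Space the backups occupy as files, before replication.
logical_tb=$((db_size_tb * retention_weeks))

# Raw disk the cluster needs once every block is stored three times.
raw_tb=$((logical_tb * replication))

echo "backups kept: ${retention_weeks}, logical size: ${logical_tb} TB, raw disk needed: ${raw_tb} TB"
```

Compressing the backups before copying them shrinks both numbers, but the multiplier from replication stays the same.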

Leveraging HDFS for your PostgreSQL backups is pretty straightforward. Assuming you have a Hadoop cluster already set up, you’ll just need to put the Hadoop client on your server. From your PostgreSQL server, first test that you can connect to the cluster and do a simple directory listing.

jim@jim-XPS:~$ hadoop dfs -ls hdfs://
Found 3 items
drwxr-xr-x - bigsql supergroup 0 2013-06-26 12:15 /user/bigsql
drwxr-xr-x - bigsql supergroup 0 2013-06-26 12:00 /user/hive
drwxr-xr-x - bigsql supergroup 0 2013-07-08 12:17 /user/postgres

If you run into any errors, you most likely need to change the fs.default.name property in core-site.xml to use the correct URL instead of localhost.
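A quick way to see what the client is currently pointed at is to read the property straight out of the file. The helper below is a minimal sketch, not a real XML parser: it assumes the conventional layout where the name and value elements each sit on one line inside the same property element. The function name hadoop_conf_value is made up for this example. Newer Hadoop releases call the same setting fs.defaultFS, so it may be worth checking both names.

```shell
#!/bin/sh
# Print the value configured for a property in a Hadoop *-site.xml file.
# Assumes <name>...</name> is followed by <value>...</value> in the same
# <property> element, each on a single line. Prints nothing if not found.
hadoop_conf_value() {
    conf_file=$1
    prop=$2
    awk -v prop="$prop" '
        # Remember that we have seen the property name we are looking for.
        index($0, "<name>" prop "</name>") { found = 1 }
        # The next <value> element after it holds the setting.
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$conf_file"
}

# Example call (the location of core-site.xml depends on your install):
#   hadoop_conf_value "$HADOOP_CONF_DIR/core-site.xml" fs.default.name
```

If this prints a localhost URL on a machine that is not the NameNode, that is the value to change.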

Once you have your connectivity configured correctly, you can leverage the cluster as a place to keep your backups. Create your base backup with the tool of your choice and once it’s done, just copy it to HDFS.

hadoop dfs -copyFromLocal basebackup_20130711.tar.gz hdfs://
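If you script this step, it helps to confirm the file actually landed on the cluster before you delete the local copy. The function below is one possible sketch. HDFS_BACKUP_DIR and HADOOP_CMD are hypothetical variables introduced for this example; set HDFS_BACKUP_DIR to your own hdfs:// URL. It sticks with the hadoop dfs form used in this post; newer clients may print a deprecation notice and prefer hdfs dfs.

```shell
#!/bin/sh
# Sketch: copy a finished base backup into HDFS and confirm it arrived.
# HDFS_BACKUP_DIR (hypothetical) must hold your hdfs:// backup location.
# HADOOP_CMD (hypothetical) lets you point at a different client binary.
copy_backup_to_hdfs() {
    backup_file=$1
    target_dir=${HDFS_BACKUP_DIR:?set HDFS_BACKUP_DIR to your hdfs:// backup URL}
    hadoop_cmd=${HADOOP_CMD:-hadoop}

    # Refuse to upload something that is not a readable local file.
    if [ ! -r "$backup_file" ]; then
        echo "cannot read $backup_file" >&2
        return 1
    fi

    # Upload the file; stop with a non-zero status if the copy fails.
    "$hadoop_cmd" dfs -copyFromLocal "$backup_file" "$target_dir/" || return 1

    # -test -e exits 0 only if the path now exists on the cluster.
    "$hadoop_cmd" dfs -test -e "$target_dir/$(basename "$backup_file")"
}

# Example call:
#   copy_backup_to_hdfs basebackup_20130711.tar.gz && echo "backup is in HDFS"
```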

You can even set your archive command to write your WAL files directly to HDFS. Just be careful with this one if you’re switching log files pretty frequently, because the command takes a bit longer than a simple rsync.

archive_command = 'hadoop dfs -copyFromLocal %p hdfs://'
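PostgreSQL only recycles a WAL segment once archive_command exits with status zero, and it expects the command not to overwrite a segment that is already archived. A small wrapper makes both of those explicit. This is a sketch under assumptions, not a tested production script: HDFS_WAL_DIR and HADOOP_CMD are hypothetical variables, and the upload-then-rename step is my own addition so that an interrupted copy is never mistaken for a complete segment.

```shell
#!/bin/sh
# Sketch of a wrapper for archive_command.
# HDFS_WAL_DIR (hypothetical) must hold your hdfs:// WAL archive location.
# HADOOP_CMD (hypothetical) lets you point at a different client binary.
archive_wal_to_hdfs() {
    wal_path=$1    # %p: path of the segment to archive
    wal_name=$2    # %f: file name of the segment
    wal_dir=${HDFS_WAL_DIR:?set HDFS_WAL_DIR to your hdfs:// WAL archive URL}
    hadoop_cmd=${HADOOP_CMD:-hadoop}

    # Never overwrite a segment that is already in the archive.
    if "$hadoop_cmd" dfs -test -e "$wal_dir/$wal_name"; then
        echo "$wal_name is already archived" >&2
        return 1
    fi

    # Clear any leftover from an earlier interrupted attempt; it is fine
    # if there is nothing to remove, so the exit status is ignored.
    "$hadoop_cmd" dfs -rm "$wal_dir/$wal_name.tmp" >/dev/null 2>&1

    # Upload under a temporary name, then rename into place.
    "$hadoop_cmd" dfs -copyFromLocal "$wal_path" "$wal_dir/$wal_name.tmp" || return 1
    "$hadoop_cmd" dfs -mv "$wal_dir/$wal_name.tmp" "$wal_dir/$wal_name"
}
```

To use it, you would save the function in a script that ends with archive_wal_to_hdfs "$1" "$2" and point archive_command at that script with %p and %f as its arguments. Each call starts the Hadoop client several times, so this is slower still than the one-line version; it trades speed for a clearer success signal.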

