How to Create a PostgreSQL Database Snapshot
In this article we discuss the PostgreSQL pg_dump command and important things to know when creating a database snapshot
A point-in-time database backup, also known as a database snapshot, is a common database development practice for backing up and transferring databases. PostgreSQL includes a robust set of utilities for managing this task. The PostgreSQL snapshot and restore utilities come with a plethora of command-line options, giving you a great deal of control over what data is included and how the backup is performed. For most use cases, there is a standard set of options that can be specified to quickly generate a flexible snapshot.
This article is intended to present these options as well as a summary of important things to know when creating a PostgreSQL database snapshot.
The pg_dump Utility
The command-line utility used to create the PostgreSQL database snapshot is named
pg_dump. Some important things to know about
pg_dumpmakes consistent database snapshots, even if the database is being used concurrently. It does not block other users from reading or writing to the database.
SELECTstatements to create the database snapshot. At a minimum, you'll need
SELECTprivileges on the database you wish to snapshot.
- Generating a snapshot in plain-text format can be restored using the
psqlutility. Snapshots created in archive format are restored with
TL;DR - The pg_dump command with recommended options
For those readers that want to get right to the answer, here is the
pg_dump command with recommended options.
pg_dump -d "<url>" -j 8 -Fd -f <snapshot_directory_name>
Description of options:
-d "<url>" Specifies the source database in URL format. This is a convenient way to specify all database connection information in a single parameter. Example:
-d "postgres://user:pass@host:port/database?sslmode=[require | disable]"
-j 8 Indicates that we want to use parallelization to speed up the generation of the database snapshot. In this case, we are taking snapshots of eight (8) tables at the same time. This necessitates opening 8 + 1 database connections, so adjust the the server's
max connections setting accordingly.
-Fd Selects directory format as the snapshot operation's output. The database snapshot will be saved in this directory, with one file for each table included. This option supports parallel snapshots (as specified by the -j option) and adds flexibility and selectivity when restoring a snapshot using the
<snapshot_directory_name> Specifies the name of the target directory where the snapshot output will be saved. The
pg_dump utility will create this directory and it must not already exist.
The compatibility matrix below will help you determine whether different versions of
pg_dump, the database snapshot, and the PostgreSQL server will work together. The compatibility matrix answers the following questions:
- Will a version of
pg_dumpallow me to create a snapshot from a given version of PostgreSQL server?
- Will I be able to load a database snapshot created by a given version of
pg_dumponto a given version of PostgreSQL server?
|...is older than PostgreSQL Server||...is newer than PostgreSQL Server|
|snapshot version||✅ OK||❌ NO|
||⚠️ OK within 1 major version of server||✅ OK for server versions >= 9.2|
Output Mode and Parallelization
The pg_dump utility supports a few different output formats for the database snapshot specified by the
-F format command-line option. The default option
-Fp generates a plain-text SQL script file that can be used to rebuild the database. While the plain-text output format can potentially be used on other SQL database products (with some modifications), its speed and capabilities are limited.
Archive file formats provide more robust database snapshot and restore capabilities. The directory (-Fd) and custom (-Fc) archive file formats are compressed by default and offer selective restores of the data contained in the snapshot. The directory format also supports creating and restoring snapshots in parallel (specified by the -j parameter), which is why it is included in the recommended command above. Several command-line options supported by pg_dump are ignored when using archive formats but can be specified when restoring the snapshot with pg_restore.
More robust database snapshot and restore capabilities are provided by archive file formats. The archive file formats directory
-Fd and custom
-Fc are compressed by default and allow for selective restores of the data contained in the snapshot. The directory format also supports parallel snapshot creation and restoration (specified by the
-j option). When using archive formats, several command-line options supported by
pg_dump are ignored, but can be specified when restoring the snapshot with the
pg_restore command-line utility.
Selecting What to Snapshot
There are several command-line options that allow you to specify what is included in the database snapshot. You can include
-n or exclude
-N schemas and use
-t pattern to specify which tables are included in the snapshot. However it is generally recommended to take a full snapshot of the database using the directory archive format and use the
pg_restore utility to selectively restore a subset of the snapshot if desired. This way, if a problem arises, you will always have a complete backup of your original database.
Use DBSnapper to Create PostgreSQL Database Snapshots
The DBSnapper app extends the PostgreSQL snapshot and restore utilities' capabilities and simplifies the process of creating, sanitizing, and restoring database snapshots. When combined with the DBSnapper Cloud, management of your snapshots is much easier with private cloud storage profiles and team sharing options.
Sign up now to accelerate your software development. Stop writing fixtures and start using real-world data from your production database instead. With DBSnapper, creating, sanitizing, and sharing snapshots of your PostgreSQL database has never been easier.