Pretty much everything I do these days, at least in terms of systems administration, is done using Docker Compose. I love the ease and flexibility of running and upgrading containerized applications, and have developed a solid understanding of the whole stack. Although I love Kubernetes, it’s way too much for my simple applications, which consist mainly of WordPress and MediaWiki, as well as specialized applications like ActivePieces, Ubiquiti’s Unifi controller, and Home Assistant. But I don’t love the default configuration of many Docker Compose files: They too often store important data in Docker volumes or even in running containers!
My Standard Docker Compose Deployment
I have a standardized Docker host setup running in various locations on-prem and in the cloud. I use Ubuntu Linux and store all of my containers in /srv/sites on a dedicated Linux volume. I also use a dedicated Linux volume for /var/lib/docker so that other tasks (hello, backups) can never fill the disk and kill Docker. It’s just my way.
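If you want to replicate this, the mounts might look something like the following (hypothetical LVM device names, which you would persist in /etc/fstab):
sfoskett$ sudo mount /dev/vg0/sites /srv/sites
sfoskett$ sudo mount /dev/vg0/docker /var/lib/docker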
Each application has a separate subdirectory in /srv/sites, with the Docker compose file, .env file, and so on stored in a subdirectory beneath that (e.g. /srv/sites/blogfoskettsnet/docker-bfn/docker-compose.yml). All application data is stored in subdirectories under a data directory (e.g. /srv/sites/blogfoskettsnet/data). This keeps everything nice and orderly and allows me to back up, migrate, and restore data with ease.
/srv/sites/blogfoskettsnet/docker-bfn/docker-compose.yml
/srv/sites/blogfoskettsnet/docker-bfn/.env
/srv/sites/blogfoskettsnet/data/db/<mariadb files>
/srv/sites/blogfoskettsnet/data/site/wordpress-core/<wordpress install files>
/srv/sites/blogfoskettsnet/data/site/wp-content/<wordpress content files>
I “inherited” this configuration from my OG web hosting approach, Evert Ramos’ “Docker Compose LetsEncrypt NGINX Proxy Companion”. Although I relied on this setup for years, I have lately migrated to my own solution leveraging Cloudflare Tunnels, which I will document soon!
Why Migrate From Docker Volumes?
Today I am documenting how to migrate data out of a Docker volume to run it natively on the host. Why would you want to do this? Portability and maintainability are big reasons: If you want to migrate a container from one host to another, you have to bring the data with you. And it’s not easy to do this with Docker’s native volumes, since they are stored in /var/lib/docker in an obscure format. It is possible to migrate a volume to another Docker host, but it’s much easier to simply move the data in a filesystem!
Every time I encounter a head-scratching issue with Docker I am reminded that it was designed more to facilitate software development than to run applications in production. I’ve been ranting for over a decade about the foolish trade-off of IOPS for capacity in the overlay filesystem, and I recently banged my head on my desk when I saw the equally foolish default network address space allocation.
I feel much the same way about Docker volumes: They’re cool, but they’re really not a good idea in practice.
Docker containers should be ephemeral. To realize the potential of containerization, one should be able to move just the Docker compose file, environment variables, and application data to another host or a fresh install and re-start it. To an old-school sysadmin like me, this is pure freaking magic: Blow away the entire system, re-build it fresh, and have it start as if nothing has changed.
Most package creators realize that storing critical application data inside a running container image is just plain dumb. But they have been lured in by the siren song of Docker volumes. The promise of volumes is to externalize data so containers can be re-built easily. But data in a Docker volume is still stored inside the Docker host’s environment in /var/lib/docker, placing it outside the reach of administrators and developers.
It is far superior to map external host storage into a container. Data remains easily accessible and can be manipulated, backed up, and restored using conventional tools. For example, having /etc/mysql/conf.d or /usr/local/etc/php/conf.d mapped to a regular host storage path allows you to tune the behavior of mysql or php without having to muck about inside containers or volumes. And having /var/www/html/wp-content mapped to a regular filesystem location instead of a Docker volume is a godsend when it comes to data protection!
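As a sketch of what this looks like in a compose file (the service names, image tags, and host paths here are hypothetical, following my layout above):
services:
  db:
    image: mariadb:11
    volumes:
      - ./../data/db:/var/lib/mysql             # database files live on the host
      - ./../data/conf/mysql:/etc/mysql/conf.d  # tune MariaDB without entering the container
  wordpress:
    image: wordpress:latest
    volumes:
      - ./../data/site/wp-content:/var/www/html/wp-content  # themes, plugins, and uploads on the host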
How To Migrate From Docker Volumes to External Storage
What can you do if you’re faced with an application that uses Docker volumes and you want to migrate it to native host storage? The following approach worked for me and ought to work for you as well. This is based on Guido Diepen’s 2016 guide to migrating data between Docker volumes and should be fairly straightforward if you have a reasonably strong understanding of Docker.
I am intentionally not including step-by-step instructions here: It’s dangerous to muck about if you don’t have a good level of Linux sysadmin and Docker experience. If this sounds confusing, don’t do it!
First, get to know the volumes at hand. For this example, I will migrate the Postgres and Redis data used by ActivePieces to native storage. ActivePieces creates two Docker volumes, postgres_data and redis_data. These are defined in the compose file and mounted to the Postgres and Redis containers, respectively.
sfoskett$ docker volume list
DRIVER    VOLUME NAME
local     activepieces_postgres_data
local     activepieces_redis_data
Rather than worrying about where storage is mounted in a running container, Diepen realized you could directly map a Docker volume to a known location in a new container. He piped the output of a tar command to another tar running on a new Docker host, allowing the volume to be migrated. But it also allows us to export the data for use on native host storage!
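For reference, here is a sketch of that volume-to-volume trick on a single host, with hypothetical volume names old_volume and new_volume:
sfoskett$ docker run --rm -v old_volume:/from -v new_volume:/to alpine ash -c "cd /from ; tar -cf - . | ( cd /to ; tar -xpf - )"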
Here’s a one-liner to export the activepieces_postgres_data volume to a local tar file (stop the stack with docker compose down first so Postgres isn’t writing mid-copy; named volumes survive a down):
sfoskett$ docker run --rm -v activepieces_postgres_data:/from alpine ash -c "cd /from ; tar -cf - . " > ~/activepieces_postgres_data.tar
That’s it! Just substitute the name of any Docker volume for “activepieces_postgres_data” in two spots in this one-liner and you’ve got a tar file of the complete volume data.
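If you have several volumes to export, the same one-liner loops nicely. A sketch using the two ActivePieces volumes from above:
sfoskett$ for v in activepieces_postgres_data activepieces_redis_data; do docker run --rm -v "$v":/from alpine ash -c "cd /from ; tar -cf - ." > ~/"$v".tar; done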
From there, I simply un-tarred the data to the desired filesystem location on the host (creating it first with sudo mkdir -p if it doesn’t exist yet):
sfoskett$ cd /srv/sites/activepieces/data/postgres
sfoskett$ sudo tar -xvpf ~/activepieces_postgres_data.tar
Note the “p” in there – this preserves the ownership (user and group) of the files. These are then mapped directly into the running container, which creates some weird-looking ownership on the host! For example, the Redis container uses uid 999 for its data, which is ZeroTier on my server! This is just a quirk of container storage mapping and should be ignored – don’t try to fix the ownership!
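To see what I mean, list the restored files both ways (paths per my layout above):
sfoskett$ ls -ln /srv/sites/activepieces/data/redis
sfoskett$ ls -l /srv/sites/activepieces/data/redis
The numeric listing shows the uid/gid the container expects (999 in this case), while the named listing shows whichever host account happens to own that uid: zerotier on my server. Resist the urge to chown it!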
Then I just needed to fix the mapping in the ActivePieces Docker compose file:
sfoskett$ diff docker-compose.yml.bak docker-compose.yml
26c26
<       - postgres_data:/var/lib/postgresql/data
---
>       - ./../data/postgres:/var/lib/postgresql/data
34c34
<       - 'redis_data:/data'
---
>       - ./../data/redis:/data
37,39c37,39
< volumes:
<   postgres_data:
<   redis_data:
---
> #volumes:
> #  postgres_data:
> #  redis_data:
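With the compose file updated, bring the stack back up and it will read its data from the new host paths (assuming the Compose v2 syntax):
sfoskett$ docker compose up -d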
And voila! I now have ActivePieces running with data stored on the local filesystem instead of inside a Docker volume! This allows me to easily move my running ActivePieces instance from one host to another by creating a tar file (as root) of /srv/sites/activepieces and expanding it elsewhere. Magic!
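Concretely, that move boils down to a few commands. Here newhost is a hypothetical target that already runs Docker and has the same /srv/sites layout:
sfoskett$ cd /srv/sites && sudo tar -cpf ~/activepieces.tar activepieces
sfoskett$ scp ~/activepieces.tar newhost:
Then, on the new host, expand the tar into /srv/sites (again preserving ownership with the p flag), cd into the compose directory, and run docker compose up -d.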
…except it’s not magic because I also want to use Cloudflare Tunnels to hide all of my applications and allow them to be run anywhere, on any hardware, without opening up network ports. But that’s a story for a different post!
Stephen’s Stance
Maybe I’m a caveman or a fool, but I don’t love Docker volumes. By moving data to a host filesystem I can manage the data more easily using standard software, and can modify it more easily if needed. The little trick of using tar to pipe data out of a Docker volume allowed me to do this with applications like Postgres, Redis, and MySQL, not to mention WordPress. I hope this helps you get your data out of Docker volumes too!