Gravity Updated

Tuesday, April 13, 2021

Just under a year ago I started working on a side project, which I talked about here, called Gravity Sync.

The basic premise was to create a script that would synchronize two Pi-hole 5.0 instances. What started out as a few lines of bash code though has grown into something kind of crazy, and has been a fun experiment in (re)learning how to code.

echo 'Copying gravity.db from HA primary'
rsync -e 'ssh -p 22' ubuntu@192.168.7.5:/etc/pihole/gravity.db /home/pi/gravity-sync
echo 'Replacing gravity.db on HA secondary'
sudo cp /home/pi/gravity-sync/gravity.db /etc/pihole/
echo 'Reloading configuration of HA secondary FTLDNS from new gravity.db'
pihole restartdns reload-lists

That's it. The basic premise was simple. You have two Pi-hole, all of the blocklist configurations are stored in a single database called gravity.db that need to be the same at both sites. If you made a change to those settings on the primary, you'd login to the secondary and run this code and it would copy the file over from the primary and replace it.

It ran on my system just fine, and it met my needs just fine. I originally included most of this in a post last May, and shared it on a couple of Slack channels with people I knew.

After a little bit I decided I should make this easier to get going, so I started adding things like variables to the top of the script for things like the username and IP address of the other Pi-hole. I also added a few colors to the text.

Feature Requests

The first person to say "yeah that's great but it'd be better if…" was Jim Millard. He wanted the ability to send changes from the secondary back to the primary. From this the first arguments were introduced. Previously you'd just run ./gravity-sync.sh and then things would happen in one way. If I wanted the script to be bi-directional, I'd need a way to indicate that. So with 1.1 of the script, you could say ./gravity-sync.sh pull or ./gravity-sync.sh push and now the data went one way or the other.

At this point I’d realized posting new copies of the raw code to Slack wasn’t very efficient, so I’d move to a GitHub Gist. The script was sort of retroactively deemed 1.1, because there was really no version control or changelog, it was all mostly in my head.

Shortly after breaking out the push and pull functions, I decided to break out the configuration variables into their own file, so that you could just copy in a new version of the main script without having to reset your install each time.

At this point since I had more than one file, using a Gist wasn't very practical, so I moved into a new GitHub repository. Having a repo made me think it might be pretty easy to update the script by just pulling the refreshed code down to my Pi-hole. I started doing this manually, and then realized I could incorporate this into the script by creating a new ./gravity-sync.sh update argument to do this for me.

The ability to check if you had the latest version of the code came shortly after that, and then some basic logging.

Functions

The whole script itself was just one big file and sort of processed through itself serially. That’s really all a bash script can do, is run line by line. I think one of the smarter things I did early on was as the script started to grow, was figure out how to rip out various components and put them into functions. There were parts of the script that were repeatable and having the code physically written out twice in the same script to do the same thing is a waste and introduces the potential for errors.

Around this time I also started to experiment with different status update indicators. At this time the script was really still designed to be run manually, although you could automate it with cron if you wanted.

The various STAT GOOD FAIL WARN messages that the script would throw up, I was also including in each message, including the color. I had a variable for the color code, so I didn’t have to include that, but if I wanted to change the color, or the message itself I’d have to find and replace on every instance.

echo -e "[${CYAN}STAT${NC}] Pulling ${GRAVITY_FI} from ${REMOTE_HOST}"

What if I put the message itself into a variable, and then had a function that assembled the output based on what type of indicator I wanted?

CYAN='\033[0;96m'
NC='\033[0m'
STAT="[${CYAN}STAT${NC}]"
function echo_stat() {
echo -e “${STAT} ${MESSAGE}”
}
MESSAGE=“Pulling ${GRAVITY_FI} from ${REMOTE_HOST}”
echo_stat

I now had a process that was repeatable and I wanted to change any element of the output, I can do it once in the color variable, or the STAT block, and then every time echo_stat was referenced they’d all get updated.

Hashing

One of the big problems I still had at this point is that everytime the script was run, the database was replicated. This was fine for my enviornment, with a small database, but it generated a lot of write activity on the SD cards, and wasn’t ideal.

This started with a simple MD5 hashing operation on both databases. Gravity Sync would look at the local database, and record the hash. It would query the remote databases, and record it’s hash. If they were the same, the script would exit as no replication was required.

If they were different, then it would initate a replication based on the direction you indicated when it was ran (pull or push) and copy the database just like it did before.

Simplifying Deployment

With each release I was starting to add things that required more work to be done with deployment. I wanted to get to the point where I could cut out as much human error as possible and make the script easy as possible to implement.

With 1.4, I added the ability to run a configuration utility to automate the creation of SSH keys, and the required customization of the .conf file.

By this point I’d started to add some “optional” features to the script that were designed to work around a few edge cases. I wanted to validate that the system on the other side was reachable by the secondary Pi-hole, so I added a quick ping test. But not everyone allows ping replies, so this forced me to start letting folks opt out based on their network.

I had to also start deciding if that should be part of the configuration script, or be something that you have to contunue to manually configure the script yourself with. Ironically in my quest to simplify deployment, I made the script a lot more complicated for myself to maintain. The configuration option eats up almost as much time as the core script does, as with any new feature addition I now have to decide if I need to build that into the configuration workflow to get a functional deployment with it, and the implication on other features and functions.

Saying Yes

Around this time another friend of mine asked me if the script would sync the local DNS records in Pi-hole. I didn’t use the feature at the time, and didn’t know how it worked. It turned out to be a mostly simple flat file that could be replicated along side the main gravity.db file.

I was able to reuse the bulk of the replication code to add this, while making it optional for folks who didn’t use the function and therefore didn’t have the file (at the time, Pi-hole has since changed it so the file is created by default even if it’s not used, thanks!)

Saying No

My same friend was using DietPi on his installs, and by default DietPi uses an alternative SSH server/client called Dropbear. Gravity Sync makes extensive use of both SSH and RSYNC to issue commands and files from the secondary to the primary host. Most of the commands only work specific to the OpenSSH implementation and were not the same for Dropbear.

I spent a lot of time getting Dropbear support worked into the script, and announced it as a new feature in version 1.7. Other people started using it.

But it never worked right, and was just confusing for the vast majoirty of folks who had OpenSSH installed by default.

Up to this point I’d tried to work in any feature request that I could because they seemed like reasonable concessions and made the script work better. With this request I should have said “No, just switch DietPi to OpenSSH” and left it alone. Plenty of development cycles were wasted on this, and the ability to use Dropbear was later removed in version 2.1.

Unexpected Side Effects

The decision to support Dropbear wasn’t all bad, as it drove me to rewrite more and more of the existing script in ways that allowed them to be repeatable functions. I wanted to make it as simple as I could for each SSH command to execute how it needed to based on the client and server options, similar to the example with status messages I explained previously. This would go on to serve me well later with future code changes.

Getting Smarter

Version 2.0 was a real big turning point for Gravity Sync. As I was going along breaking more and more of the script up into functions, becoming more modular, I figured out how I could use those modules differently. Up until now it was pull or push. With 2.0, I figured out how I could decided which files had been changed, and then send them the way they needed to be without any user intervention.

It was around this time where the popularity of the script really started to take off. With more attention came a more critical eye. Someone on Reddit pointed out that my method of simply ripping the running database away in a copy could lead to corruption and that the right way to do this was by doing a SQLITE3 based backup of the database.

I started by building out a backup routine with this new method, and incorporating that into the automation script. Once I was satisfied with a few installs, I started working on a way to use this new method as the actual replication method.

Containers

Originally Gravity Sync had no support for containerized deployments of Pi-hole, because up until last fall I really did nothing with containers. This was a feature request from almost the beginning, and one that I ignored as I wasn’t sure how to properly support it. As I started playing around with them more I realized this would be both easier to support than I anticipated and also really useful for me personally.

Documentation

It was pretty early on when someone pointed out that I didn’t have any changelog or proper release notes for Gravity Sync. I’m glad they did. Looking back on the last year it’s nice to see all the work laid out, and actually made writing this retrospective a lot easier.