• Micro.blog

    After decommissioning the vmst.io WriteFreely instance last week, I moved my blog back to Ghost. But then this weekend I remembered what an absolute shit experience writing or editing on a mobile device is with Ghost. A few people mentioned that they just don’t blog from mobile, but I like to be able to do all my work from as many devices as possible (I push vmst.io updates from my phone just because I can).

    There are still no native Ghost apps, first or third party, and the web interface is terrible on a small screen. Autocomplete doesn’t work, which means my capitalization, punctuation, and spelling are especially awful, even for me.

    I have been listening to the Core Intuition podcast a lot recently, and decided to give Micro.blog and MarsEdit (on Mac) a try. Combined with the recently refreshed Micro.blog iOS app, I think it’s what I need. But what really sold me was the nearly flawless migration from Ghost using the JSON export file. After finding a nice theme, I flipped the DNS CNAME over.

    I’m still playing with the automatic posting to my Mastodon account, as well as how the micro.blog site will play with ActivityPub. This post will help me see that better 😏

    Monday May 22, 2023
  • Sleep Apnea

    I went to an ENT a few weeks ago because I have been getting tonsil stones for a few years now, especially bad in the winter months. He said “yep your tonsils need to come out, this is going to be terrible, plan to be out for at least a week” 😯

    That adventure is now scheduled for the week of June 26.

    While I was there, I also mentioned that I snore really badly at night. It’s bothered my wife for years, and more recently my children have joined in on making it clear they hear it all night too. He ordered an at-home sleep study, which consisted of wearing a magic Bluetooth ring on my finger for two nights. 😴

    I was skeptical that they’d get good results, but the doctor’s office called with the results today, and apparently I stop breathing while sleeping up to 30 times an hour. 😧

    So I’m getting a CPAP. 😤

    Now at the old age of 39, I have two chronic health conditions, with anxiety/depression being the other.

    Monday May 22, 2023
  • Write(less)Freely

    On Friday, the vmst.io team made the decision to sunset our WriteFreely offering for vmst.io members. Since launching in January, only 23 users created blogs, and only 12 of those ever published any content, often just a single post. In total we had about two users aside from myself that used it with any frequency.

    My usage was mostly to justify its continued existence.

    Because WriteFreely requires a MySQL database backend (all our other services use Postgres) the costs of keeping the service operational outweighed the benefits to our members.

    I will be moving some of the content that I generated there over here.

    Sunday May 21, 2023
  • Classic Ordering

    Elan Hasson asked in a Matrix chat group of Mastodon admins how he should go back and watch classic Star Trek. It got me thinking about a list of episodes to help jumpstart your knowledge without having to watch every Original Series episode (of which some haven't aged super well, or are confusing to new viewers in the context of modern Trek.)

    Here's the list that I came up with:

    • The Naked Time: Season 1, Episode 4
    • Mudd's Women: Season 1, Episode 6
    • The Corbomite Maneuver: Season 1, Episode 10
    • The Menagerie: Season 1, Episode 11 (Part 1) and Episode 12 (Part 2)
    • Balance of Terror: Season 1, Episode 14
    • Arena: Season 1, Episode 18
    • Tomorrow is Yesterday: Season 1, Episode 19
    • Space Seed: Season 1, Episode 22
    • Errand of Mercy: Season 1, Episode 26
    • City on the Edge of Forever: Season 1, Episode 28
    • Amok Time: Season 2, Episode 1
    • Mirror, Mirror: Season 2, Episode 4
    • The Doomsday Machine: Season 2, Episode 6
    • The Trouble with Tribbles: Season 2, Episode 15
    • The Ultimate Computer: Season 2, Episode 24
    • Assignment: Earth: Season 2, Episode 26
    • The Enterprise Incident: Season 3, Episode 2
    • The Tholian Web: Season 3, Episode 9
    • Let That Be Your Last Battlefield: Season 3, Episode 15
    • Turnabout Intruder: Season 3, Episode 24

    I feel like that list forms the core of some really good episodes, ones that help round out some of what you see happen later in The Next Generation, Strange New Worlds, Discovery, and even Picard, as well as some just fantastic science-fiction stories, like City on the Edge of Forever.

    Tuesday April 18, 2023
  • Netlify More

    We’ve been using Netlify for the docs.vmst.io site for a while now, but it wasn’t until recently I realized it could be used for other things that we host.

    I’ve now successfully moved:

    This means fewer Docker containers and custom build operations are needed to maintain these deployments. Any new releases of these platforms can be deployed automatically in a few minutes, and new deployments can be tested more easily.

    Saturday April 15, 2023
  • Instance Purge

    This morning I removed just over 3000 dead instance connections from vmst.io.

    I know Mastodonizens don’t follow their follower accounts as closely as some folks on other networks, but I wanted to let you know that this operation might have caused some to go missing in this purge.

    These are instances that have quit responding for at least seven days, and in some cases they haven’t been around since November. A couple of weeks ago I saw an uptick in the number of SSL errors in Sidekiq communicating with various remote instances, and noticed a trend of instances that were stood up around 90 days ago, with Let’s Encrypt certificates, that have since been abandoned.

    It's likely that folks were trying to self host and then realized what a pain in the ass it is 😉 As @supernovae@universeodon.com just put it to me in the project Discord, "lots of failed dreams in there."

    I’d say that most of the domains listed didn’t have anyone following or being followed by vmst.io users, but some did. These accounts would not have been in communication with us for some time, as Mastodon stops actively trying to communicate with systems after they have failed for seven days.

    While some of these were removed manually if they were larger instances, this process was largely accomplished by doing a dump of the unavailable_domains table in the Mastodon Postgres database, and exporting the list to a giant string of domains.

    RAILS_ENV=production ./bin/tootctl domains purge away.we.go not.here.anymore bye.bye.birdie
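
    For illustration, here's roughly how that domain string can be assembled. This is a sketch rather than the actual script: the printf below stands in for a real psql query against the unavailable_domains table.

```shell
# Hypothetical sketch: collapse a one-domain-per-line dump into the single
# space-separated string that tootctl expects. The printf stands in for a
# real psql query against the unavailable_domains table.
printf 'away.we.go\nnot.here.anymore\nbye.bye.birdie\n' > /tmp/dead_domains.txt
DOMAINS=$(tr '\n' ' ' < /tmp/dead_domains.txt)
echo "RAILS_ENV=production ./bin/tootctl domains purge $DOMAINS"
```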

    Going forward, this process will be run more frequently.

    Wednesday March 1, 2023
  • Blue Green

    I've been doing some blue/green (maybe it's just A/B) testing on vmst.io this week with server updates and Mastodon dependency testing. Specifically, I'm focused on some upgraded versions of Nginx and Puma (which are now fully live for everyone) and some backend libraries for the streaming API to things like Redis and Postgres.

    I've been rolling out more updates this way, since things are pretty happy and functional with multiple systems running each component. I haven't decided what the right period of time to let things bake-in is, but I suppose that depends on what is changed.

    The streaming updates are only on half the things right now. So if something either works better or fails spectacularly 50% of the time, that’s why.

    I hate running old code. I've been running Debian 11 stable for a long time, but some of the packages that I end up using lag behind where they are on other distributions, so I recently flipped to the testing branch. So far I've not run into any issues other than needing to redownload/update some of the Ruby gems used by Mastodon.

    I'm also currently testing HAProxy as a replacement for Stunnel in our Mastodon Redis configuration. So far so good. I've noticed some activity in the Mastodon GitHub focused on a replacement for the current Redis libraries that don't work with TLS connections. Being able to completely remove Stunnel/HAProxy from the setup would be a fantastic simplification for me.

    Monday February 27, 2023
  • Virtually Federated

    I've been doing things with computers since last October.

    In addition to launching a Mastodon instance at vmst.io, I've since launched an integrated WriteFreely platform at write.vmst.io. As a result, I will be blogging there regularly from now on.

    The vmstan.com site will remain here for the time being, until I can maybe migrate the content into WriteFreely.

    Wednesday January 18, 2023
  • January Infrastructure

    One of my core beliefs about Mastodon and the Fediverse is that there needs to be transparency in the operations of the various projects and instances.

    This includes transparency in both the financial and technical operations of the instance. For the financial piece, we have a single source of truth on our Funding page that has total income from memberships as well as monthly/yearly expectations of expenses. We'll be providing a more granular update here soon that shows exactly where we are in terms of free cash flow. (Preview: we're solid for a bit based just on membership rates + one time donors to date.)

    Another Server Post?

    I don't intend to post about our infrastructure changes on a monthly basis, it just seems like I've done nothing but rearrange things on an almost daily basis, so I feel it's important to discuss them. Every time I've posted something like this, I've had some kind of feedback from other #mastoadmins or even members with suggestions of ways to do this better. I'm not saying that things are settled, but I feel like we're in a good place right now.

    The last time I provided an update was December 15, 2022, just before Elon Musk did something else really stupid; I don't even remember what it was at this point. In the course of a few hours the headcount of the instance doubled.

    Sometimes the only way to test a system is to put it under stress. Since that last update it became obvious that the way I had Sidekiq laid out wasn't optimal. Last month I had the front end nodes (Kirk and Spock) doing both web services and Sidekiq ingress queues. (Ingress is responsible for handling the influx of data from other instances, so anytime someone else posts an update and they let us know, we have to go fetch that data and any attachments.) One day I woke up and the site was down because someone somewhere posted a video and the front end nodes were being crushed by ffmpeg processes.

    Not ideal.

    Changes Since December

    Scotty is now responsible for handling the bulk of Sidekiq activity, including the Ingress queues. The front end nodes now only handle the primary Mastodon Web functions.

    Uhura is now responsible for handling the Mastodon Streaming API, as well as acting as an HTTPS proxy for other backend services like our internal metrics and monitoring. One deviation from Kirk and Spock, which handle web traffic, is that Uhura is running Caddy as a web proxy instead of nginx. This is somewhat experimental but has not presented any known issues. Uhura now also hosts our status.vmst.io system, powered by a self-hosted instance of Uptime Kuma. This replaces the services of Uptime Robot.
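
    To give a flavor of why Caddy is appealing here: a reverse proxy with automatic HTTPS is just a few lines of Caddyfile. This is an illustrative sketch with an assumed backend port, not our actual configuration:

```
# Hypothetical Caddyfile sketch; backend port is assumed, not our real config
status.vmst.io {
    reverse_proxy localhost:3001
}
```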

    Exec no longer hosts Elasticsearch, which has been moved to the free tier of AWS and replaced with OpenSearch. Exec runs smaller versions of Scotty's Sidekiq queues plus the scheduler queue, but is primarily responsible for the management, backups, and automation of the entire Mastodon system.

    Kirk and Spock are now running fully updated versions of nginx, with restrictions on TLS versions (1.2 or higher now required) and ciphers. As a result, support for older browsers (IE8) or systems (Java 7) has been dropped. This is unlikely to actually affect anyone negatively.
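
    The relevant nginx directives look something like the following. This is a sketch of the kind of change, with an assumed cipher list rather than our exact one:

```
# Sketch: require TLS 1.2+ and restrict ciphers (cipher string illustrative)
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
ssl_prefer_server_ciphers on;
```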

    I have also moved this site back to the Digital Ocean static site generator. I was running there previously, then moved to Netlify as a test. While there was nothing wrong with Netlify it just didn't make sense having things in another control panel with no added benefit.

    The backup processes have been streamlined:

    • I have dropped the trial of SimpleBackups that I was running as it was too expensive for what it provided.
    • Postgres is backed up using pg_dump, and Redis using redis-cli.
    • The native b2-cli utility is then used to make a copy to a Backblaze B2 bucket.
    • The CDN/media data is sync'd directly to Backblaze B2 via rclone.
    • This is done using custom scripts that process each task and then fire off notifications to our backend Slack.
    • Backups run every 6 hours. Database backups are retained for 14 days, currently. This may be adjusted for size/cost considerations down the road.
    • All backups are encrypted both in transit and at rest.
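
    As a rough sketch of the shape of one of those custom scripts: the paths, database names, and bucket names below are all assumptions, and the actual backup tool invocations are commented out, so this is scaffolding rather than our real script.

```shell
#!/usr/bin/env bash
# Hypothetical backup scaffold; the real pg_dump/redis-cli/b2/rclone calls
# are commented out, since the names and paths here are illustrative.
set -eu
STAMP=$(date +%Y%m%d-%H%M)
BACKUP_DIR=${BACKUP_DIR:-/tmp/vmstio-backups}
mkdir -p "$BACKUP_DIR"
# pg_dump -Fc mastodon_production > "$BACKUP_DIR/pg-$STAMP.dump"
# redis-cli --rdb "$BACKUP_DIR/redis-$STAMP.rdb"
# b2 upload-file vmstio-backups "$BACKUP_DIR/pg-$STAMP.dump" "pg-$STAMP.dump"
# rclone sync /srv/mastodon/media b2:vmstio-media
# Enforce the 14-day retention on local database dumps
find "$BACKUP_DIR" -name 'pg-*.dump' -mtime +14 -delete
echo "backup $STAMP complete"
```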

    Previously a lot of the automation functions for things like firing off notifications to the Mod Squad in Slack about reported posts, new users, or various server activities were done using Zapier.

    Recently I started working with n8n as a self-hosted alternative and have moved almost all of our processes there and expanded out to many more. Zapier is still used to notify and record donations via Patreon and Ko-fi because they have some native integrations that have proven troublesome to recreate in n8n, but the intent is to move away as soon as possible.

    Mastodon Software Stack

    One thing that I wanted to touch on before I close this out, is the idea of Mastodon forks (such as Glitch or Treehouse) or doing other local modifications to the Mastodon stack.

    Our goal is to run the latest stable version of the Mastodon experience as soon as it's available, with a goal of being live here within 72 hours of being published. If there are security related updates we intend to take those on even quicker to protect our infrastructure, users and your data. In order to do this we run unmodified versions of the Mastodon code found on GitHub, specifically consumed via Docker using the images provided by Mastodon to Docker Hub.

    In order to mitigate an issue with the Streaming API and the backend PostgreSQL database, we've had to make one modification to streaming/index.js that disables a self-signed SSL certificate check, by editing two lines in the code. Because of this, that component is not running via Docker but instead runs as a native Linux systemd service. Once this workaround is no longer necessary in the upstream Mastodon code, we will revert to the mainline Docker image.

    Other than those changes necessary for the functionality of our system, we do not intend to modify or customize Mastodon code in any other way that changes the user experience.

    Wednesday January 4, 2023
  • December Infrastructure

    I wanted to provide a round-up of some changes to the vmst.io infrastructure since it went live last month.

    No More Cloudflare

    Our domain had always been registered and DNS provided through Cloudflare. For a brief period I was testing their WAF (web application firewall) service with the site but this led to more problems than perceived benefits. The sentiment within the Fediverse is generally negative towards Cloudflare, although many other instances use them.

    When attacks were launched against various ActivityPub instances by a bad actor being protected by Cloudflare, I decided that it was time to stop using their services. The domain is now registered through a different service but is in the process of being transferred to Gandi. DNS services are provided through DNSimple. I intentionally broke up these two components. DNSSEC is currently not enabled for the domain, but will be as soon as the transfer work is completed.

    I may look for an alternative to provide a level of DDoS/WAF protection for the site, as we grow. For the time being your secure connections will terminate directly to the Digital Ocean managed load balancers and CDN.

    Streamlining Certificates

    We launched with free Let's Encrypt digital certificates for the site and CDN. Let's Encrypt is designed to be a fully automated certificate authority. I love Let's Encrypt and everything they stand for. Unfortunately due to the way our web servers, CDN, and load balancers are configured, automation was easier said than done.

    While I could have continued to manually generate the certificate and apply it to the various components every 90 days, I decided, for the sake of not being responsible for that, to purchase a certificate through Sectigo for this use. Not only does this extend the renewal responsibility to a year, the generation and application is simpler for me on the backend.

    Additionally, docs.vmst.io has been moved from the Digital Ocean static site generator to Netlify. The major reason was to allow the use of customer-provided certificates. Digital Ocean only uses certificates issued by ... Cloudflare.

    Our status.vmst.io page will continue to serve a Let's Encrypt certificate, as there is no mechanism to provide a customer certificate on that service through UptimeRobot.

    Backups

    No one wants to think about backups.

    Backing up the instance on Masto.host was provided by the service.

    Prior to recently, as the focus was just getting things established, I was doing backups only of the database on a manual and infrequent basis. I liked nothing about this.

    I'm currently trialing a service called SimpleBackups that integrates with Digital Ocean to connect to the Redis and Postgres databases and the CDN/Object Store, as well as using native connections to GitHub, to automate and perform regular backups of the infrastructure on a daily basis.

    Once I have a handle on size, load, and timing, we'll take more frequent backups. Backups are done to locations outside of Digital Ocean, so in the event of a disaster that impacts the Toronto or NYC datacenters where our data lives, or if Digital Ocean decided to go evil and delete our account, we'll be able to recover the data from a combination of AWS and Backblaze.

    The configurations for all of the Docker and Mastodon components necessary to reconstitute the site (even on a totally different provider) are all stored in a private GitHub repository, also backed up (in multiple locations) to allow quick recovery of any critical component.

    Reduced Frontend Count

    We originally launched with three frontend servers. After spending the last month tweaking Sidekiq, Puma, database pools, and other various Mastodon related services, I decided to vertically scale the front end systems but reduce the count to two. This is actually cheaper than running the three smaller ones.

    Should we experience a need to scale back up to three, it will be trivial to do so as the front end servers are actually an image on Digital Ocean that can be deployed within a few minutes. I originally wanted three to allow flexibility during maintenance operations, if a server was down for updates and we experienced a load spike or other event. Because of the ease of image deployment and the centralized configuration I have put in place, I can temporarily deploy an additional front end system while another is out of service with just a few clicks.

    Additionally, after making adjustments post-Cloudflare, the load balancer should serve up connections via the HTTP/2 protocol. While mostly transparent to users, this has the effect of drastically reducing load on the web frontend.

    Thursday December 15, 2022
  • Registration Numbers

    I’ve had a few thoughts rattling around about registration numbers on vmst.io, versus instances that are generally wide open. The instance officially went live on October 6, but we didn’t let anyone else join until a few weeks after. We are listed on joinmastodon.org, so we definitely see more random user signups than instances that just rely on word-of-toot.

    Because we ask folks to "apply" for their account here rather than just sign up without moderation, we get maybe 1/10th the number of registrations you’d see somewhere else. The bulk of folks coming from Twitter en masse were scared, impatient, and unwilling to wait. I base this on my observations of registration activity during the mass migration of folks coming over after the Twitter layoffs.

    Some sites that had open registration with fewer than 100 users at the start of November ended that week with over 30,000 accounts. We had probably 3,000 applications during that two-day period, before the decision was made to close registrations temporarily.

    All of this is to say, we have just over 1600 members right now. We could have had a lot more if we wanted.

    Of those who apply, we have a method of deciding who to let in:

    • We look at the username and display name and reject anything that’s obviously distasteful given the type of community we are seeking to build (xxxlol69, etc)

    But specifically we ask folks to give a reason why they want to join:

    • If people put nothing there, it’s rejected.
    • If they put “idk” it’s rejected.
    • If they just tell me that “Elon sucks” it’s rejected.
    • If it just looks kinda sus... it’s rejected.

    We also decided that if the application isn’t in English, it will be rejected.

    We are clear in our site description that we are English speaking. This isn't done out of some desire to limit interaction of folks in other languages. We do this only because of our current inability to moderate non-English posts.

    We try never to approve a user until they’ve confirmed their email account. We’ll manually trigger at least one reminder email for confirmation but if after a few days the account remains unconfirmed, we remove it from the queue.

    That basic level of filtering probably means about 1/5 of the people who apply are accepted. That isn't some target/goal, that's just the rough estimate based on the facts above.

    As I was writing this, there were 9 people in the queue.

    I approved 2.

    There has been a somewhat noticeable uptick in the number of junk registrations. On Tuesday I rejected probably 20 in a row where they were obviously just spamming our registrations page. We’ve periodically closed registrations when we needed to, and had an extended period closed as we migrated from Masto.host to running on our own infrastructure.

    With that exception, we’re not keeping things small because of major infrastructure considerations at this point. We want things to be performant for the folks who are here, but we can scale a lot higher if we choose to.

    We do this mostly because we want to scale the community here in a responsible way.

    Thursday December 8, 2022
  • November Infrastructure

    This post is a rollup and expansion of a set of Toots around the new vmst·io infrastructure that went live on the morning of Wednesday, November 23, 2022.

    When I launched vmst·io at the start of October, it was intended to just be a personal instance. mastodon.technology had just announced its pending closure, and I wanted to see what it was like to own my own corner of the Fediverse.

    I signed up for a $6 plan with masto.host and migrated my account. Everything was great, except it was kinda boring. Being on a single-user instance means your local timeline is just you, and your federated timeline is just the same people you follow.

    So I invited a few friends to join me, and upped the hosting to the $9 plan. Then Elon officially bought Twitter and suddenly a few more friends wanted to join me, so I went to $39. Then Elon purged Twitter staff and suddenly I needed about $79 worth of hosting.

    Even before I went to the $39 plan, I started wondering if I could run this myself. So I started digging into documentation, testing various providers, and building an architecture. That is what we moved into on November 23. Now that things have settled, want to take a peek behind the firewall?

    Horizontal or Vertical

    When we talk about scaling any platform, there are generally two directions you can go. Horizontal or vertical.

    Vertical scaling is generally easy: if your app needs more memory than the host has, add more. If it needs more CPU, and it's multi-threaded, just add more. Horizontal scaling is sometimes a little more tricky. This means adding more instances of your application. Even though we're a small instance in comparison to places like hachyderm.io or infosec.exchange, my goal was to build us from the start to be able to go both directions.

    Almost all of our new infrastructure lives in the Toronto and New York data centers of Digital Ocean. Email notifications are handled by Sendgrid. DNS resolution comes through Cloudflare.

    All public traffic is encrypted and what isn’t happens on private networks. We are using managed load balancing and database services. The various self-managed services run on Debian based Docker hosts.

    Why Digital Ocean?

    While the company I work for is a major partner of the major public cloud players, I like to support the littler/independent folks.

    I’ve hosted many things in Linode and Digital Ocean over the years, and in comparing the two it’s really a toss up in price and features. The managed database offering is what finally pushed me to Digital Ocean. The uncertainty with Linode being acquired by Akamai recently, also weighed in.

    One thing I wanted to do was put the backend databases (PostgreSQL and Redis) into managed instances, because I'm not a DBA or an expert in either platform. Linode offered only managed PostgreSQL, and to get no-downtime upgrades you had to purchase their high availability option which was a minimum 3x increase in price for a similarly spec'd platform.

    Digital Ocean also had support for pgBouncer built in. More on that later.

    Why Toronto?

    I’m in the central US, so response times to any DC offering in North America are usually pretty good. I had our moderators and some friends in other countries (Europe) test and they came back with Toronto as the lowest on average. Also, I figured putting the servers in Canada would make sure they were polite. 🇨🇦
    The object store is in New York because it was the closest geographical DC where DO offered the service. The speed of light is only so fast.

    Why Sendgrid?

    I tried Mailjet first and found it finicky. I tried Sendgrid and it worked the first time and every time since. What I discovered later was that by default Sendgrid includes tracking links/images in messages. I have zero interest in knowing if you’re opening your notifications and I personally run blockers to disable all this junk in my own email.

    So while it’s relatively benign, and is related with the service offering, it’s not consistent with our privacy policies so they have been disabled going forward.

    Why Cloudflare?

    Cloudflare is our domain registrar, and also DNS provider.

    What does vmstio-exec do?

    vmstio-exec is essentially the master Mastodon node, holding the configuration that is presented to the worker nodes. Also, unlike on the workers, Mastodon is not in a container, so I can do things like have direct database access and access to utilities like `tootctl` without impacting front end traffic.

    Mastodon requires one Sidekiq node to run the "scheduler" queue, so that's where it sits. It also has more CPU and memory allocated so it can process other Sidekiq jobs while the worker nodes focus on web traffic.

    The NFS share is used to make sure that all of the worker nodes always have the same/latest copy of the configuration and certificates.

    What do the vmstio-workers do?

    These are the frontend nodes for the site. User requests (either direct or from federated instances) flow through Cloudflare to our Digital Ocean managed load balancer. This load balancer can currently handle up to 10,000 concurrent connections, and easily scale beyond that with a few clicks.

    The load balancer monitors the health of every worker node, and if they're reporting that they're available, then nginx will accept user connections.

    The workers run a complete deployment of Mastodon in Docker containers. Docker allows each startup of the application components to be a clean boot from the image provided by Mastodon. (I hope to move the nginx component to a Docker container before long.)

    Each worker has threads dedicated to handling the frontend web traffic (Mastodon Web & Mastodon Streaming), as well as processing some of the backend load (Sidekiq).

    There are usually three worker nodes running. This allows at least one to be down for maintenance, without impacting user traffic. They are regularly reimaged via Digital Ocean tools, although not automatically.

    What is Stunnel for?

    Mastodon (specifically the Sidekiq component) cannot currently speak native TLS to Redis, meaning all of the traffic is over plaintext. While this isn't a deal breaker as the communication is happening over a private network, it's not ideal. Additionally, I wanted to use Digital Ocean's managed Redis offering instead of being responsible for this component myself. Digital Ocean does not permit you to disable TLS.

    Stunnel creates a secure tunnel for the connection from Mastodon/Sidekiq to Redis, sidestepping Mastodon's lack of TLS support. Mastodon actually thinks it's talking to a Redis instance running on localhost.
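
    An stunnel client configuration for this pattern looks roughly like the following; the ports and hostname here are illustrative, not our actual values:

```
; Sketch of a client-mode stunnel wrapping Redis connections in TLS
; (hostname and ports are assumptions, not the real vmst.io config)
[redis]
client = yes
accept = 127.0.0.1:6379
connect = private-redis.example.db.ondigitalocean.com:25061
```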

    What does the Object Store do?

    This is where all of the image, videos and other media that gets uploaded is stored. It also caches the media of federated instances that you interact with. There is a CDN (Content Delivery Network) that is managed by Digital Ocean that brings these large files closer to your location when you access them. That ability is further enhanced by Cloudflare.

    What does Elasticsearch provide?

    When you search for content within the instance, Elasticsearch is used to index the content of your posts and other posts you interact with, so that you don't have to go hunting for them later. Without Elasticsearch running you'd only be able to search by hashtags. Not all Mastodon instances have this available.

    What is pgBouncer?

    pgBouncer manages the connections from the various worker/exec nodes, and their various Sidekiq and Puma (web) threads, to the PostgreSQL database. This provides more flexibility in scaling up and managing connection counts. It effectively acts like a reverse load balancer for the PostgreSQL database.
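
    Digital Ocean manages pgBouncer for us, but in a self-managed setup the equivalent pgbouncer.ini would look something like this (every value here is illustrative, not our actual configuration):

```
; Hypothetical pgbouncer.ini sketch; hosts, ports, and pool sizes assumed
[databases]
mastodon_production = host=10.0.0.5 port=5432 dbname=mastodon_production

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction
max_client_conn = 400
default_pool_size = 20
```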

    Are you done?

    Never. As we find better ways to secure, scale, or provide resiliency -- we'll iterate. Even since launching last week, we've changed a few things around, like using Stunnel for connections to the managed Redis databases, and added Telegraf and InfluxDB for better telemetry of the infrastructure.

    Enjoy!

    Monday November 28, 2022
  • Still Here

    Honestly just making sure my site still works 🤭

    Considering the news of the day, maybe I'll just blog more.

    Tuesday October 4, 2022
  • Gravity Updated

    Just under a year ago I started working on a side project, which I talked about here, called Gravity Sync.

    The basic premise was to create a script that would synchronize two Pi-hole 5.0 instances. What started out as a few lines of bash code, though, has grown into something kind of crazy, and has been a fun experiment in (re)learning how to code.

    #!/usr/bin/env bash
    # The original sync, in full: copy gravity.db from the primary,
    # drop it into place on the secondary, and reload FTLDNS.
    echo 'Copying gravity.db from HA primary'
    rsync -e 'ssh -p 22' ubuntu@192.168.7.5:/etc/pihole/gravity.db /home/pi/gravity-sync
    echo 'Replacing gravity.db on HA secondary'
    sudo cp /home/pi/gravity-sync/gravity.db /etc/pihole/
    echo 'Reloading configuration of HA secondary FTLDNS from new gravity.db'
    pihole restartdns reload-lists
    

    That's it. The basic premise was simple. You have two Pi-holes; all of the blocklist configuration is stored in a single database called gravity.db, which needs to be the same at both sites. If you made a change to those settings on the primary, you'd log in to the secondary and run this code, and it would copy the file over from the primary and replace it.

    It ran on my system just fine, and it met my needs just fine. I originally included most of this in a post last May, and shared it on a couple of Slack channels with people I knew.

    After a little bit I decided I should make this easier to get going, so I started adding things like variables to the top of the script for things like the username and IP address of the other Pi-hole. I also added a few colors to the text.

    Feature Requests

    The first person to say "yeah that's great but it'd be better if…" was Jim Millard. He wanted the ability to send changes from the secondary back to the primary. From this the first arguments were introduced. Previously you'd just run ./gravity-sync.sh and then things would happen in one way. If I wanted the script to be bi-directional, I'd need a way to indicate that. So with 1.1 of the script, you could say ./gravity-sync.sh pull or ./gravity-sync.sh push and now the data went one way or the other.
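    The dispatch for those first arguments can be sketched in a few lines of bash. This is just an illustration, not the actual 1.1 code; the echo bodies stand in for the real pull and push logic:

```shell
#!/usr/bin/env bash
# Sketch of the 1.1-era argument handling; bodies are stand-ins.
dispatch() {
  case "${1:-}" in
    pull) echo "pulling gravity.db from primary" ;;
    push) echo "pushing gravity.db to primary" ;;
    *)    echo "usage: gravity-sync.sh [pull|push]" ;;
  esac
}

dispatch "$@"
```

    Running `./gravity-sync.sh pull` or `./gravity-sync.sh push` selects the direction; anything else falls through to the usage message.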

    At this point I’d realized posting new copies of the raw code to Slack wasn’t very efficient, so I moved to a GitHub Gist. The script was sort of retroactively deemed 1.1, because there was really no version control or changelog; it was all mostly in my head.

    Shortly after breaking out the push and pull functions, I decided to break out the configuration variables into their own file, so that you could just copy in a new version of the main script without having to reset your install each time.

    At this point since I had more than one file, using a Gist wasn't very practical, so I moved into a new GitHub repository. Having a repo made me think it might be pretty easy to update the script by just pulling the refreshed code down to my Pi-hole. I started doing this manually, and then realized I could incorporate this into the script by creating a new ./gravity-sync.sh update argument to do this for me.
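    Because the install directory is itself a git clone, the self-update can be sketched as little more than a fast-forward pull. The function name and default path here are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Sketch of the "update" argument: the install dir is a git clone,
# so updating is effectively a fast-forward pull.
update_self() {
  # default install path is an assumption
  git -C "${1:-/home/pi/gravity-sync}" pull --quiet
}
```

    Wiring this to a new `./gravity-sync.sh update` argument means a refreshed copy of the script is always one command away.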

    The ability to check if you had the latest version of the code came shortly after that, and then some basic logging.

    Functions

    The whole script was just one big file that processed through itself serially. That’s really all a bash script can do: run line by line. I think one of the smarter things I did early on, as the script started to grow, was figure out how to rip out various components and put them into functions. There were parts of the script that were repeatable, and having the same code physically written out twice in the same script is a waste and introduces the potential for errors.

    Around this time I also started to experiment with different status update indicators. The script was really still designed to be run manually at this point, although you could automate it with cron if you wanted.


    Each of the various STAT, GOOD, FAIL, and WARN indicators the script would throw up was written out in full in every message, including the color codes. I had variables for the colors, so I didn’t have to include those directly, but if I wanted to change a color, or the format of the message itself, I’d have to find and replace every instance.

    echo -e "[${CYAN}STAT${NC}] Pulling ${GRAVITY_FI} from ${REMOTE_HOST}"
    

    What if I put the message itself into a variable, and then had a function that assembled the output based on what type of indicator I wanted?

    CYAN='\033[0;96m'
    NC='\033[0m'
    STAT="[${CYAN}STAT${NC}]"
    

    function echo_stat() { echo -e "${STAT} ${MESSAGE}"; }

    MESSAGE="Pulling ${GRAVITY_FI} from ${REMOTE_HOST}"
    echo_stat

    I now had a process that was repeatable. If I wanted to change any element of the output, I could do it once in the color variable or the STAT block, and then every time echo_stat was referenced they’d all get updated.

    Hashing

    One of the big problems I still had at this point was that every time the script was run, the database was replicated. This was fine for my environment, with a small database, but it generated a lot of write activity on the SD cards, and wasn’t ideal.

    This started with a simple MD5 hashing operation on both databases. Gravity Sync would look at the local database and record the hash. It would query the remote database and record its hash. If they were the same, the script would exit, as no replication was required.

    If they were different, then it would initiate a replication in the direction you indicated when it was run (pull or push) and copy the database just like it did before.
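    The comparison itself can be sketched like this, assuming md5sum is available. In the real workflow one of the two hashes comes back over SSH from the other Pi-hole, but the logic is the same as comparing two local files:

```shell
#!/usr/bin/env bash
# Sketch of the hash check; in Gravity Sync the second hash
# arrives via SSH from the remote host.
hashes_match() {
  local a b
  a=$(md5sum "$1" | awk '{print $1}')
  b=$(md5sum "$2" | awk '{print $1}')
  [ "$a" = "$b" ]
}
```

    If `hashes_match` succeeds, the script can exit early and skip the rsync entirely, which is what saves all those SD card writes.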

    Simplifying Deployment

    With each release I was starting to add things that required more work to be done at deployment. I wanted to get to the point where I could cut out as much human error as possible and make the script as easy as possible to implement.

    With 1.4, I added the ability to run a configuration utility to automate the creation of SSH keys, and the required customization of the .conf file.

    By this point I’d started to add some “optional” features to the script that were designed to work around a few edge cases. I wanted to validate that the system on the other side was reachable by the secondary Pi-hole, so I added a quick ping test. But not everyone allows ping replies, so this forced me to start letting folks opt out based on their network.

    I also had to start deciding whether that should be part of the configuration script, or something you'd have to continue to configure manually yourself. Ironically, in my quest to simplify deployment, I made the script a lot more complicated for myself to maintain. The configuration option eats up almost as much time as the core script does; with any new feature addition I now have to decide whether I need to build it into the configuration workflow to get a functional deployment, and what the implications are for other features and functions.

    Saying Yes

    Around this time another friend of mine asked me if the script would sync the local DNS records in Pi-hole. I didn’t use the feature at the time, and didn’t know how it worked. It turned out to be a mostly simple flat file that could be replicated alongside the main gravity.db file.

    I was able to reuse the bulk of the replication code to add this, while making it optional for folks who didn’t use the function and therefore didn’t have the file (at the time, Pi-hole has since changed it so the file is created by default even if it’s not used, thanks!)

    Saying No

    My same friend was using DietPi on his installs, and by default DietPi uses an alternative SSH server/client called Dropbear. Gravity Sync makes extensive use of both SSH and RSYNC to issue commands and transfer files from the secondary to the primary host. Most of the commands were specific to the OpenSSH implementation and were not the same for Dropbear.

    I spent a lot of time getting Dropbear support worked into the script, and announced it as a new feature in version 1.7. Other people started using it.

    But it never worked right, and was just confusing for the vast majority of folks who had OpenSSH installed by default.

    Up to this point I’d tried to work in any feature request that I could because they seemed like reasonable concessions and made the script work better. With this request I should have said “No, just switch DietPi to OpenSSH” and left it alone. Plenty of development cycles were wasted on this, and the ability to use Dropbear was later removed in version 2.1.

    Unexpected Side Effects

    The decision to support Dropbear wasn’t all bad, as it drove me to rewrite more and more of the existing script in ways that allowed them to be repeatable functions. I wanted to make it as simple as I could for each SSH command to execute how it needed to based on the client and server options, similar to the example with status messages I explained previously. This would go on to serve me well later with future code changes.

    Getting Smarter

    Version 2.0 was a really big turning point for Gravity Sync. As I went along breaking more and more of the script up into functions, becoming more modular, I figured out how I could use those modules differently. Up until then it was pull or push. With 2.0, I figured out how I could decide which files had been changed, and then send them the way they needed to go without any user intervention.

    It was around this time that the popularity of the script really started to take off. With more attention came a more critical eye. Someone on Reddit pointed out that my method of simply ripping the running database away in a copy could lead to corruption, and that the right way to do this was with a SQLite-based backup of the database.

    I started by building out a backup routine with this new method, and incorporating that into the automation script. Once I was satisfied with a few installs, I started working on a way to use this new method as the actual replication method.
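    The safer approach uses SQLite's online backup command rather than a raw file copy, so you get a consistent snapshot even while FTLDNS has the database open. A minimal sketch, with illustrative paths:

```shell
#!/usr/bin/env bash
# Sketch: take a consistent snapshot with SQLite's online backup,
# then replicate the snapshot instead of the live database file.
backup_gravity() {
  # $1 = live database, $2 = snapshot destination
  sqlite3 "$1" ".backup '$2'"
}
```

    The snapshot can then be handed to rsync exactly like the old method handed over the live file, without the corruption risk.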

    Containers

    Originally Gravity Sync had no support for containerized deployments of Pi-hole, because up until last fall I really did nothing with containers. This was a feature request from almost the beginning, and one that I ignored as I wasn’t sure how to properly support it. As I started playing around with them more I realized this would be both easier to support than I anticipated and also really useful for me personally.

    Documentation

    It was pretty early on when someone pointed out that I didn’t have any changelog or proper release notes for Gravity Sync. I’m glad they did. Looking back on the last year it’s nice to see all the work laid out, and actually made writing this retrospective a lot easier.

    Tuesday April 13, 2021
  • Gravity Sync

    Thursday February 11, 2021
  • Gravity Sync

    I just got on the Pi-hole bandwagon a few weeks ago, and boy do I love it. Really, who doesn't love DNS? And what is better than a Pi-hole? Two Pi-hole!

    With the release of Pi-hole 5.0, I wanted to rig up a quick and dirty way to keep my Pi-hole HA instances in sync, but it has quickly escalated to more than just dirty and has now become a little more elaborate.

    Originally I posted the installation documentation on this blog, but as it gained more brain time, I have moved those over to the README file of the GitHub repo where the script now lives.

    vmstan/gravity-sync
    What started as a quick and dirty way to accomplish keeping multiple Pihole 5.0 instances in-sync, has now become a little more elaborate. - vmstan/gravity-sync

    The script assumes you have one "primary" PH as the place you make all your configuration changes through the Web UI, doing things such as: manual whitelisting, adding blocklists, device/group management, and other list settings. The script will pull the configuration of the primary PH to the secondary.

    After the script executes, it will copy the gravity.db from the master to any secondary nodes you configure it to run on. In theory they should be “exact” replicas every 30 minutes (the default timing of the cron job).
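    The schedule is just a crontab entry on the secondary, along these lines (the install path here is an assumption):

```
# run the sync every 30 minutes; install path assumed
*/30 * * * * /home/pi/gravity-sync/gravity-sync.sh pull > /dev/null 2>&1
```

    Changing the `*/30` interval is all it takes if every half hour feels too aggressive for your network.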

    If you ever make any blocklist changes to the secondary Pihole, it’ll just overwrite them when the synchronization script kicks off. However, it should not overwrite any settings specific to the configuration of the secondary Pihole such as upstream resolvers, networking, query log, admin passwords, etc. Only the "Gravity" settings that FTLDNS (Pihole) uses to determine what to block and what to allow are carried over.

    What a successful execution of the script (version 1.2.3) will look like

    Generally speaking I don't foresee any issues with this unless your master Pihole is down for an extended period of time, in which case you can specify that you'd like to "push" the configuration from the secondary back over to the primary node. Only do this if you made list changes during a failover and want them back in production.

    Disclaimer: I've tested this on two Piholes running 5.0 GA in HA configuration, not as DHCP servers, on my network. Your mileage may vary. It shouldn't do anything nasty but if it blows up your Pihole, sorry.

    The actual method of overwriting is what the Pihole developers have suggested doing over at /r/pihole, and apparently is safe 🤞 It might be a little more aggressive than it needs to be about running every 30 minutes (defined by the crontab setting), but I figure the way I have mine set up, the second one isn’t really doing anything other than watching for the HA address to failover, so it shouldn’t disrupt users during the reload. Plus, the database itself isn't that big, and according to the Pihole team the database file isn’t locked unless you’re making a change to it on the master (it’s just reading), so there shouldn’t be any disruption to the primary to make a remote copy.

    I want to note that the initial release (1.0) had no error handling or real logic other than executing exactly what it's told to do. If you set it up exactly as described, it'll just work.

    I've since posted 1.1 and higher with some additional arguments and features, if you deployed the script previously I suggest upgrading and adjusting your crontab to include the "pull" argument.

    I've also moved the script to GitHub, which should allow you to keep an updated copy on your system more easily. The script can even update itself if you set it up for that.

    Enjoy!

    Wednesday May 20, 2020
  • Interesting Enough

    Greg Morris writes:

    … I do suffer quite a lot with imposter syndrome. The great thing is, I have learnt not to check my blog stats, I’m not bothered about podcast downloads and I sure as hell don’t care how many people follow me on social media.

    I too, have mastered the art of not checking blog stats, in part by not collecting them at all.

    Yet every time I do stumble over the figures, I am always surprised because I don’t think I am interesting enough. … When I listen to other people on podcasts, and read others writing, they seem infinity more interesting than I think I am. With more to say on topics that I find really interesting. Does everyone feel like this?

    Yes.

    Sunday February 16, 2020
  • Universal TAM

    This video from Atlassian was shared internally at VMware a couple of weeks ago, and my initial comment was that minus the few references to their company specifically, this video was a great representation of the role of Technical Account Managers, generally.

    I was apparently not the only one who thought this, a little while later a posting appeared from their corporate account on LinkedIn with positive comments from representatives of:

    • VMware
    • Zscaler
    • LivePerson
    • Adobe
    • Five9
    • Citrix
    • Nutanix
    • Symantec
    • Microsoft

    Plus a half dozen or so other places I'd never even heard of. If you can get that many representatives of different places to agree you're probably onto something.

    Kudos to Atlassian.

    Saturday February 15, 2020
  • Project Nautilus

    Introducing VMware Project Nautilus:

    Project Nautilus brings OCI container support to Fusion, enabling users to build, run and test apps for nearly any OS or cloud right from the comfort of your own Mac.

    With Project Nautilus, Fusion now has the ability to run Containers as well as VMs. Pull images from a remote repository like Docker Hub or Harbor, and run them in specially isolated 'Pod VMs'.

    This is built into the latest Tech Preview of VMware Fusion, which we've changed how we're releasing.

    As Mike Roy explains in New Decade, New Approach to “Beta”:

    This year, in an ongoing way, we’ll be releasing multiple updates to our Tech Preview branches, similar to how we update things in the main generally available branch. The first release is available now, and we’re calling it ’20H1′.

    We’re also moving our documentation and other things over to GitHub. We’ll be continuing to add more to the org and repos there, maintain and curate it, as well as host code and code examples that we are able to open source.

    I’ve already been playing with Project Nautilus, and it’s pretty slick. I had an nginx server up a couple of minutes after installing, even with pulling the image down from Docker Hub. You can spin up container workloads right on macOS, alongside Fusion virtual machines, without the Docker runtime installed.

    You can even run VMware Photon OS, as a container inside the PodVM.

    Project Nautilus should eventually make its way into VMware Workstation, but is not currently available there.

    You should also be able to do the same thing on ESXi later this year with Project Pacific.

    Tuesday January 21, 2020
  • Resume Deletions

    There are some things that just aren’t worth putting on your resume. This was the reminder that came to mind during replies to Owen Williams on the tweet machine.

    For a very short time I worked for a small family business that sold woodworking tools. Everything from glue and chisels to large computer controlled "put wood in this side and get a cabinet out the other side" machines. I was recommended to the position by a friend who was leaving to work for an ISP. The job I had at the time was part-IT/part-retail for a small grocery store chain, and I wanted to go all in on IT.

    But on what I remember to be my first (or maybe second) day I was asked by the President of the company to disable the accounts of two of his brothers who were VPs (the four boys ran it) — A few hours later one of them shows up at my desk trying to figure out why his email is locked, and doesn’t have a clue who I am.

    This guy looked like he killed wild animals barehanded for fun, and at maybe 21 years old I’m a much scrawnier version of my current self at maybe 160lbs. What a joy it was to tell him to go talk to his brother and then have him return and demand that I reactivate it.

    I left a couple of months later once I found something that was slightly more stable. The company is no longer in business.

    Monday January 13, 2020
  • Veeam Team

    The first time I used Veeam's backup software was in 2010. Up to that point I'd had experience with Symantec Backup Exec, Microsoft Data Protection Manager, and Commvault Simpana. The first time I used VBR to backup my vSphere infrastructure it was like the proverbial iced water to a man in hell.

    As a consultant I'd deployed VBR for customers more times than I can count. Bringing iced water to the hot masses.

    Today's news has me worried for their future:

    Insight Partners is acquiring Veeam Software Group GmbH, a cloud-focused backup and disaster recovery services provider, in a deal valued at about $5 billion—one of the largest ever for the firm.

    Veeam—first backed by Insight in 2013 with a minority investment—will move its headquarters to the U.S. from Baar, Switzerland, as a result of the acquisition. The deal is intended to help increase the company’s share of the American market.

    Hopefully my worry is for nothing, but Insight Partners is a private equity firm. What does that mean, exactly, remains to be seen. But generally speaking:

    • It restructures the acquired firm and attempts to resell at a higher value.
    • Private equity makes extensive use of debt financing to purchase companies.

    Also, as noted by Blocks & Files:

    Co-founders Andrei Baronov and Ratmir Timashev will step down from the board. Baranov and Timashev founded Veeam in 2006 and took in no outside funding until the Insight $500m injection in January 2019.

    I sincerely hope that I'm wrong in my gut reaction here, but wish the best of luck to all my friends at Veeam.

    Thursday January 9, 2020
  • Delicious Strategy

    I don't know who Peter Drucker is, but Matt's quote attributed to him is sound:

    Apparently Peter is a kind of a big deal, at least according to Wikipedia:

    Peter Ferdinand Drucker (/ˈdrʌkər/; German: [ˈdʀʊkɐ]; November 19, 1909 – November 11, 2005) was an Austrian-born American management consultant, educator, and author, whose writings contributed to the philosophical and practical foundations of the modern business corporation. He was also a leader in the development of management education, he invented the concept known as management by objectives and self-control, and he has been described as "the founder of modern management".

    Even the best ideas will fall flat if the culture of the organization refuses to adapt to service them. As I said last week:

    The trick, I suppose, is knowing how much of the old ideas and processes are actually still required and why. ... In order to do that you need to understand more than just the business and the technical requirements. ... You have to understand the culture in which it will operate.

    Idea: move everything to the cloud!
    Culture: we must control every aspect of the infrastructure.

    🤔

    Tuesday January 7, 2020
  • Skip Day

    It turns out that finding something to write about every day is really hard. Shocking, I know. You may have noticed (or, maybe not) that January 1-4 there was a new post here every day. I skipped yesterday, but I blame my participation with this tweet from Jehad.

    Not really, I knew I wasn’t going to keep up posting every day. I had a lot of free time on my hands, especially after New Years Day. Today was the first day back to work after being off since December 20. The first half of this time was spent participating in, and in preparation of, the various holiday celebrations our family was invited to.

    Not having work things rolling around in my brain, having ample downtime, gives me a chance to reflect on life. Which in turn prompted me to write them down. Lucky you. Going forward I hope to get at least a couple of posts done every week, for my benefit if anything. Three would probably be a stretch goal.

    On Privilege

    I take this time period off every year, or at least try to. When I worked for the university, starting in 2006, we just had this time period off as the campus was completely closed. Students didn't come back until around MLK Day, so even after returning to campus it was eerily quiet, but it gave us a couple weeks to catch up, finish any small projects, and prepare for the spring semester.

    Even at the VAR that I worked for, it was expected that only a skeleton crew would be staffing the company the week of Christmas, and it was built into our company PTO schedule that we be off that week. It sort of set a trend that, with the exception of a couple years before my children were born, I’ve tried to keep.

    I realize that I’m in a very fortunate position because of the type of work that I do, who I’ve worked for, and especially who I currently work for, that I’m not someone working on Christmas Eve, and rushing back to the office on December 26. The same thing on Thanksgiving.

    I’m incredibly privileged, even living and working among “classically privileged” individuals. Hearing friends and family over the holiday struggle with things like managing vacation days, lack of maternity leave, losing benefits, pay issues, etc, I often bite my tongue and don’t allow myself to reiterate how generous VMware is in many of these areas, for fear of being seen as a braggart.

    Sometimes I even check myself when it comes to internal conversations about these topics, and remind myself that even the most generous and well intentioned efforts are usually faulted when you’re forced to deal with the US medical system.

    My aunt did ask me on Christmas Day if I had to use PTO in order to be off for so long, and I was forced to explain that VMware doesn’t track PTO time. Also, that my manager doesn’t have intimate knowledge of my daily or weekly routine.

    All of this combined usually blows people’s minds, but I try to stay grounded about it, while pretending it will last forever.

    Monday January 6, 2020
  • Outlook Overload

    For the last couple weeks I’ve been confused why Microsoft Outlook on my Mac would start consuming over 100% CPU while sitting idle, spinning up my fans, and generating a bunch of disk write activity.

    At first I assumed it was because I am running the Fast Ring in order to run the new macOS design. However, the same build on my wife’s Mac, also running Catalina, never came anywhere near that even during what could be described only as “aggressive emailing.”

    After messing around with adding and deleting accounts, hoping another beta update would fix it ... I finally got the idea to just drag Outlook to the Trash, and let Hazel detect this and offer to dump all of the associated files (cache, settings, etc) with it.

    After I put Outlook back in Applications, and effectively set it up as new, everything is back to normal. 0.4% CPU

    Saturday January 4, 2020
  • Nonsense Awareness

    Jessica Joy Kerr in her blog post titled “Open your eyes to the nonsense” has a great anecdote from a friend, about the software development process at a utility company:

    “We make power, not sense.”

    But she goes on to make a wider point about evaluating the existing culture and processes of institutions.

    Culture doesn’t make sense, to anyone from outside. Culture is common sense, to anyone embedded in it. To understand and work with a large organization, let go of trying to make sense of it. Observe it and see what’s there. After that, logic might help in finding ways to work skillfully inside it, maybe even to change it.

    This applies to organizations of any size and in every industry, although the nonsense obviously increases in complexity as they scale, as all things do.

    Far too often people expect consultants or an "expert" to come in and tell them how to make things perfect. In the past I'd only work with customers in a very narrow window, typically to implement one or maybe a handful of technologies. Best case, I'd get repeat customers and learn more about their business requirements, how they operate as an organization, and how they make decisions. But more often than not I'd offer recommendations based on general experience and hope that there weren't unforeseen consequences in the environment.

    I’m also more interested in hearing what everyone in the room has to say about their requirements before I offer what could be an otherwise ignorant or unconsidered opinion. It's not that I'm necessarily afraid of being wrong, but different people from different backgrounds in different departments with different goals often see things... differently.

    One of the things I particularly enjoy about my role as a VMware TAM, is that I get the runway to have these conversations and collect information with the customer, to advise over the long term as opposed to some two day "best practice review" and then run off to the next project.

    Once in a while, companies can adopt technologies in greenfield environments where you can take everything about the vendor’s best practice and apply them. But more often than not, you have to find a way of blending the two. The trick, I suppose, is knowing how much of the old ideas and processes are actually still required and why. You rarely get that in an hourlong conversation. In order to do that you need to understand more than just the business and the technical requirements.

    You have to understand the culture in which it will operate.

    Friday January 3, 2020