
Huge thank you to Jared, Virendra and others who helped with the transition on Outages!
And thanks as well for shepherding the list for so many years. No doubt a lot went on behind the scenes to keep it running.
I am a little curious if we could have some sort of post-mortem on what happened with the failure a month ago. I'm sure many of us manage Mailman lists and Mailman installs. If there is something I can learn from what Jared went through to keep my lists running smoothly, I'm sure it would be helpful to us all.
Again, thank you, Jared, for everything you've done over the years with the list.
Joseph Anders joe@inet-ops.com

I second the curiosity :) I'm only a small-scale Mailman administrator, but I'm really curious to see what kind of event can make a Mailman server unrecoverable (barring data loss).
--- www: grg.pw email: me@grg.pw mobile: +44 7716 604314 / +39 393 1049073

On Thu, May 15, 2025 at 01:56:39PM -0500, Joseph Anders via Outages-discussion wrote:
I am a little curious if we could have some sort of post-mortem on what happened with the failure a month ago. I'm sure many of us manage mailman lists and mailman installs. If there is something I can learn from what Jared went through to keep my lists running smoothly, I'm sure it would be helpful to us all.
This isn't a post-mortem on what happened in this particular case, but it's an attempt to share what I've done to manage the Mailman instances I've been running for the past ~20 years (3 dozen public/private lists, 6 domains at the moment), which is similar to what I did with majordomo before that, which is similar... ;)
These all apply to Mailman 2.X, but I suspect that most/all of them will work with 3.X as well.
1. All of Mailman's per-list configuration files are under configuration control -- RCS in this case. Yeah, it's old, but it's simple, fast, and works beautifully for this use case. All of the MTA configuration files associated with Mailman (e.g., aliases, virtusertable) are built with make(1), sed(1), and assorted other utilities. This has the advantage that if something is wrong, there's a good chance it's consistently wrong everywhere, in multiple files across all MXs. That makes diagnosis a lot easier.
2. I dump out the subscribers for each mailing list every day (using Mailman's "list_lists" and "list_members" in a shell script) and keep track of changes in a simple set of text files (also under RCS). This gives me a useful way to see all subscriptions/unsubscriptions across all mailing lists in one place, and it gives me a backup list of all current subscribers, updated daily.
3. I use dump(8) daily to back up the filesystem Mailman lives in. I use a modified Towers of Hanoi sequence that includes once-a-month level 0 dumps, and I retain those for at least a year.
4. I rsync(1) the entire Mailman directory tree to a local failover server every day.
5. I rsync most of the Mailman directory tree to another backup server. This includes everything but the web-facing archives, because those can be rebuilt from the relevant mbox files. To make that latter process easier, I've scripted the rebuild on a per-list basis, so that it's relatively painless to wipe a web-facing archive and rebuild it.
   [ I've had to do that when migrating mailing lists, e.g. from example.com to lists.example.com. It turns out to be easier to just rebuild the web-facing archives than to rewrite all the URLs and links with scripts. ]
6. About once a month, I run the mailman-subscribers.py script over all mailing lists and stash the output in a directory. (That script produces a text file listing all subscribers and all their subscription settings.) I keep those directories (pretty much) forever.
Note 1: Some of these are in part a defense against corruption/loss of the "pickle" files that I've never liked. IMNSHO, any configuration file for anything that can be plain text *should be* plain text, and any rationale for making it otherwise should be justified by a mountain of supporting evidence. And then it should probably still be plain text.
Note 2: Everything I've mentioned is built on standard Unix/Linux tools such as dump(8), rsync(1), sh(1), make(1), and cron(8), so it's neither complex nor heavyweight. I think it's well within the reach of journeyman Unix/Linux sysadmins.
Note 3: You've no doubt observed that this methodology has lots of redundancy: that's intentional. (a) This is not my first day on the job and (b) I've seen things.
---rsk
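For anyone who wants a starting point for item 2 above (the daily subscriber dump), here is a minimal sketch. It is written in Python rather than the shell script Rich describes, and it is not his script. It assumes a Mailman 2.x install whose "list_lists" and "list_members" commands live in MAILMAN_BIN. That path, the SNAPSHOT_DIR location, and all function names are assumptions made up for illustration.

```python
import subprocess
from pathlib import Path

# Both paths are assumptions; adjust them for the local install.
MAILMAN_BIN = Path("/usr/lib/mailman/bin")
SNAPSHOT_DIR = Path("/var/backups/mailman-subscribers")  # hypothetical location


def run_lines(*cmd):
    """Run a command and return its non-empty output lines, stripped."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def membership_changes(old, new):
    """Return (added, removed) addresses between two snapshots.

    Addresses are compared case-insensitively and returned sorted.
    """
    old_set = {addr.lower() for addr in old}
    new_set = {addr.lower() for addr in new}
    return sorted(new_set - old_set), sorted(old_set - new_set)


def snapshot_all():
    """Write one text file per list and print what changed since the last run."""
    SNAPSHOT_DIR.mkdir(parents=True, exist_ok=True)
    # "list_lists -b" prints bare list names, one per line.
    for name in run_lines(str(MAILMAN_BIN / "list_lists"), "-b"):
        members = sorted(run_lines(str(MAILMAN_BIN / "list_members"), name))
        path = SNAPSHOT_DIR / f"{name}.txt"
        previous = path.read_text().splitlines() if path.exists() else []
        added, removed = membership_changes(previous, members)
        for addr in added:
            print(f"{name}: + {addr}")
        for addr in removed:
            print(f"{name}: - {addr}")
        # One address per line, so the file diffs cleanly under version control.
        path.write_text("".join(f"{m}\n" for m in members))

# Intended use: call snapshot_all() once a day from cron, as the Mailman user.
```

Because the snapshot files are plain text with one address per line, checking them into RCS (or any other version control) after each run would give the change history described in item 2. That check-in step is left out of the sketch.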
participants (3): Giorgio Bonfiglio, Joseph Anders, Rich Kulawiec