[WF-Infra] What next

anubis  tonopah at hotmail.com
Wed Jun 13 21:00:23 PDT 2001

Thanks to everyone who responded so well to the temporary loss of email.  
It's encouraging to see how quickly you all came up with a workable 
solution.

In the hope of lowering our risk of future problems, what steps do we need to 
take to provide ourselves with a robust and responsive infrastructure setup?

I believe the first step was to get victor to provide a seamless backup mail 
service in the event of an outage with the primary server.  Is this in 
place?  If not, what do we need to do to get there?

We have a good thing going with seul and web hosting.  What fallback do we 
have if there is a problem with their connection, however?  While I have a 
partial mirror of the www contents, it would not be enough to recover the 
site, or even save most of the information it contains.  I expect we need to 
make occasional server-side backups, and ship them off the www server.  Do 
we have a backup wiki-enabled server that we could get up and running on 
short notice with the latest of these backups?  Obviously, with the Zope 
site nearing readiness, we should also develop plans to cover that server's 
disappearance.

CVS is another critical service, and it currently runs on a single machine.  
Should we set up a secondary CVS server that contains the full CVS tree and 
history?  I know there are 100 copies of the latest, or at least a recent, 
checkout, but if I recall previous discussions correctly, that won't be 
enough to fully restore the service.

FTP is currently in good shape, as far as I can see.  We have a nice 
mirroring system that gives us lots of redundancy, yet keeps everything in 
sync.  Please let me know if I'm being naive, though.

We already have a couple of IRC servers.  Is the service now redundant 
enough that we can consider it 'done'?

Brenda (IRC logging) is a useful feature.  Currently we have brenda and the 
logs on one server.  How can we set up a backup brenda that would not 
conflict with the primary (i.e., generate redundant logs) but would also not 
lead to large gaps in logging services?

DNS is another service that we need to look at.  In some ways it's the most 
fundamental part of our infrastructure.  I have more concerns about malicious 
efforts with DNS than with other services (someone could point all our names 
to a porn site, or Oracle, or whatever), but I don't think that concern 
absolves us of the need to make sure the system has some flexibility.  Up 
till now, we've relied on jack to handle all the changes.  We've been lucky 
that he hasn't been on vacation at an inopportune time, and he has always 
worked quickly to help us resolve whatever troubles we've had.  I feel like 
we are tempting fate too much to assume that will always be the case, though.  
Shit happens, and real life can force plans to change no matter how 
well-intentioned.

So I propose that we come up with some method of distributing DNS control.  
I hope that you will have some ideas on how to do this.  We can't be the 
first project with similar needs.

Here are some of the issues as I see them.  Please add your concerns and 
suggestions as well:
1) We need to have one authoritative source of DNS information at all times.
2) We need to be able to transfer that authority in a relatively short period 
of time (24-48 hrs?) for whatever reason.
3) We need to ensure that authority cannot be hijacked.

I think we might also want to reconsider using granitecanyon for primary DNS 
service.  It has occasionally taken many days for records to be updated, and 
we can do better.

Thanks to Demitar who helped enumerate what services we currently rely on, 
and start this discussion (or at least this round of it).

I look forward to your replies.
