[WF-Infra] What next
tonopah at hotmail.com
Wed Jun 13 21:00:23 PDT 2001
Thanks to everyone who responded so well to the temporary loss of email.
It's encouraging to see how quickly you all came up with a workable
In hopes of lowering our risks for future problems, what steps do we need to
take to provide ourselves with a robust and responsive infrastructure set
I believe the first step was to get victor to provide a seamless backup mail
service in the event of an outage with the primary server. Is this in
place? If not, what do we need to do to get there?
We have a good thing going with seul and web hosting. What fallback do we
have if there is a problem with their connection, however? While I have a
partial mirror of the www contents, it would not be enough to recover the
site, or even save most of the information it contains. I expect we need to
make occasional server-side backups, and ship them off the www server. Do
we have a backup wiki-enabled server that we could get up and running on
short notice with the latest of these backups? Obviously, we have the zope
site nearing readiness, so we should also develop plans to cover that
server's disappearance as well.
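As a rough illustration of the "occasional server-side backups" idea, here is a minimal Python sketch that packs a directory into a timestamped tarball, which could then be shipped off the www server. The function name, the `www` prefix, and the directory layout are assumptions made for the example, not a description of how seul or our site is actually set up.

```python
import tarfile
import time
from pathlib import Path


def make_backup(source_dir, dest_dir, prefix="www"):
    """Write a timestamped .tar.gz of source_dir into dest_dir; return its path.

    All names here are hypothetical; dest_dir should live outside source_dir.
    """
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Timestamp in the file name so older backups are never overwritten.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{prefix}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory name, not the absolute path.
        tar.add(source, arcname=source.name)
    return archive
```

Something like this could run from cron, with the resulting file copied to a second machine. It only covers plain files; a wiki or zope site with its own database would likely need that database dumped first.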
CVS is another critical service, which currently is running on one machine.
Should we set up a secondary cvs server that contains the full cvs tree and
history? I know there are 100 copies of the latest or at least a recent
checkout, but that won't be enough to fully restore the service, if I recall
previous discussions correctly.
FTP is currently in good shape, as far as I can see. We have a nice
mirroring system that gives us lots of redundancy, yet keeps everything in
sync. Please let me know if I'm being naive, though.
We already have a couple of irc servers. Is that now a sufficiently
redundant service that we can consider it 'done'?
Brenda (irc logging) is a useful feature. Currently we have brenda and the
logs on one server. How can we set up a backup brenda that would not
conflict with the primary (i.e., generate redundant logs) but would also not
lead to large gaps in logging services?
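One possible approach, sketched here in Python, is to let both loggers run all the time and merge their output afterwards, dropping lines that both of them saw. This is not how brenda works today as far as I know. The `(timestamp, nick, message)` entry format is an assumption for the example, and it also assumes both loggers record identical timestamps for the same line, which real logs may not.

```python
def merge_logs(primary, backup):
    """Merge two logs into one sorted list with duplicate entries removed.

    Each log is a list of (timestamp, nick, message) tuples; this format is
    hypothetical and chosen only to keep the example short.
    """
    # A set union drops entries seen by both loggers; sorting restores
    # chronological order because the timestamp is the first tuple field.
    return sorted(set(primary) | set(backup))


primary = [(1, "alice", "hi"), (3, "bob", "back now")]
backup = [(1, "alice", "hi"), (2, "carol", "primary was down for this one")]
merged = merge_logs(primary, backup)
```

Here the entry both loggers saw appears once in the merged log, and the line only the backup caught fills the gap.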
DNS is another service that we need to look at. In some ways it's the most
fundamental to infrastructure. I have more concerns about malicious efforts
with dns than other services (someone could point all our names to a porn
site, or oracle, or whatever), but I don't think that that concern absolves
us of a need to make sure the system has some flexibility. Up till now,
we've relied on jack to handle all the changes. We've been lucky that he
hasn't been on vacation at an inopportune time, and has always worked quickly
to help us resolve whatever troubles we've had. I feel like we are tempting
fate too much to assume that will always be the case, though. Shit happens,
and real life can force plans to change no matter how well intentioned.
So I propose that we come up with some method of distributing dns control.
I hope that you will have some ideas on how to do this. We can't be the
first project that has similar needs.
Here are some of the issues as I see them. Please add your concerns and
suggestions as well:
1) We need to have one authoritative source of dns information at all times.
2) We need to be able to transfer the authority in a relatively short period
of time (24-48 hrs?) for whatever reason.
3) We need to ensure that authority cannot be hijacked.
I think we might also want to reconsider using granitecanyon for primary dns
service. It has occasionally taken many days for records to be updated, and
we can do better.
Thanks to Demitar who helped enumerate what services we currently rely on,
and start this discussion (or at least this round of it).
I look forward to your replies.