Wednesday, December 12, 2007

Production system maintenance. Part 2

In the previous post I described several organizational aspects of updating a production system. Now it is time for technical tips and tricks.

Name code identically on all system nodes
It is usually a good idea to name things identically across all the systems you manage. It can be a real problem to have jboss4, jboss4.0.5 and jboss as names for the same piece of code on your cluster nodes. Use a single, simple convention for code naming and placement.

Name resources differently
While a single naming scheme for code reduces the time spent on unproductive things, resource naming is quite different. In a mature environment, code can easily be restored from several places: developer machines, continuous integration, staging servers, the source repository, etc. Resources (i.e. data) are different. The contents and schema of your database may be unique at a given moment. Moreover, the data may be of great value to your company.

So damage to data should be avoided by any means. As a first line of defence, name your databases differently depending on the type of environment (production, testing, development, etc.), contents and schema version. I usually use the following format for db identifiers:

contents-timestamp_of_last_schema_update-database_type

For example: chirp-20071210-prod

While it looks quite complex, this notation may protect you from actions performed on the wrong data by mistake. Unfortunately, the "Oh God, I've dropped the wrong database" problem is not as rare as I first expected. ;)
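
The convention is easy to enforce from a script. Below is a minimal Python sketch, not a definitive implementation: the function names are hypothetical, and the environment suffixes "dev" and "test" are my assumption (only "prod" appears in the example above). It builds and parses identifiers of the form contents-timestamp_of_last_schema_update-database_type and refuses destructive actions on anything that is not explicitly marked as disposable.

```python
import re
from datetime import date
from typing import NamedTuple

# Assumed layout: lowercase contents, 8-digit date (YYYYMMDD), environment suffix.
DB_NAME_RE = re.compile(r"^(?P<contents>[a-z0-9_]+)-(?P<stamp>\d{8})-(?P<env>[a-z]+)$")

# Assumption: only these environments may be dropped without extra ceremony.
DISPOSABLE_ENVS = {"dev", "test"}


class DbName(NamedTuple):
    contents: str
    schema_date: str
    env: str


def build_db_name(contents: str, schema_date: date, env: str) -> str:
    """Compose an identifier such as chirp-20071210-prod."""
    return f"{contents}-{schema_date:%Y%m%d}-{env}"


def parse_db_name(name: str) -> DbName:
    """Split an identifier into its parts, rejecting names that break the convention."""
    match = DB_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"database name does not follow the convention: {name!r}")
    return DbName(match["contents"], match["stamp"], match["env"])


def is_safe_to_drop(name: str) -> bool:
    """Return True only for databases in an explicitly disposable environment."""
    return parse_db_name(name).env in DISPOSABLE_ENVS
```

A maintenance script could call is_safe_to_drop() before any DROP or bulk DELETE and stop when it returns False. Names that do not follow the convention raise an error instead of being silently accepted.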

Use consistent hostnames
A correctly set hostname is not an absolute requirement for a server to function. But the right names can help you during operation. It is fun to have a server called 'snoopy' if you have three in total. When you have more, their names should be a little more transparent and linked to their properties, such as the hosting data center and IP address.

Also, server software needs to know the name of the server it runs on. That is why the hostname and the DNS name must be the same.
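
A quick check for a mismatch can be scripted. The Python sketch below is only an illustration under an assumption: it treats the result of socket.getfqdn() as "the DNS name", which may not match how your environment defines it. The function names are hypothetical.

```python
import socket


def _normalize(name: str) -> str:
    """Lowercase a host name and drop surrounding whitespace and a trailing dot."""
    return name.strip().rstrip(".").lower()


def names_match(hostname: str, dns_name: str) -> bool:
    """Compare two host names case-insensitively, ignoring a trailing dot."""
    return _normalize(hostname) == _normalize(dns_name)


def check_local_host() -> bool:
    """Compare the configured hostname with the name the resolver reports."""
    hostname = socket.gethostname()
    dns_name = socket.getfqdn()  # assumption: this is the DNS name we care about
    return names_match(hostname, dns_name)
```

Running check_local_host() on each node, for example from a cron job, would flag servers whose configured name has drifted from the resolver's answer.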

Highlight your current context
Most maintenance errors I have seen were "the right command in the wrong context". For example, I once dropped a production db while thinking I was in the staging environment. (This is why I now back up all data on the production server before an update.)

So put information about your current host, database and directory everywhere.
  • Put the hostname, username and directory into the command line prompt.
  • Put the hostname into the xterm title.
  • Put the hostname and database name into the mysql client prompt.
And be attentive.
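
The same idea applies to your own maintenance scripts: let them announce where they are running. Here is a minimal Python sketch with hypothetical function names. It assumes an xterm-compatible terminal, where the OSC 0 escape sequence sets the window title.

```python
import getpass
import os
import socket


def context_label(user: str, host: str, directory: str) -> str:
    """Format the context the way a shell prompt usually shows it."""
    return f"{user}@{host}:{directory}"


def xterm_title_sequence(title: str) -> str:
    """Return the escape sequence that sets the title in xterm-compatible terminals."""
    return f"\033]0;{title}\007"


def current_context() -> str:
    """Describe the user, host and working directory of the running script."""
    return context_label(getpass.getuser(), socket.gethostname(), os.getcwd())
```

A script could print xterm_title_sequence(current_context()) at startup and show current_context() in every confirmation question before a destructive step.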

Don't work as root
Strictly speaking, this is impossible. Just work as the superuser as little as possible. root or Administrator can do many dangerous things. You know, one mistake and your root filesystem is empty. ;-)

Have a remotely controlled power switch
This is not really required for a VPS. However, it may be essential for dedicated servers. It is not that rare for a system administrator to have to configure the network remotely. And it can be very, very inconvenient to have a node with a misconfigured network that can be managed over ssh only.

Yes, the data center support team may press "Power" for you. But only if you pay for 24x7 support.

Use cluster management software
Not long ago, Debian Package of the Day published an article about ClusterSSH, which I found very useful. There are several similar programs as well. While relatively simple, these programs may make your life easier and reduce the number of errors.
