You are currently browsing the daily archive for November 11th, 2008.

The Chinese colleagues in my organization usually go out every Monday for lunch at a nearby Chinese restaurant. It’s a good time to share common concerns about the economy, the stock market or food safety in China. Work and technology seldom come up in these lunch discussions unless somebody starts complaining, which is what happened this Monday.

One of my colleagues complained that he had spent hours trying to solve the mystery of why some messages failed to process, for no apparent reason, in the Production environment. Eventually he found out that the deployment team had rolled out a new server the previous weekend and had not configured it correctly. The new server, participating in a cluster, picked up some of the messages and failed them. That explained why only some messages failed while others went through successfully.

But why did it take him so long to figure it out? There were two reasons. First, he wasn’t aware of the environment change (the rollout of a new server). Second, there weren’t enough logs to show which server had processed each message. He browsed through the logs of all the known servers trying to find traces of the failure, with no luck. If there had been central logs or database records showing which server in the cluster processed each message, the error would have been obvious.
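To make the idea of a central record concrete, here is a minimal sketch in Python. It assumes every node in the cluster can write to one shared store; an in-memory SQLite table stands in for that shared database here, and the table `message_audit` and the functions `open_audit_db`, `record_processing` and `failures_by_server` are hypothetical names for illustration, not part of the system described above. Each node also stamps its host name into every log line, so even scattered logs show which server handled a message.

```python
import logging
import socket
import sqlite3
from datetime import datetime, timezone

# Stamp the host name into every log line so a failure can be traced to a node.
HOSTNAME = socket.gethostname()
logging.basicConfig(
    format=f"%(asctime)s {HOSTNAME} %(levelname)s %(message)s",
    level=logging.INFO,
)
log = logging.getLogger("message-processor")


def open_audit_db(path=":memory:"):
    """Open the (hypothetical) audit table; in a real cluster this would be
    a database server shared by all nodes, not a local SQLite file."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS message_audit (
               message_id   TEXT NOT NULL,
               server       TEXT NOT NULL,
               status       TEXT NOT NULL,
               processed_at TEXT NOT NULL
           )"""
    )
    return conn


def record_processing(conn, message_id, status, server=HOSTNAME):
    """Record which server handled a message and whether it succeeded."""
    conn.execute(
        "INSERT INTO message_audit VALUES (?, ?, ?, ?)",
        (message_id, server, status, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    log.info("message %s processed with status %s", message_id, status)


def failures_by_server(conn):
    """Count failed messages per server; a misconfigured node stands out."""
    rows = conn.execute(
        """SELECT server, COUNT(*) FROM message_audit
           WHERE status = 'FAILED'
           GROUP BY server
           ORDER BY COUNT(*) DESC"""
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = open_audit_db()
    # Simulate a cluster where one newly added node fails everything it picks up.
    record_processing(conn, "msg-1", "OK", server="app01")
    record_processing(conn, "msg-2", "FAILED", server="app03")
    record_processing(conn, "msg-3", "OK", server="app02")
    record_processing(conn, "msg-4", "FAILED", server="app03")
    print(failures_by_server(conn))  # {'app03': 2}
```

With such a record, a single query groups the failures by node, and a server nobody knew about shows up by name instead of requiring a search through every known server's logs.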
