Louise -

Below is the report on yesterday's DealBench issues.  Please let me know if 
you would like further information.

- Dan

 -----Original Message-----
From:  Elrod, Hal  
Sent: Friday, February 02, 2001 2:53 PM
To: Bruce, Dan
Cc: Hillier, Bob; Spitz, John
Subject: DealBench 2/2 auction failure cause and solution
Importance: High


Event Sequence
* Lot 1 of the two Sabre auctions went through without issues.
* A 15-minute interval was scheduled between lots.  About 10 minutes into
this period, developers monitoring the system saw errors on the View Deal
page.  Within minutes, users reported the same problem when reentering the
Sabre deal.  The rest of the site was unaffected.
* Within five minutes, the problem was isolated to the document management
subsystem of DealBench.
Solution Attempts:
* Restart the process that returns documents
* Move the documents off the deal
Solution:
* Create a second deal with the same structure as the first, but without
the documents.
* The auction on the second lot started approximately 40 minutes late.  Once
started, it continued without further errors through several auction
extensions, finally resulting in a $500K savings for Sabre.

Cause
* The document management process stopped returning documents and was
instead returning null values.  This caused the View Deal page to appear
"hung".  The document being served was the "deal agent logo", a special
document type that is displayed at the top of the page.
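
To illustrate the failure mode above, here is a minimal Java sketch of a
defensive lookup that treats a null document as a failure and falls back to a
placeholder, so the page can still render.  DocumentSource, LogoLoader and
PLACEHOLDER are hypothetical names chosen for illustration; this is not the
DealBench code, only one way such a guard might look.

```java
import java.util.logging.Logger;

// Hypothetical stand-in for the document management process.
interface DocumentSource {
    // May return null when the backing process misbehaves.
    byte[] fetch(String documentId);
}

final class LogoLoader {
    private static final Logger LOG = Logger.getLogger(LogoLoader.class.getName());

    // Hypothetical empty placeholder served when the real logo is unavailable.
    static final byte[] PLACEHOLDER = new byte[0];

    private final DocumentSource source;

    LogoLoader(DocumentSource source) {
        this.source = source;
    }

    // Returns the logo bytes, or PLACEHOLDER if the source fails or returns null.
    byte[] loadLogo(String documentId) {
        byte[] data;
        try {
            data = source.fetch(documentId);
        } catch (RuntimeException e) {
            LOG.warning("Document fetch failed for " + documentId + ": " + e);
            return PLACEHOLDER;
        }
        if (data == null) {
            // Log the null explicitly so the failure is visible to operators.
            LOG.warning("Document source returned null for " + documentId);
            return PLACEHOLDER;
        }
        return data;
    }
}
```

With a guard of this kind, a missing logo would degrade the page rather than
block it, and the warning would point at the document process directly.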

Contributing Factors
* The run.bat script, which launches the document management Java process,
was not logging its status.
* The SQL statement that retrieves the deal agent logo did not correctly
handle the special case of a document file being moved.  Therefore, moving
the deal agent logo to another deal did not resolve the error.
* run.bat had never failed previously, and the process had been running
continuously for over 60 days.  Therefore it was not initially suspected as
a problem source.

Fixes
* Place an immediate stop-work on further enhancements to DealBench until
system stability has been addressed
* Isolate the document management process in the test environment, and
thoroughly test all possible failure scenarios
* Start logging and tracking the run.bat script and the document management
process
* Make the document management process fully redundant, so that it will fail
over in the event of an error
* Increase monitoring on the new log files
* Initiate an in-depth architectural review with Kevin Montagne's team, to
identify further areas to improve system robustness
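
As a rough illustration of the redundancy item above, the Java sketch below
tries each document backend in order and moves to the next one when a backend
throws or returns null, logging every failure.  FailoverFetcher and the
backend functions are hypothetical names; the actual failover design is still
to be decided in the architectural review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.logging.Logger;

final class FailoverFetcher {
    private static final Logger LOG = Logger.getLogger(FailoverFetcher.class.getName());

    // Hypothetical backends: each maps a document id to its bytes (or null on failure).
    private final List<Function<String, byte[]>> backends;

    FailoverFetcher(List<Function<String, byte[]>> backends) {
        this.backends = new ArrayList<>(backends);
    }

    // Returns the first non-null result, trying backends in the order given.
    byte[] fetch(String documentId) {
        for (int i = 0; i < backends.size(); i++) {
            try {
                byte[] data = backends.get(i).apply(documentId);
                if (data != null) {
                    return data;
                }
                LOG.warning("Backend " + i + " returned null for " + documentId);
            } catch (RuntimeException e) {
                LOG.warning("Backend " + i + " failed for " + documentId + ": " + e);
            }
        }
        throw new IllegalStateException("All document backends failed for " + documentId);
    }
}
```

Treating a null result the same as an exception matters here, since the 2/2
failure showed up as null values rather than as a crash.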