From stdenis@fnal.gov Fri Mar 16 23:55:45 2001
Date: Sat, 9 Dec 2000 08:54:44 -0600 (CST)
From: Rick St. Denis
Reply-To: r.stdenis@physics.gla.ac.uk
To: Mark Lancaster
Cc: Bill Ashmanskas, Marjorie Shapiro, Jim Amundson, wolbers@fnal.gov,
    watts@physics.rutgers.ed, r.stdenis@physics.gla.ac.uk,
    lmark@cdfsga.fnal.gov, stefano.belforte@ts.infn.it, rharris@fnal.gov,
    ksmcf@fnal.gov, sexton@fnal.gov, Rob Snihur
Subject: Re: Offsite Database Export Review

> Only update would be to a new version of msql if msql were found not to
> be performant enough. Performance is just on read-speeds.
> I did this test this morning on fcdfsgi2 - the results are somewhat
> alarming. The code for the two tests was identical (see
> fcdfgi2:~lmark/dbase/DB/DBUtils/timing).
>
> I inserted one row at a time of a table of 6 columns 1000 times
> and then read-back one row at a time 1000 times e.g. the type of thing you
> might do at BeginJob to load values for 1000 runs or what you would do
> each BeginRun.
>
> Insert Times   : mSQL   =   5 sec for 1000*1 rows
>                  ORACLE = 569 sec for 1000*1 rows
> ReadBack Times : mSQL   =  17 sec for 1000*1 rows
>                  ORACLE = 539 sec for 1000*1 rows
>
> These results are difficult to believe since all other tests show mSQL to
> be slightly slower than ORACLE. However there are three factors here :
>
> i) I did the test while cdfsoft2 nightly build was running on fcdgsgi2
> (the link stage which we know kills a machine)
> ii) The oracle server was cdfondev which is a low-spec machine - however
> it is not atypical of the type of machine a remote institute would use
> iii) the msql database is on fcdfsgi2 whereas the oracle one is in FCC
> (so there is a network issue).
>
> So if nothing else this does show that ORACLE under certain circumstances
> can be a complete dog. I actually believe there must be something wrong with
> the oracle server, network or fcdgsi2 for this to happen.
> Nevertheless it shows that to get performance out of ORACLE which you
> undoubtedly can and why you pay your money then you need to
> monitor/tune/understand your system both at the client and server level.
> Msql being a light weight application does not need this.
>
> Anyway not to dwell on the ORACLE side - it shows that msql times are fine
> I think for analysis jobs at remote sites.
>
> Answers to the rest I would just naively scale the above #s.

These are incredibly bad numbers, and if anything cdfondev is actually a
higher spec machine than cdfonprd, both of which are likely to be higher
spec than a home institute.

These numbers look like you were doing a lot of connects and disconnects
and, as you measured before, this is about 1 s each, so I would have
expected 1000 s and you got 539 s. Is this what you did (before I speculate
further)?

Also, if you want to run this kind of test you do need to ensure that there
is no backup going on at the time on the oracle database machine (in this
case, the int or dev instance).

Finally, I think that the right machine to test on is our linux box, ncdf16,
a lower-end pc (300 MHz I think) with 40G.

Also, if we have to modify the front end to reduce connects, then we should
be careful to test it that way and note the extra software
development/licensing issues involved.

On the linux machine we have loaded 60% of the run2 estimated data file
catalog size in simulation. At that point the box came to a grinding halt.
We are investigating why this is. So another test to make is just that: to
take one run2's worth of data and load the boxes to see how they perform on
a typical run. Otherwise we face nasty surprises.

cheers
rick
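
The connect-overhead estimate in the reply can be checked with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, the roughly 1 s per connect figure is the one quoted in the message, and the per-query cost is assumed to be zero unless supplied.

```python
def per_op_seconds(total_seconds, n_ops):
    """Average wall-clock cost of one operation in a timed loop."""
    return total_seconds / n_ops


def expected_total_seconds(n_ops, connect_seconds, query_seconds=0.0,
                           reconnect_each_time=True):
    """Rough model: total time = connects * connect cost + ops * query cost.

    reconnect_each_time=True models a client that connects and disconnects
    around every single-row operation; False models one connection reused
    for the whole loop. Both are assumptions about the client, not
    measurements.
    """
    connects = n_ops if reconnect_each_time else 1
    return connects * connect_seconds + n_ops * query_seconds


# Figures quoted in the message: 1000 single-row reads, about 1 s per connect.
print(expected_total_seconds(1000, 1.0))        # reconnecting on every read
print(expected_total_seconds(1000, 1.0, reconnect_each_time=False))
print(per_op_seconds(539, 1000))                # measured ORACLE read-back
print(per_op_seconds(17, 1000))                 # measured mSQL read-back
```

If the per-operation figure for ORACLE is dominated by connect time, reusing a single connection should bring the total down by orders of magnitude, which is why it matters whether the test reconnected on every row.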