From ashmansk@hep.uchicago.edu Fri Mar 16 23:55:31 2001
Date: Sat, 9 Dec 2000 16:28:56 -0600 (CST)
From: Bill Ashmanskas
To: Mark Lancaster
Cc: Marjorie Shapiro, Jim Amundson, wolbers@fnal.gov, watts@physics.rutgers.ed,
    r.stdenis@physics.gla.ac.uk, stefano.belforte@ts.infn.it, rharris@fnal.gov,
    ksmcf@fnal.gov, sexton@fnal.gov, Rob Snihur
Subject: Re: Offsite Database Export Review

Thanks a lot for the information and the performance numbers.  17 msec per
row access is not bad--within a factor of 2-3 of a typical disk seek time.
Are the results the same if the order of the run numbers for the reads is
randomized?

What I meant about "first access" vs. "subsequent access" was not within a
single A_C session but between sessions, i.e. I was referring to the fact
that in Run I, updates of the remote databases were fetched on demand, one
component/attribute/run at a time, so the second time one used a given
component/attribute/run was much faster than the first.  It sounds as if
the paradigm here is completely different.

Note that I am still (even after reading Mark's note) assuming that for the
calibration database there will still exist a primary key that is
isomorphic to "component/attribute/runnumber/versionnumber", except that
maybe "versionnumber" becomes a timestamp and/or some identifier like
"Alan Sill BPO fix 2002-02-03".  If this is not true, then at least one
member of the committee will need to be (very briefly) educated a bit on
the changes in the database itself, independent of the export mechanism,
between Run 1 and Run 2.

I always thought of the DB as a map from
"component/attribute/runnumber/versionnumber" to a small, structured
collection of mostly integers and floats (stored, as it happened, as a
ybos bank), and that its three biggest deficiencies were (a) that
versionnumber was typically not useful (an identifier would be better),
(b) that there was no adequate mechanism for choosing whether or not to
use a given update (e.g. new dE/dx from Barry, which typically spanned
all runs for a given comp/attr), and (c) that the true high-level
structure of the data was not maintained in any formal, machine-parsable
way (the same weakness held for event data).  None of this has anything
to do with exporting databases, but I wanted to let you know where I am,
so that you can decide whether or not it's worth your while to bring me
up to date on just what it is we're now talking about exporting.

Has anyone given any thought to a model in which a remote institution
says that it wants such-and-such a subset (which may be the null set if
network access is good) of the FNAL DB pushed automatically, and in which
accesses to other constants fail over to some sort of slow network
connection to a Fermilab database?  Then if once in a blue moon I want to
read in the SVX pedestals, I can do so without storing lots of
seldom-used data in the local DB.

How realistic have the tests of the DB itself and of the exporting been
to date?  I would be curious to see some numbers for storage space and
access time for all seven attributes of the CTC data, most recent version
only, for run 0 through 75000.  I say this because I know that it is
physically possible to store all of this information losslessly and
without any compression in 3 megabytes, and I could generate performance
numbers if needed, which I suspect would be comparable to disk seek time.
I expect the COT run-by-run data to be smaller than the CTC's for
stage0-derived constants (a simpler drift model is planned), but the TDC
calibrations to be larger by 30240/6156, or about x5.  Beam position
should be identical.  Everything else was updated very infrequently, and
thus only matters if the one-time connection speed is dramatically slow.
So I think storing and retrieving the CTC data verbatim is not such a bad
exercise.  Has a better exercise been done?
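The push-a-subset-with-fail-over model suggested above can be sketched in a few lines.  This is a hypothetical illustration only: the class and function names (CalibDB, remote_fetch, the toy keys) are mine, not actual CDF code, and the real key would map to a structured record rather than a bare tuple.

```python
# Hypothetical sketch of the push-subset / fail-over model described above.
# All names here (CalibDB, local_store, remote_fetch) are illustrative
# assumptions, not real CDF database code.

class CalibDB:
    """Map from (component, attribute, run, version) to a small record.

    Lookups try the locally exported subset first; anything not pushed
    fails over to a (slow) network fetch from the central FNAL database.
    """

    def __init__(self, local_store, remote_fetch):
        self.local = local_store      # dict populated by the automatic push
        self.remote = remote_fetch    # callable doing the slow network access

    def get(self, component, attribute, run, version):
        key = (component, attribute, run, version)
        record = self.local.get(key)
        if record is None:            # e.g. seldom-used SVX pedestals
            record = self.remote(key)
            self.local[key] = record  # cache so later sessions are fast
        return record

# Toy usage: the pushed subset holds a beamline entry; anything else
# (here simulated by a lambda) comes over the network.
pushed = {("BEAM", "POSITION", 41000, 1): (0.12, -0.34, 0.001, 0.002)}
db = CalibDB(pushed, lambda key: "fetched-from-FNAL")
print(db.get("BEAM", "POSITION", 41000, 1))   # local hit
print(db.get("SVX", "PEDESTAL", 41000, 1))    # fails over to the network
```

The point of the design is exactly the one made above: rarely used constants cost nothing locally until the blue moon when they are actually read.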
(Of course the silicon pedestals are what scare the hell out of me, but I
am hoping that they will be irrelevant except in production and possibly
online initialization, hence moot for this discussion.)

Thanks again, and see you Tuesday.

-Bill

On Sat, 9 Dec 2000, Mark Lancaster wrote:

> Hello Bill et al,
>
> Apologies for the printing problems with 5252 - it's a Star Office
> document - so I have put the native, html, text and, God forbid, a .doc
> file of the note on www.hep.ucl.ac.uk/~markl/dbexport.html
>
> FREEWARE ANSWERS
> ----------------
>
> > * What is the ($) cost per machine or per institution (e.g. licensing)?
>
> Zero for mSQL.
>
> > * What computing resources will each institution need to allocate?
>
> Disk - but probably less than the ORACLE solution, since mSQL does not
> have all the same logging/rollback that ORACLE has, which takes space.
> A Linux/SGI/Sun PC to act as an mSQL server - it can be low spec;
> overheads are small.
>
> > * How much effort will be needed by local experts or system
> > administrators to keep each system running?  What about Fermilab
> > experts/administrators?
>
> The only things to be monitored are that the msql2d process is running
> locally and the disk usage.  This can be done via a script which can be
> run remotely.  The success/failure of exports is logged centrally at
> FNAL.  I do not see the overhead being more than in the ORACLE case (if
> anything, I suspect it is less).
>
> > * Will either solution make unusual demands on file systems, e.g. tens
> > of thousands of 100-byte files?
>
> No - e.g. mSQL stores the whole calibration database schema in 400
> files.  Each table has five associated files which store the data,
> indices, keys, etc.
>
> > * Will there be periodic updates or maintenance, and if so, what is the
> > computing hardware or computing time burden imposed (e.g. N hours of
> > CPU and M gigabytes of scratch space on such and such workstation,
> > daily)?
> The only update would be to a new version of mSQL, if mSQL were found
> not to be performant enough.  Performance is really just about read
> speeds.  I did this test this morning on fcdfsgi2 - the results are
> somewhat alarming.  The code for the two tests was identical (see
> fcdfsgi2:~lmark/dbase/DB/DBUtils/timing).
>
> I inserted one row at a time of a table of 6 columns 1000 times, and
> then read back one row at a time 1000 times - i.e. the type of thing you
> might do at BeginJob to load values for 1000 runs, or what you would do
> each BeginRun.
>
> Insert Times   : mSQL   =   5 sec for 1000*1 rows
>                  ORACLE = 569 sec for 1000*1 rows
> ReadBack Times : mSQL   =  17 sec for 1000*1 rows
>                  ORACLE = 539 sec for 1000*1 rows
>
> These results are difficult to believe, since all other tests show mSQL
> to be slightly slower than ORACLE.  However, there are three factors
> here:
>
> i)   I did the test while the cdfsoft2 nightly build was running on
>      fcdfsgi2 (the link stage, which we know kills a machine);
> ii)  the ORACLE server was cdfondev, which is a low-spec machine -
>      however, it is not atypical of the type of machine a remote
>      institute would use;
> iii) the mSQL database is on fcdfsgi2, whereas the ORACLE one is in FCC
>      (so there is a network issue).
>
> So if nothing else this does show that ORACLE under certain
> circumstances can be a complete dog.  I actually believe there must be
> something wrong with the ORACLE server, the network, or fcdfsgi2 for
> this to happen.  Nevertheless, it shows that to get performance out of
> ORACLE - which you undoubtedly can, and is why you pay your money - you
> need to monitor/tune/understand your system at both the client and
> server level.  mSQL, being a lightweight application, does not need
> this.
>
> Anyway, not to dwell on the ORACLE side - it shows that the mSQL times
> are, I think, fine for analysis jobs at remote sites.
>
> Answers to the rest: I would just naively scale the above numbers.
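[Editor's note: the timing exercise Mark describes - insert 1000 single rows into a 6-column table, then read them back one row at a time - can be sketched as below.  sqlite3 stands in for the mSQL/ORACLE clients, and the table and column names are made up; the real test used the DBUtils timing code on fcdfsgi2.]

```python
# Sketch of the timing test described above: insert one row at a time into a
# 6-column table 1000 times, then read back one row at a time 1000 times
# (e.g. a BeginJob load of 1000 runs, or repeated BeginRun accesses).
# sqlite3 is a stand-in here; the real test ran against mSQL and ORACLE.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE calib "
            "(run INTEGER PRIMARY KEY, c1 REAL, c2 REAL, c3 REAL, c4 REAL, c5 REAL)")

t0 = time.perf_counter()
for run in range(1000):                      # one INSERT per row, as in the test
    cur.execute("INSERT INTO calib VALUES (?, ?, ?, ?, ?, ?)",
                (run, 0.1, 0.2, 0.3, 0.4, 0.5))
conn.commit()
insert_time = time.perf_counter() - t0

t0 = time.perf_counter()
for run in range(1000):                      # one SELECT per row, as at BeginRun
    cur.execute("SELECT * FROM calib WHERE run = ?", (run,)).fetchone()
read_time = time.perf_counter() - t0

print(f"insert: {insert_time:.3f} s   readback: {read_time:.3f} s for 1000 rows")
```

For scale: Mark's 17 sec mSQL readback for 1000 rows is the 17 msec/row figure Bill quotes at the top of the thread.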
> > * How quickly (in real time) will one be able to read the beamline
> > (four real numbers, perhaps with covariance matrix) for 1000 runs, in
> > each case, in an offline analysis program that does nothing but this
> > (and is sparse in run numbers, as the W samples used to be)?
>
> > * Same question, but somewhat more challenging, e.g. constants needed
> > to re-run full COT tracking.
>
> > * Same question, but even more challenging, e.g. constants required to
> > re-run SVX clustering.
>
> > * Will there be a performance difference between first access and
> > subsequent access to the data for a given (apologies for old
> > terminology) component/attribute/run-number?
>
> As Rick said, data is memory-cached by the DBManager code.  The amount
> to cache is configurable.
>
> > * Is this choice expected to affect compile/link time, executable
> > image size, or virtual memory usage for remote applications that make
> > DB accesses?  Or is it some single process running behind the scenes,
> > accessed through some common network-like API?
>
> > * How robust is each solution if the network connection to Fermilab is
> > (a) up but extremely slow or (b) down?
>
> The only use of the connection is the scp of the zipped temporary mSQL
> DB.  If the network is down, then the export attempt is logged as having
> failed and the appropriate action can then be taken, e.g. a retry, or
> alerting the administrator.
>
> Cheers,
> Mark
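[Editor's note: the remote monitoring Mark describes - check that msql2d is running and watch disk usage, with export failures logged centrally so the administrator can retry or be alerted - might look something like the sketch below.  The function names, threshold, and ps invocation are illustrative assumptions, not part of the actual export system.]

```python
# Hedged sketch of the remote monitoring script mentioned above: verify the
# msql2d daemon is alive and that disk usage is sane.  Paths, thresholds,
# and names are illustrative assumptions, not real CDF tooling.
import shutil
import subprocess

def msql2d_running():
    """Return True if a process named msql2d appears in the process list."""
    ps = subprocess.run(["ps", "-e", "-o", "comm="],
                        capture_output=True, text=True)
    return "msql2d" in ps.stdout.split()

def disk_ok(path="/", max_used_fraction=0.90):
    """Return True if the filesystem holding the DB is below the threshold."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total < max_used_fraction

def check():
    """Collect any problems found; a central job at FNAL could log these
    and then retry the export or alert the administrator."""
    problems = []
    if not msql2d_running():
        problems.append("msql2d not running")
    if not disk_ok():
        problems.append("disk nearly full")
    return problems

print(check() or "OK")
```

Run periodically (e.g. from cron) at each remote site, with the result shipped back to the central log at FNAL alongside the export success/failure records.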