From ashmansk@hep.uchicago.edu Fri Mar 16 23:55:31 2001
Date: Sat, 9 Dec 2000 16:28:56 -0600 (CST)
From: Bill Ashmanskas
To: Mark Lancaster
Cc: Marjorie Shapiro, Jim Amundson, wolbers@fnal.gov, watts@physics.rutgers.ed,
    r.stdenis@physics.gla.ac.uk, stefano.belforte@ts.infn.it, rharris@fnal.gov,
    ksmcf@fnal.gov, sexton@fnal.gov, Rob Snihur
Subject: Re: Offsite Database Export Review

Thanks a lot for the information and the performance numbers.  17 msec per
row access is not bad--within a factor of 2-3 of a typical disk seek time.
Are the results the same if the order of the run numbers for the reads is
randomized?

What I meant about "first access" vs. "subsequent access" was not within a
single A_C session but between sessions, i.e. I was referring to the fact
that in Run I, updates of the remote databases were fetched on demand, one
component/attribute/run at a time, so the second time one used a given
component/attribute/run was much faster than the first.  It sounds as if
the paradigm here is completely different.

Note that I am still (even after reading Mark's note) assuming that for the
calibration database there will still exist a primary key that is
isomorphic to "component/attribute/runnumber/versionnumber", except that
maybe "versionnumber" becomes a timestamp and/or some identifier like
"Alan Sill BPO fix 2002-02-03".  If this is not true, then at least one
member of the committee will need to be (very briefly) educated a bit on
the changes in the database itself, independent of the export mechanism,
between Run 1 and Run 2.

I always thought of the DB as a map from
"component/attribute/runnumber/versionnumber" to a small, structured
collection of mostly integers and floats (stored, as it happened, as a
ybos bank), and that its three biggest deficiencies were (a) that
versionnumber was typically not useful (an identifier would be better),
(b) that there was no adequate mechanism for choosing whether or not to
use a given update (e.g. new dE/dx from Barry, which typically spanned
all runs for a given comp/attr), and (c) that the true high-level
structure of the data was not maintained in any formal, machine-parsable
way (the same weakness held for event data).  None of this has anything
to do with exporting databases, but I wanted to let you know where I am,
so that you can decide whether or not it's worth your while to bring me
up to date on just what it is we're now talking about exporting.

Has anyone given any thought to a model in which a remote institution
says that it wants such-and-such a subset (which may be the null set if
network access is good) of the FNAL DB pushed automatically, and in which
accesses to other constants fail over to some sort of slow network
connection to a Fermilab database?  Then if once in a blue moon I want to
read in the SVX pedestals, I can do so without storing lots of
seldom-used data in the local DB.

How realistic have the tests of the DB itself and of the exporting been
to date?  I would be curious to see some numbers for storage space and
access time for all seven attributes of the CTC data, most recent version
only, for run 0 through 75000.  I say this because I know that it is
physically possible to store all of this information losslessly and
without any compression in 3 megabytes, and I could generate performance
numbers if needed, which I suspect would be comparable to disk seek time.
I expect the COT run-by-run data to be smaller than the CTC's for
stage0-derived constants (a simpler drift model is planned), but the TDC
calibrations to be larger by 30240/6156, or about x5.  Beam position
should be identical.  Everything else was updated very infrequently, and
thus only matters if the one-time connection speed is dramatically slow.
So I think storing and retrieving the CTC data verbatim is not such a bad
exercise.  Has a better exercise been done?
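The push-a-subset-with-fail-over model suggested above can be sketched in a few lines.  This is a hypothetical illustration only: the class and function names (CalibDB, remote_fetch, the toy keys) are mine, not actual CDF code, and the real key would map to a structured record rather than a bare tuple.

```python
# Hypothetical sketch of the push-subset / fail-over model described above.
# All names here (CalibDB, local_store, remote_fetch) are illustrative
# assumptions, not real CDF database code.

class CalibDB:
    """Map from (component, attribute, run, version) to a small record.

    Lookups try the locally exported subset first; anything not pushed
    fails over to a (slow) network fetch from the central FNAL database.
    """

    def __init__(self, local_store, remote_fetch):
        self.local = local_store      # dict populated by the automatic push
        self.remote = remote_fetch    # callable doing the slow network access

    def get(self, component, attribute, run, version):
        key = (component, attribute, run, version)
        record = self.local.get(key)
        if record is None:            # e.g. seldom-used SVX pedestals
            record = self.remote(key)
            self.local[key] = record  # cache so later sessions are fast
        return record

# Toy usage: the pushed subset holds a beamline entry; anything else
# (here simulated by a lambda) comes over the network.
pushed = {("BEAM", "POSITION", 41000, 1): (0.12, -0.34, 0.001, 0.002)}
db = CalibDB(pushed, lambda key: "fetched-from-FNAL")
print(db.get("BEAM", "POSITION", 41000, 1))   # local hit
print(db.get("SVX", "PEDESTAL", 41000, 1))    # fails over to the network
```

The point of the design is exactly the one made above: rarely used constants cost nothing locally until the blue moon when they are actually read.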
(Of course the silicon pedestals are what scare the hell out of me, but I
am hoping that they will be irrelevant except in production and possibly
online initialization, hence moot for this discussion.)

Thanks again, and see you Tuesday.

-Bill

On Sat, 9 Dec 2000, Mark Lancaster wrote:

> Hello Bill et al,
>
> Apologies for the printing problems with 5252 - it's a Star Office
> document - so I have put the native, html, text and, God forbid, a .doc
> file of the note on www.hep.ucl.ac.uk/~markl/dbexport.html
>
> FREEWARE ANSWERS
> ----------------
>
> > * What is the ($) cost per machine or per institution (e.g. licensing)?
>
> Zero for mSQL.
>
> > * What computing resources will each institution need to allocate?
>
> Disk - but probably less than the ORACLE solution, since mSQL does not
> have all the same logging/rollback that ORACLE has, which takes space.
> A Linux/SGI/Sun PC to act as an mSQL server - it can be low spec;
> overheads are small.
>
> > * How much effort will be needed by local experts or system
> > administrators to keep each system running?  What about Fermilab
> > experts/administrators?
>
> The only things to be monitored are that the msql2d process is running
> locally and the disk usage.  This can be done via a script which can be
> run remotely.  The success/failure of exports is logged centrally at
> FNAL.  I do not see the overhead being more than in the ORACLE case (if
> anything, I suspect it is less).
>
> > * Will either solution make unusual demands on file systems, e.g. tens
> > of thousands of 100-byte files?
>
> No - e.g. mSQL stores the whole calibration database schema in 400
> files.  Each table has five associated files which store the data,
> indices, keys, etc.
>
> > * Will there be periodic updates or maintenance, and if so, what is the
> > computing hardware or computing time burden imposed (e.g. N hours of
> > CPU and M gigabytes of scratch space on such and such workstation,
> > daily)?
> The only update would be to a new version of mSQL, if mSQL were found
> not to be performant enough.  Performance is really just about read
> speeds.  I did this test this morning on fcdfsgi2 - the results are
> somewhat alarming.  The code for the two tests was identical (see
> fcdfsgi2:~lmark/dbase/DB/DBUtils/timing).
>
> I inserted one row at a time of a table of 6 columns 1000 times, and
> then read back one row at a time 1000 times - i.e. the type of thing you
> might do at BeginJob to load values for 1000 runs, or what you would do
> each BeginRun.
>
> Insert Times   : mSQL   =   5 sec for 1000*1 rows
>                  ORACLE = 569 sec for 1000*1 rows
> ReadBack Times : mSQL   =  17 sec for 1000*1 rows
>                  ORACLE = 539 sec for 1000*1 rows
>
> These results are difficult to believe, since all other tests show mSQL
> to be slightly slower than ORACLE.  However, there are three factors
> here:
>
> i)   I did the test while the cdfsoft2 nightly build was running on
>      fcdfsgi2 (the link stage, which we know kills a machine);
> ii)  the ORACLE server was cdfondev, which is a low-spec machine -
>      however, it is not atypical of the type of machine a remote
>      institute would use;
> iii) the mSQL database is on fcdfsgi2, whereas the ORACLE one is in FCC
>      (so there is a network issue).
>
> So if nothing else this does show that ORACLE under certain
> circumstances can be a complete dog.  I actually believe there must be
> something wrong with the ORACLE server, the network, or fcdfsgi2 for
> this to happen.  Nevertheless, it shows that to get performance out of
> ORACLE - which you undoubtedly can, and is why you pay your money - you
> need to monitor/tune/understand your system at both the client and
> server level.  mSQL, being a lightweight application, does not need
> this.
>
> Anyway, not to dwell on the ORACLE side - it shows that the mSQL times
> are, I think, fine for analysis jobs at remote sites.
>
> Answers to the rest: I would just naively scale the above numbers.
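[Editor's note: the timing exercise Mark describes - insert 1000 single rows into a 6-column table, then read them back one row at a time - can be sketched as below.  sqlite3 stands in for the mSQL/ORACLE clients, and the table and column names are made up; the real test used the DBUtils timing code on fcdfsgi2.]

```python
# Sketch of the timing test described above: insert one row at a time into a
# 6-column table 1000 times, then read back one row at a time 1000 times
# (e.g. a BeginJob load of 1000 runs, or repeated BeginRun accesses).
# sqlite3 is a stand-in here; the real test ran against mSQL and ORACLE.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE calib "
            "(run INTEGER PRIMARY KEY, c1 REAL, c2 REAL, c3 REAL, c4 REAL, c5 REAL)")

t0 = time.perf_counter()
for run in range(1000):                      # one INSERT per row, as in the test
    cur.execute("INSERT INTO calib VALUES (?, ?, ?, ?, ?, ?)",
                (run, 0.1, 0.2, 0.3, 0.4, 0.5))
conn.commit()
insert_time = time.perf_counter() - t0

t0 = time.perf_counter()
for run in range(1000):                      # one SELECT per row, as at BeginRun
    cur.execute("SELECT * FROM calib WHERE run = ?", (run,)).fetchone()
read_time = time.perf_counter() - t0

print(f"insert: {insert_time:.3f} s   readback: {read_time:.3f} s for 1000 rows")
```

For scale: Mark's 17 sec mSQL readback for 1000 rows is the 17 msec/row figure Bill quotes at the top of the thread.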
> > * How quickly (in real time) will one be able to read the beamline
> > (four real numbers, perhaps with covariance matrix) for 1000 runs, in
> > each case, in an offline analysis program that does nothing but this
> > (and is sparse in run numbers, as the W samples used to be)?
>
> > * Same question, but somewhat more challenging, e.g. constants needed
> > to re-run full COT tracking.
>
> > * Same question, but even more challenging, e.g. constants required to
> > re-run SVX clustering.
>
> > * Will there be a performance difference between first access and
> > subsequent access to the data for a given (apologies for old
> > terminology) component/attribute/run-number?
>
> As Rick said, data is memory-cached by the DBManager code.  The amount
> to cache is configurable.
>
> > * Is this choice expected to affect compile/link time, executable
> > image size, or virtual memory usage for remote applications that make
> > DB accesses?  Or is it some single process running behind the scenes,
> > accessed through some common network-like API?
>
> > * How robust is each solution if the network connection to Fermilab is
> > (a) up but extremely slow or (b) down?
>
> The only use of the connection is the scp of the zipped temporary mSQL
> DB.  If the network is down, then the export attempt is logged as having
> failed and the appropriate action can then be taken, e.g. a retry, or
> alerting the administrator.
>
> Cheers,
> Mark
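[Editor's note: the remote monitoring Mark describes - check that msql2d is running and watch disk usage, with export failures logged centrally so the administrator can retry or be alerted - might look something like the sketch below.  The function names, threshold, and ps invocation are illustrative assumptions, not part of the actual export system.]

```python
# Hedged sketch of the remote monitoring script mentioned above: verify the
# msql2d daemon is alive and that disk usage is sane.  Paths, thresholds,
# and names are illustrative assumptions, not real CDF tooling.
import shutil
import subprocess

def msql2d_running():
    """Return True if a process named msql2d appears in the process list."""
    ps = subprocess.run(["ps", "-e", "-o", "comm="],
                        capture_output=True, text=True)
    return "msql2d" in ps.stdout.split()

def disk_ok(path="/", max_used_fraction=0.90):
    """Return True if the filesystem holding the DB is below the threshold."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total < max_used_fraction

def check():
    """Collect any problems found; a central job at FNAL could log these
    and then retry the export or alert the administrator."""
    problems = []
    if not msql2d_running():
        problems.append("msql2d not running")
    if not disk_ok():
        problems.append("disk nearly full")
    return problems

print(check() or "OK")
```

Run periodically (e.g. from cron) at each remote site, with the result shipped back to the central log at FNAL alongside the export success/failure records.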