From ashmansk@hep.uchicago.edu Fri Mar 16 23:55:58 2001
Date: Mon, 11 Dec 2000 13:53:37 -0600 (CST)
From: Bill Ashmanskas <ashmansk@hep.uchicago.edu>
To: Mark Lancaster <markl@hep.ucl.ac.uk>
Cc: Marjorie Shapiro <Mdshapiro@lbl.gov>, Jim Amundson <amundson@fnal.gov>,
     wolbers@fnal.gov, watts@physics.rutgers.edu, r.stdenis@physics.gla.ac.uk,
     stefano.belforte@ts.infn.it, rharris@fnal.gov, ksmcf@fnal.gov,
     sexton@fnal.gov, Rob Snihur <snihur@fnal.gov>
Subject: Re: Offsite Database Export Review

On Sun, 10 Dec 2000, Mark Lancaster wrote:

Thanks again for filling me in.  I have several follow-ups.  One or two
are, I think, pertinent to database export; the rest relate to the DB more
generally, not to DB exporting, so you are well within your rights just to
"pass" on those.

Your COTWPO table seems to have a row per version per wire, so reading one
complete alignment requires reading O(nwires) small rows, rather than one
big row.  Is it safe for me to assume that this comes with no particular
cost?  (I think probably the benchmark I quoted--the read times for the
complete set of CTC constants, stored in whatever way you plan to store
the COT constants--would answer this.)  Again you are free to dismiss this
question as, "not pertinent for this week's review."
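
A minimal sketch of the comparison behind this question, for illustration
only.  SQLite stands in for whichever database is actually used, and the
table names, column names, and wire count are all hypothetical; nothing
here reflects the real COTWPO schema.  It stores the same constants once
as one row per wire and once as one packed row per version, and reads
each back.

```python
import sqlite3
import struct
import time


def build(conn, n_wires, version=1):
    """Store the same dummy constants in both layouts; return the values."""
    conn.execute("CREATE TABLE per_wire (version INTEGER, wire INTEGER, "
                 "val REAL, PRIMARY KEY (version, wire))")
    conn.execute("CREATE TABLE per_version "
                 "(version INTEGER PRIMARY KEY, vals BLOB)")
    values = [0.001 * w for w in range(n_wires)]  # dummy constants
    conn.executemany("INSERT INTO per_wire VALUES (?, ?, ?)",
                     [(version, w, v) for w, v in enumerate(values)])
    conn.execute("INSERT INTO per_version VALUES (?, ?)",
                 (version, struct.pack(f"<{n_wires}d", *values)))
    conn.commit()
    return values


def read_per_wire(conn, version):
    """Read one alignment as O(nwires) small rows."""
    rows = conn.execute("SELECT val FROM per_wire WHERE version = ? "
                        "ORDER BY wire", (version,))
    return [r[0] for r in rows]


def read_per_version(conn, version):
    """Read one alignment as a single big row."""
    (blob,) = conn.execute("SELECT vals FROM per_version WHERE version = ?",
                           (version,)).fetchone()
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn, n_wires=30000)  # wire count chosen arbitrarily
    for reader in (read_per_wire, read_per_version):
        t0 = time.perf_counter()
        reader(conn, 1)
        print(reader.__name__, f"{time.perf_counter() - t0:.4f} s")
```

Timings from an in-memory toy like this say nothing about the real
system; the point is only that the two layouts can be timed side by side
with very little code.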

> For now I am not convinced we should get hung up on the details of access
> times and storage overheads - with some optimisation these will all be
> acceptable. We should rather I think concentrate on how easy it will be to
> administer the exports. 

I am still curious to see the results of a simple extension of the
performance test that you already did, which is to write 1000 rows in run
order, then read back those rows in random order, for both database
candidates.  I consider this semi-pertinent, as a comparison of the two
databases' performance.
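
A minimal sketch of the test described above, assuming an in-memory
SQLite database as a stand-in for each candidate.  The table layout, the
payload size, and the function name run_benchmark are assumptions made
for illustration; the real test would point the same loop at each of the
two candidate databases in turn.

```python
import random
import sqlite3
import time


def run_benchmark(n_rows=1000, payload_bytes=4096, seed=0):
    """Write n_rows in run order, then read them back in random order.

    Returns (write_seconds, read_seconds, rows_read).
    """
    conn = sqlite3.connect(":memory:")  # stand-in for a candidate database
    conn.execute(
        "CREATE TABLE constants (run INTEGER PRIMARY KEY, payload BLOB)")
    payload = bytes(payload_bytes)  # dummy constants; size is an assumption

    t0 = time.perf_counter()
    for run in range(n_rows):  # sequential write, in run order
        conn.execute("INSERT INTO constants VALUES (?, ?)", (run, payload))
    conn.commit()
    write_seconds = time.perf_counter() - t0

    runs = list(range(n_rows))
    random.Random(seed).shuffle(runs)  # random but reproducible read order

    rows_read = 0
    t0 = time.perf_counter()
    for run in runs:
        row = conn.execute(
            "SELECT payload FROM constants WHERE run = ?", (run,)).fetchone()
        if row is not None and len(row[0]) == payload_bytes:
            rows_read += 1
    read_seconds = time.perf_counter() - t0

    conn.close()
    return write_seconds, read_seconds, rows_read


if __name__ == "__main__":
    w, r, n = run_benchmark()
    print(f"wrote {n} rows in {w:.4f} s, random-order read in {r:.4f} s")
```

Fixing the shuffle seed keeps the read order identical between the two
databases, so any difference in read time comes from the database and not
from the access pattern.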

On my two points "first access vs subsequent access" and "frequently used
vs infrequently used" constants, I think that maybe twice I managed not to
ask the question that I really wanted to ask (though the answers were
useful anyway).  I'm wondering if some of the constants, such as SVX
pedestals, will be deemed by consensus to be so infrequently needed by
remote users (not necessarily node by node), that they would not be
exported, and then one could fall back to some kind of direct socket-based
access to the FNAL database.  I'm not so concerned about having to wait
until the next day to read constants from FNAL.  If the pure-network
fall-back mechanism exists for a subset of the DB, though, then maybe a
process-by-process switch could be thrown (getenv, talk-to, etc.),
allowing one to function with no exported database.  I didn't necessarily
mean that one wants to cook up a separate export list for each remote
institution; that sounds like a big hassle for FNAL-based
administrators.  I'm wondering if direct TCP/IP socket access to some FNAL
DB server is a straightforward upgrade/downgrade of the current plan,
since it seems like a really simple fallback option for new nodes,
temporary nodes, transient export failures, and who knows what other
unforeseen scenarios.
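
The process-by-process switch could look roughly like the sketch below.
Everything named here is hypothetical: the environment variable
CDF_DB_ACCESS, the two stub sources, and the exception class are invented
for illustration, and the "remote" stub merely stands in for a real
socket request to a FNAL server.

```python
import os


class ConstantsUnavailable(Exception):
    """Raised by a source that cannot supply the requested constants."""


def choose_sources(local_source, remote_source, environ=os.environ):
    """Return the sources to try, in order, for this process.

    CDF_DB_ACCESS is a hypothetical variable name:
      "local"  -> exported database only
      "remote" -> direct network access only (no exported database needed)
      unset or anything else -> exported database, then network fallback
    """
    mode = environ.get("CDF_DB_ACCESS", "auto")
    if mode == "local":
        return [local_source]
    if mode == "remote":
        return [remote_source]
    return [local_source, remote_source]


def fetch_constants(table, run, sources):
    """Try each source in turn; return the first successful answer."""
    errors = []
    for source in sources:
        try:
            return source(table, run)
        except ConstantsUnavailable as exc:
            errors.append(str(exc))
    raise ConstantsUnavailable(f"{table} run {run}: " + "; ".join(errors))


def local_export(table, run):
    # Stub for the exported database; SVXPED is assumed not to be exported.
    if table == "SVXPED":
        raise ConstantsUnavailable("SVXPED not in local export")
    return ("local", table, run)


def fnal_socket(table, run):
    # Stub for a direct TCP/IP request to a FNAL DB server.
    return ("remote", table, run)
```

With no variable set, fetch_constants("SVXPED", 1, choose_sources(
local_export, fnal_socket)) falls through to the network stub, while
exported tables never leave the local node.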

> True, CID does map onto comp/attr/run/version and can then, via
> SET_RUN_MAPS and USED_SETS, be mapped to a higher-level identifier, e.g.
> beam-fix-hack-1.

Great, this sounds like exactly the solution we need for the worst of the
Run I DB's deficiencies, the what-version-to-use problem.  I still have no
concept, though, of how one specifies, for a given job, whether or not to
use "beam-fix-hack-1."  It would seem nicer to do it by talking to a
central DB manager, rather than requiring each piece of code that accesses
the DB to have a talk-to parameter allowing one to select the set of
constants to be used.  But this is not at all pertinent to tomorrow's
review.
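
To make the "central DB manager" idea concrete, here is a minimal sketch.
It is not based on the actual SET_RUN_MAPS or USED_SETS schema: the class
DBManager, the dictionary layout, the component/attribute names, and the
rule that a named set falls back to "default" for anything it does not
override are all assumptions.

```python
class DBManager:
    """Job-wide holder of the constants set chosen for this job."""

    def __init__(self, set_maps, set_name="default"):
        # set_maps: {set name: {(component, attribute, run): version}}
        self._set_maps = set_maps
        self._set_name = set_name

    def use_set(self, set_name):
        """Select the named set once, for the whole job."""
        if set_name not in self._set_maps:
            raise KeyError(f"unknown constants set: {set_name}")
        self._set_name = set_name

    def version_for(self, component, attribute, run):
        """Resolve the version to read; callers never name the set."""
        key = (component, attribute, run)
        chosen = self._set_maps[self._set_name]
        if key in chosen:
            return chosen[key]
        # Assumed rule: anything the set does not override uses "default".
        return self._set_maps["default"][key]


# Hypothetical contents, for illustration only.
SET_MAPS = {
    "default": {("COT", "WPO", 100): 1, ("BEAM", "POS", 100): 1},
    "beam-fix-hack-1": {("BEAM", "POS", 100): 2},
}

manager = DBManager(SET_MAPS)
manager.use_set("beam-fix-hack-1")  # one job-level choice
```

Code that reads constants would then call manager.version_for(...) and
need no talk-to parameter of its own for selecting the set.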

> The only space/compressions/access issues are with SVXPED - I think with
> everything else you buy enough disk and don't worry about it - it will not
> affect your analysis. Clearly there is some padding with storing data in
> tables (key, index overhead) compared with a binary C-struct file. We get
> some padding at the expense of increased functionality and a cure for run-1
> deficiencies.

I more or less accept this as a general argument, but I won't stop
worrying about it until I see the actual numbers.  If I am ever a reviewer
for general database issues, I will want a quantitative performance (time
and space) analysis.  But I probably have no right to expect an answer to
this question in the context of this meeting (unless there is a gross
difference in time/space performance between the two remote database
technologies)--again, just curious.

Thanks again for all the detailed information.

-Bill