From stdenis@fnal.gov Fri Mar 16 23:53:40 2001
Date: Fri, 8 Dec 2000 07:47:43 -0600 (CST)
From: Rick St. Denis
Reply-To: r.stdenis@physics.gla.ac.uk
To: Bill Ashmanskas
Cc: Marjorie Shapiro, Jim Amundson, Stephen Wolbers, watts@physics.rutgers.edu, Mark Lancaster, stefano.belforte@ts.infn.it, rharris@fnal.gov, ksmcf@fnal.gov, John Weigand, sexton@fnal.gov
Subject: Re: Offsite Database Export Review

Hi Bill,

let me try to answer some of your questions on the Oracle side, where I have been looking into the question. Mark will have to comment on the mSQL.

rick

*****************************
Dr. Richard St. Denis, Dept. of Phys. & Astr., Glasgow University
Glasgow G12 8QQ; United Kingdom
UK Phone: [44] (141) 330 5887
UK Fax:   [44] (141) 330 5881
======================================
FermiLab, PO Box 500, MS 318
Batavia, Illinois 60510 USA
FNAL Phone: [00] 1-630-840-2943
FNAL Fax:   [00] 1-630-840-2968
Sidet:      [00] 1-630-840-8630
FCC:        [00] 1-630-840-3707

On Fri, 8 Dec 2000, Bill Ashmanskas wrote:

> * What is the ($) cost per machine or per institution (e.g. licensing)?

Oracle: 0 (Ruth worked out an agreement).

> * What computing resources will each institution need to allocate?

Oracle: we are testing on a 300 MHz Pentium with 40 GB of disk. You probably want more power and 80 GB, to be safe and not have to think about it. Note: Marjorie will make a statement about export, and it implies significantly less space than the 40 GB; it depends on what we export.

> * How much effort will be needed by local experts or system administrators
> to keep each system running? What about Fermilab experts/administrators?

Oracle: the model is centralized control, like the CDF offline. So significant FNAL expertise, less at the remote sites. We will give you a coherent opinion from the Oracle consultants. I think they thought it was a piece of cake, but for a $1400/day consultant anything is a piece of cake.

> * Will either solution make unusual demands on file systems, e.g. tens of
> thousands of 100-byte files?

Oracle: no.

> * Will there be periodic updates or maintenance, and if so, what is the
> computing hardware or computing time burden imposed (e.g. N hours of CPU
> and M gigabytes of scratch space on such and such workstation, daily)?

Oracle: yes, probably patches and upgrades every few months. Expect a major upgrade every 2 years. Assuming you are willing to have a downtime of a day or so, we would wipe and repopulate you. This is the basic model: NO writes, NO backup. If there is a failure, we repopulate you from the central location.

> * How quickly (in real time) will one be able to read the beamline (four
> real numbers, perhaps with covariance matrix) for 1000 runs, in each case,
> in an offline analysis program that does nothing but this (and is sparse
> in run numbers, as the W samples used to be)?

Oracle: it depends on how you store it and how you get the info. We have not yet understood the model for remote analysis. Currently the software is optimized for production and online: large amounts of information for a single run, read every hour or so. So it opens a new connection for each run, reads its info, and goes on. Reading the 4000 rows of the muon table this way is timed at 4 s, of which about 1 s is connection overhead. For the beamline you would expect essentially just that 1 s of overhead per run, so in this mode 1000 runs cost you about 1000 s! If instead you put all 1000 runs into one read, you get them in about 4 s. By the way, if you are at LBL and don't export, you also get it in 3.8 s according to Mark's measurements. He also tried to access Oracle AT FNAL from UCL and Karlsruhe and got something like 40 s and 60 s to read the 4000-row muon table.
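To make the two access patterns concrete, here is a minimal sketch, assuming a Python DB-API style driver (cx_Oracle here); the connect string, table, and column names are made up for illustration and are not the real CDF schema:

    # Sketch of the two access patterns discussed above.  Connect
    # string, table and column names are illustrative only.
    import cx_Oracle

    runs = range(140000, 141000)       # 1000 hypothetical run numbers

    # Pattern 1: a fresh connection and read per run.  Each connection
    # carries ~1 s of overhead, so 1000 runs cost ~1000 s.
    beam = {}
    for run in runs:
        conn = cx_Oracle.connect("user/password@fnaldb")
        cur = conn.cursor()
        cur.execute("SELECT x, y, slope_x, slope_y"
                    " FROM beamline WHERE run = :r", r=run)
        beam[run] = cur.fetchone()
        conn.close()

    # Pattern 2: one connection, all 1000 runs in a single read (~4 s).
    conn = cx_Oracle.connect("user/password@fnaldb")
    cur = conn.cursor()
    cur.execute("SELECT run, x, y, slope_x, slope_y FROM beamline"
                " WHERE run BETWEEN :lo AND :hi",
                lo=min(runs), hi=max(runs))
    beam = {row[0]: row[1:] for row in cur.fetchall()}
    conn.close()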
> * Same question, but somewhat more challenging, e.g. constants needed to
> re-run full COT tracking. How many runs and how many constants?

Oracle: you tend to get rows of data for free up to about 4 kB, and you need about 1 s to connect to a table. I guess the numbers above should give you an answer.

> * Same question, but even more challenging, e.g. constants required to
> re-run SVX clustering.

Oracle: ditto. Right now we have the silicon info stored stupidly, and we have not yet seen the full detector. We should end up with 5000 rows, one per chip, for all the pedestals. Assuming you mean to concentrate on optimizing, not running production, you need to get the table over once for the run or two in question. For this you could export to a text file, and the code underneath does not change. It requires a bit of expertise, but if you are tuning silicon clustering, you must have more than enough expertise anyway! I don't think we want to export silicon pedestals wholesale.

> * Will there be a performance difference between first access and
> subsequent access to the data for a given (apologies for old terminology)
> component/attribute/run-number?

Oracle and mSQL: for the same info in the same program, the API caches the info, so once it is stored, every database access first checks whether the info is cached and goes to the DB only once (see the sketch at the end of this message). If you run the program and exit, then you have to access the data again, of course. The reason I say this is that Henry has another "first access" model. He suggested that this is the way the export occurs: when you ask for something the first time, it goes to FNAL and then keeps it around. He is not a strong proponent, because he immediately thought it easier just to have a job ask every night. As for pulling data over, both solutions allow an on-demand pull.

> * Is this choice expected to affect compile/link time, executable image
> size, or virtual memory usage for remote applications that make DB
> accesses? Or is it some single process running behind the scenes,
> accessed through some common network-like API?

Oracle: Oracle has a number of processes (6) running behind the scenes; that is what it is. As for the executable: we have three possible backends to the API: text, Oracle, and mSQL. We are trying to make the link to these databases optional so that your executable can be reduced. This will probably be a useless exercise remotely, since you need all three backends; at FNAL you can do without mSQL. Dennis thinks this saves you 9 MB on the executable, but he does not believe he measured this right and is still trying to measure it. The real problem with executable size is the way in which the API was organized, and we need more help or this will stay big.

> * How robust is each solution if the network connection to Fermilab is
> (a) up but extremely slow or (b) down?

Oracle: we have had some experience with various failures in our online-offline tests, where the machines were down on one end or the other. Oracle replication just rescheduled itself and on the next cycle got the constants across. We find that from online to offline, 4 GB of calibration numbers takes about 1 hour 30 minutes. This is what you get the FIRST time through: so if you had a 4 GB database, you would need this amount of time for a complete refresh from scratch. The UPDATE of the constants for a calorimeter calibration, about 50K rows, takes 2 minutes. Normally we run in refresh mode. Again, we need a model of WHAT is to be exported...

> -Bill
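PS: a minimal sketch of the in-program caching described under the first/subsequent-access question, assuming a process-lifetime cache keyed on (table, run); the class and method names are made up, not the real calibration API:

    # First access goes to the database; later accesses come from
    # memory; the cache dies when the process exits.  Names are
    # illustrative only.
    class CachedCalibDB:
        def __init__(self, fetch_from_db):
            self._fetch = fetch_from_db   # callable that really hits the DB
            self._cache = {}              # (table, run) -> rows

        def get(self, table, run):
            key = (table, run)
            if key not in self._cache:    # first access: slow, goes to DB
                self._cache[key] = self._fetch(table, run)
            return self._cache[key]       # subsequent access: from memory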