From stdenis@fnal.gov Fri Mar 16 23:53:40 2001
Date: Fri, 8 Dec 2000 07:47:43 -0600 (CST)
From: Rick St. Denis
Reply-To: r.stdenis@physics.gla.ac.uk
To: Bill Ashmanskas
Cc: Marjorie Shapiro, Jim Amundson, Stephen Wolbers, watts@physics.rutgers.edu, Mark Lancaster, stefano.belforte@ts.infn.it, rharris@fnal.gov, ksmcf@fnal.gov, John Weigand, sexton@fnal.gov
Subject: Re: Offsite Database Export Review

Hi Bill,

let me try to answer some of your questions on the Oracle side, where I have been looking into the question. Mark will have to comment on the mSQL.

rick

*****************************
Dr. Richard St. Denis, Dept. of Phys. & Astr., Glasgow University
Glasgow G12 8QQ; United Kingdom
UK Phone: [44] (141) 330 5887
UK Fax:   [44] (141) 330 5881
======================================
FermiLab, PO Box 500, MS 318
Batavia, Illinois 60510 USA
FNAL Phone: [00] 1-630-840-2943
FNAL Fax:   [00] 1-630-840-2968
Sidet:      [00] 1-630-840-8630
FCC:        [00] 1-630-840-3707

On Fri, 8 Dec 2000, Bill Ashmanskas wrote:

> * What is the ($) cost per machine or per institution (e.g. licensing)?

Oracle: 0 (Ruth worked out an agreement).

> * What computing resources will each institution need to allocate?

Oracle: we are testing on a 300 MHz Pentium with 40 GB of disk. You probably want more power and 80 GB, to be safe and not have to think about it. Note: Marjorie will make a statement about export, and it implies significantly less space than the 40 GB; it depends on what we export.

> * How much effort will be needed by local experts or system administrators
> to keep each system running? What about Fermilab experts/administrators?

Oracle: the model is centralized control, like the CDF offline. So significant FNAL expertise, less at the remote sites. We will give you a coherent opinion from the Oracle consultants. I think they thought it was a piece of cake, but for a $1400/day consultant anything is a piece of cake.

> * Will either solution make unusual demands on file systems, e.g. tens of
> thousands of 100-byte files?

Oracle: no.

> * Will there be periodic updates or maintenance, and if so, what is the
> computing hardware or computing time burden imposed (e.g. N hours of CPU
> and M gigabytes of scratch space on such and such workstation, daily)?

Oracle: yes, probably patches and upgrades every few months. Expect a major upgrade every 2 years. Assuming you are willing to have a downtime of a day or so, we would wipe and repopulate you. This is the basic model: NO writes, NO backup. If there is a failure, we repopulate you from the central location.

> * How quickly (in real time) will one be able to read the beamline (four
> real numbers, perhaps with covariance matrix) for 1000 runs, in each case,
> in an offline analysis program that does nothing but this (and is sparse
> in run numbers, as the W samples used to be)?

Oracle: it depends on how you store it and how you get the info. We have not yet understood the model for remote analysis. Currently the software is optimized for production and online: large amounts of information for a single run, read every hour or so. So it opens a new connection for each run, reads its info, and goes on. Reading the 4000 rows of the muon table this way is timed at 4 s, of which about 1 s is connection overhead. For the beamline you would expect essentially just that 1 s of overhead per run, so in this mode 1000 runs cost you about 1000 s! If instead you put all 1000 runs into one read, you get them in about 4 s. By the way, if you are at LBL and don't export, you also get it in 3.8 s according to Mark's measurements. He also tried to access Oracle AT FNAL from UCL and Karlsruhe and got something like 40 s and 60 s to read the 4000-row muon table.
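To make the two access patterns concrete, here is a minimal sketch, assuming a Python DB-API style driver (cx_Oracle here); the connect string, table, and column names are made up for illustration and are not the real CDF schema:

    # Sketch of the two access patterns discussed above.  Connect
    # string, table and column names are illustrative only.
    import cx_Oracle

    runs = range(140000, 141000)       # 1000 hypothetical run numbers

    # Pattern 1: a fresh connection and read per run.  Each connection
    # carries ~1 s of overhead, so 1000 runs cost ~1000 s.
    beam = {}
    for run in runs:
        conn = cx_Oracle.connect("user/password@fnaldb")
        cur = conn.cursor()
        cur.execute("SELECT x, y, slope_x, slope_y"
                    " FROM beamline WHERE run = :r", r=run)
        beam[run] = cur.fetchone()
        conn.close()

    # Pattern 2: one connection, all 1000 runs in a single read (~4 s).
    conn = cx_Oracle.connect("user/password@fnaldb")
    cur = conn.cursor()
    cur.execute("SELECT run, x, y, slope_x, slope_y FROM beamline"
                " WHERE run BETWEEN :lo AND :hi",
                lo=min(runs), hi=max(runs))
    beam = {row[0]: row[1:] for row in cur.fetchall()}
    conn.close()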
> * Same question, but somewhat more challenging, e.g. constants needed to
> re-run full COT tracking. How many runs and how many constants?

Oracle: you tend to get rows of data for free up to about 4 kB, and you need about 1 s to connect to a table. I guess the numbers above should give you an answer.

> * Same question, but even more challenging, e.g. constants required to
> re-run SVX clustering.

Oracle: ditto. Right now we have the silicon info stored stupidly, and we have not yet seen the full detector. We should end up with 5000 rows, one per chip, for all the pedestals. Assuming you mean to concentrate on optimizing, not running production, you need to get the table over once for the run or two in question. For this you could export to a text file, and the code underneath does not change. It requires a bit of expertise, but if you are tuning silicon clustering, you must have more than enough expertise anyway! I don't think we want to export silicon pedestals wholesale.

> * Will there be a performance difference between first access and
> subsequent access to the data for a given (apologies for old terminology)
> component/attribute/run-number?

Oracle and mSQL: for the same info in the same program, the API caches the info, so once it is stored, every database access first checks whether the info is cached and goes to the DB only once (see the sketch at the end of this message). If you run the program and exit, then you have to access the data again, of course. The reason I say this is that Henry has another "first access" model. He suggested that this is the way the export occurs: when you ask for something the first time, it goes to FNAL and then keeps it around. He is not a strong proponent, because he immediately thought it easier just to have a job ask every night. As for pulling data over, both solutions allow an on-demand pull.

> * Is this choice expected to affect compile/link time, executable image
> size, or virtual memory usage for remote applications that make DB
> accesses? Or is it some single process running behind the scenes,
> accessed through some common network-like API?

Oracle: Oracle has a number of processes (6) running behind the scenes; that is what it is. As for the executable: we have three possible backends to the API: text, Oracle, and mSQL. We are trying to make the link to these databases optional so that your executable can be reduced. This will probably be a useless exercise remotely, since you need all three backends; at FNAL you can do without mSQL. Dennis thinks this saves you 9 MB on the executable, but he does not believe he measured this right and is still trying to measure it. The real problem with executable size is the way in which the API was organized, and we need more help or this will stay big.

> * How robust is each solution if the network connection to Fermilab is
> (a) up but extremely slow or (b) down?

Oracle: we have had some experience with various failures in our online-offline tests, where the machines were down on one end or the other. Oracle replication just rescheduled itself and on the next cycle got the constants across. We find that from online to offline, 4 GB of calibration numbers takes about 1 hour 30 minutes. This is what you get the FIRST time through: so if you had a 4 GB database, you would need this amount of time for a complete refresh from scratch. The UPDATE of the constants for a calorimeter calibration, about 50K rows, takes 2 minutes. Normally we run in refresh mode. Again, we need a model of WHAT is to be exported...

> -Bill
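PS: a minimal sketch of the in-program caching described under the first/subsequent-access question, assuming a process-lifetime cache keyed on (table, run); the class and method names are made up, not the real calibration API:

    # First access goes to the database; later accesses come from
    # memory; the cache dies when the process exits.  Names are
    # illustrative only.
    class CachedCalibDB:
        def __init__(self, fetch_from_db):
            self._fetch = fetch_from_db   # callable that really hits the DB
            self._cache = {}              # (table, run) -> rows

        def get(self, table, run):
            key = (table, run)
            if key not in self._cache:    # first access: slow, goes to DB
                self._cache[key] = self._fetch(table, run)
            return self._cache[key]       # subsequent access: from memory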