From shapiro@cdflx1.lbl.gov Fri Mar 16 23:57:03 2001
Date: Mon, 11 Dec 2000 16:35:49 -0800 (PST)
From: Marjorie Shapiro
Reply-To: Mdshapiro@lbl.gov
To: Stefano Belforte
Cc: r.stdenis@physics.gla.ac.uk, Bill Ashmanskas, Mark Lancaster,
    Marjorie Shapiro, Jim Amundson, Stephen Wolbers, Terry Watts,
    rharris@fnal.gov, ksmcf@fnal.gov, sexton@fnal.gov, Rob Snihur,
    Semeria Franco
Subject: Re: Offsite Database Export Review

Hi Stefano-

Good questions. I believe these are things I should try to address in my
talk at the beginning of the review. I'll try to have answers for the
morning. In particular, I will say something about a model of the extent
to which remote users will need the database, as a function of what they
are trying to do.

Marjorie

On Tue, 12 Dec 2000, Stefano Belforte wrote:

> A few days ago I volunteered to read the documents and send some
> comments on DB export as seen from my point of view (Italy). While
> most of the questions have already been nicely asked by Bill, I will
> add a few more, and a few comments.
>
> Let me start by admitting that I am confused. I could not understand
> how the DB will be used in the analysis. How much data, at which
> level, and how far along the data reduction chain will DB constants
> be needed? That, I think, should precede a model for export, although
> it is true that a DB usage model that makes lighter use of the DB may
> allow an easier export.
>
> I read things like "if there is a corruption, we just wipe you (my
> remote site, I guess) clean and re-push everything", and basically I
> get the sense that the party line is that a remote institution had
> better have a copy of everything (40+ GB, and be prepared for 100).
> My example remote institution (Trieste) has 1.7 CDF physicists, and
> its local computing resources next year will be one Linux server with
> a 36 GB RAID disk.
> A priori that seemed adequate for code development and small tasks on
> specific samples, even including the somewhat largish offline code.
> Do I understand correctly that I need 4 times that disk space to work
> on a few GB of data? This is the kind of thing that makes people look
> very funny when standing in front of funding bodies asking for the
> few needed k$...
>
> Now that I have shared my problems... things I did not understand
> from the documents and discussion:
>
> 1. One server per institution? Or one per nation? Or one per node?
>
> 2. What is exported every night? All of it (40 GB)? Only the
> incremental change? How much will that be? The US-Italy link is "a
> good one": I can hope for copies at 1 Mbit/s (though I have seldom
> achieved that in practice), but... I do not expect hours-long ftp/scp
> transfers to work reliably, and a message to the administrator does
> not look like great error recovery. The day after, the data to copy
> will have doubled; the copy may just never succeed. Thinking of 50
> (60? how many?) CDF institutions, the overall traffic of the nightly
> update may be a lot. Will the server and/or the FNAL outgoing link
> handle it? Can we do partial updates by CD/tape? Not that I like
> wasting a tape on 15 GB, but if it comes to refreshing 15 GB of
> database I do not see it happening over the net.
>
> 3. So how do I recover from a local corruption, if any? Get 40 GB
> over the net? By tape? Will the import by tape be
> synchronised/time-stamped/whatever, so that the automatic refresh
> will pick up smoothly?
>
> 4. Following Bill: sometimes one will have a small, temporary,
> initial, whatever installation with very modest needs. Will it be
> possible somehow to get the few needed data, even slowly, but simply,
> without having to know that Oracle exists? Or msql, or whatever?
>
> 4bis. Do I understand correctly that Oracle will be installed on my
> node by somebody at Fermilab? On how many of my nodes?
> How long after I get a new PC will it be working? Will root
> privileges be necessary?
>
> 4ter. Will there ever exist an application that needs only a little
> DB data? Or do we imagine that in order to do ANYTHING, a job will
> need A LOT of DB data?
>
> 5. This brings up a general question that I would have liked to
> understand before I could think about the rest. What is really going
> to be in the DB, and when will it be needed? Looking back at Run 1,
> one drawback was (at least in this uneducated user's opinion) the
> fact that so little could be done without the database. I still see
> that the magnetic field needs to be initialised even to do a data
> conversion or a bank dump; I hope that does not come from a database,
> but... I am scared already. At which level will user analysis be
> decoupled from database access? What are the conditions that would
> allow me to do without database export to my site at all? All in all,
> if we can cut the export-the-DB-to nodes from 60 to 20...
>
> 6. I have a variation of Marjorie's J/Psi example (see minutes
> 000328) that better suits the model we have in mind for Italian
> analysis:
> 1. SB is in Trieste.
> 2. SB wants to make an n-tuple of J/Psi+xyz events (expect a 1/100
> reduction).
> 3. SB writes and tests a nice AC++ module, copies it to fcdfsgi2, and
> somehow manages to make sure it still compiles and runs across the
> op-sys boundary.
> 4. SB runs his module on the J/Psi sample. DH/DIM sails the job
> through the data set and it is just a piece of cake.
> 5. SB gets a 30 GB n-tuple on disk, goes to the web server, enters a
> budget code, cryptocard number, and mother's maiden name, and gets a
> tape in Trieste with the ntuple.
> QUESTION: will SB need the DB in Trieste to work on the ntuple? Will
> he have needed it to write/test the simple selection module on a
> 1000-event file? Or on 100 events from just the same run?
> 6. Nine weeks later SB decides he wants to refit with the latest beam
> position.
> Since he got a tip from BA, he has all the needed stuff in the
> ntuple, and even a refit C++ module to link dynamically into Root,
> because he befriended Pasha Murat and bribed him with Italian wine.
> So SB has an ASCII file with a list of 1827 run numbers and needs an
> ASCII file with 1827 beam positions.
> QUESTION: how will SB get these? Is this an "allowed" mode of work?
> Or is the danger that SB somehow screws things up in doing this such
> that the beam position is better kept out of reach?
>
> This last question (6.) amounts to asking: will it be possible to
> access the DB "by hand" for use in n-tuple analysis, or anyhow
> outside AC++? This mode of access calls even more for a Bill-like
> access via some sort of slow but quick-to-set-up socket connection or
> whatever (secondary sourcing?). This mode of analysis is not just for
> fun; it is indeed how we expect to do most of our work, and it is why
> we Italians are planning to put hardware at Fermilab in addition to
> (and in place of) hardware in Italy. A major choice!
>
> I also expect many small institutions will want to go for an
> n-tuple-only analysis, or at least mostly n-tuple. Then all those GB
> of svxped and SVT patterns scattered around and never used (same for
> COT wire positions, drift constants, etc.) may cost little money
> (but... really?) yet somehow strike me as wrong. Disk is cheap, but
> not that cheap, and this cannot be scratch disk. At most remote sites
> DSTs will never arrive, and even PADs may be used sparingly, with
> limited need for re-reconstruction. Why bother to keep a copy of the
> whole DB there?
>
> And mostly, I somehow cannot imagine someone sitting at Fermilab
> keeping track of the DB replicas in 50+ places and making sure they
> are all OK. It sounds like a full-time job at least. In the end it
> will have to be a system where the end users take care of it
> themselves, so it matters a lot to me how easy that is to do.
> From what I read, I am scared as hell at the idea of having to
> maintain a DB replica on my system. Hopefully I misunderstood. If we
> are now to endorse a DB export policy that promises central
> maintenance and push, the manpower requirement should be very well
> understood and tested, and we need much more than "we can ask CD for
> resources".
>
> Let me end à la JFK: if I ask little from the DB, I expect the DB to
> ask little from me.
>
> Stefano
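The bandwidth worry in point 2 is easy to make concrete. The sketch below is a back-of-the-envelope check, not a measurement; its only input is the 1 Mbit/s sustained rate quoted in the message (and described there as seldom achieved), so real transfer times would be longer:

```python
# Back-of-the-envelope transfer times for the DB export, assuming the
# 1 Mbit/s sustained US-Italy rate quoted in point 2.
# Decimal units throughout: 1 GB = 8000 Mbit.

LINK_MBIT_S = 1.0  # assumed sustained throughput; real rates are lower


def transfer_days(gigabytes, mbit_s=LINK_MBIT_S):
    """Days needed to move `gigabytes` at `mbit_s` megabits/second."""
    return gigabytes * 8000 / mbit_s / 86400  # 86400 seconds per day


full_refresh = transfer_days(40)  # full 40 GB database: ~3.7 days
partial = transfer_days(15)       # 15 GB partial refresh: ~1.4 days

# Conversely, an 8-hour nightly window at 1 Mbit/s moves at most:
nightly_gb = LINK_MBIT_S * 8 * 3600 / 8000  # = 3.6 GB per night

print(f"full 40 GB refresh : {full_refresh:.1f} days")
print(f"15 GB partial      : {partial:.1f} days")
print(f"8 h nightly window : {nightly_gb:.1f} GB max")
```

Even an error-free full refresh is a multi-day transfer, which supports the message's suggestion that anything beyond a small incremental change would have to travel by CD or tape.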
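The run-numbers-to-beam-positions lookup in point 6 is exactly the kind of task that should not require knowing that Oracle exists. Below is a minimal sketch of what such an "allowed" by-hand mode could look like, assuming some lightweight local snapshot of the beam table were made available; the file name, table name, and columns (`beams.db`, `beamlines`, `run`, `x0`, `y0`) are invented for illustration and do not come from the review documents:

```python
import sqlite3


def beam_positions(run_file, db_file, out_file):
    """Join an ASCII list of run numbers against a local snapshot of a
    (hypothetical) beam-position table; write one output line per run."""
    con = sqlite3.connect(db_file)
    with open(run_file) as runs, open(out_file, "w") as out:
        for line in runs:
            run = int(line.strip())
            row = con.execute(
                "SELECT x0, y0 FROM beamlines WHERE run = ?", (run,)
            ).fetchone()
            if row is None:
                out.write(f"{run} MISSING\n")  # run not in the snapshot
            else:
                out.write(f"{run} {row[0]:.4f} {row[1]:.4f}\n")
    con.close()
```

The particular tool is beside the point (sqlite here merely stands in for whatever lightweight export mechanism the review settles on); what matters is that the whole "ASCII run list in, ASCII beam positions out" step fits in a dozen lines once any such snapshot exists.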