From stefano.belforte@ts.infn.it Wed Mar 20 13:47:49 2002
Date: Mon, 11 Mar 2002 23:11:56 +0100
From: Stefano Belforte
To: Jesus Marco, Francesco Prelz, "Todd Huffman (CDF/ATLAS)", Lamberto Luminari, r.stdenis@physics.gla.ac.uk, David Waters, Franco.Semeria@bo.infn.it, roberto.carosi@pi.infn.it, mazzanti@bo.infn.it, giovanni.busetto@pd.infn.it, Ivan Vila, Alberto Ruiz, rodrigo, rmarco@ifca.unican.es, Thomas Mueller, Holger Marten, Guenter Quast, Patrick Schemitz, Flavia Donno, Igor Sfiligoi, Antonio Sidoti tel. +39+0461 88 1525
Subject: dinner with Prelz

A few words about what I would like to talk about with Francesco Prelz tomorrow at dinner. I announced a background info document, but am scaling down ambitions.

The question is which d-grid tool we can use, and on which time scale, to integrate our computing centers. Starting with FNAL of course.

CDF is setting up a farm-based batch analysis facility at FNAL, called the Central Analysis Facility (or CAF). The plan started as "submit from everywhere", where everywhere means any Linux box in the world that has CDF code installed. Practical considerations are already downscoping it to "log on to some central Linux machine at FNAL and submit from there". Part of the problem is that the present user interface to submit jobs is not "extremely portable"; part is that kerberos authentication is needed, so the FNAL-specific kerberos installation has to be there. A few details are in
http://listserv.fnal.gov/scripts/wa.exe?A2=ind0202d&L=cdf_caf&D=0&O=A&P=7359

Note that a local kerberos5 installation is not required to log into FNAL computers interactively, as people can use the cryptocard for that, and several people prefer that to installing kerberos.
I think this means that the new CDF CAF will not automatically provide tools for remote access to it, nor tools that can be cloned for remote access across non-FNAL sites (namely D-I-E-GB).

The interest in "remote access to the CAF" is driven by the expected proliferation of CAFs (a bit of an oxymoron!): there will be big computing facilities for CDF in our countries, as in Korea and maybe elsewhere. There is also the possibility of many small university-owned CAFs, likely somewhere at Fermilab, or at home. Some universities have expressed interest in pooling their local resources, some have indicated a desire to install significant hardware at FNAL, and it is currently being debated whether that will be a separate facility (e.g. in the CDF building) or part of the "central CAF" at the FNAL Computing Center.

Somehow dGRID may provide the needed glue. We would need
 - user authentication
 - job submission

The first is basically a question of whether we can/want to use kerberos everywhere, or will want to access computer centers that only accept different mechanisms, so that a k5-certificates interface will be needed. FNAL is working on it, I do not know the details. We need to make clear what is needed, how much work it would be, and then who can do it. The biggest question for me is how far the present Globus certificates scheme is from being acceptable in real systems (i.e. largish computer centers) rather than small test beds. Is any even marginally security-concerned sys. admin. going to trust these certificates?

Job submission is a bit simpler. The present CDF scheme is that the user sends a tarball with the executable and a few "options" in the usual class-ads spirit, but simpler and of course with a different implementation. There will be no automatic return of the job output to the user: data will stay for a couple of days somewhere they can be picked up from, then be deleted. The user will only get a mail.
The reason for this is security: we could not find a scheme for giving the batch process the authorisation to write to the local user directory "safely". This latter constraint may be a temporary problem, because it is mainly related to the fact that all jobs on the CDF CAF have to be run as the same user, which may change. But while each CDF member will have an account, and thus a username, on FNAL machines, this may not be true on all the "CDF regional centers". We could then assume that the users will keep picking up their output themselves, at least in the beginning.

The biggest question there for me is just technical: what does it mean in practice to have an interface between e.g. the dGrid job submission UI and the CDF CAF "submitter process"? The CDF submitter is a task that runs on the CAF gateway computer and listens for connections on a given tcp port. Would it need to be changed to understand "grid messages", or should an interface to it be set up on each remote node that translates local condorG (or whatever) submission requests into something that our submitter understands?

The current draft description of the submitter etc. is in
http://mit.fnal.gov/~fkw/caf/aceDocs.ps
also available in html
http://mit.fnal.gov/~fkw/caf/aceDocs/
especially:
http://mit.fnal.gov/~fkw/caf/aceDocs/node8.html

Stefano
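
To picture the second option above (an interface on the remote node that translates a local submission request into something the submitter understands), here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration: the attribute names, the option names, the wire format (two 4-byte lengths, a JSON header, then the tarball bytes) and the function names are hypothetical. None of it is taken from the actual CDF submitter or from condorG.

```python
import json
import socket
import struct

# Hypothetical mapping from grid-side, class-ad-like attribute names to the
# option names a CAF-style submitter might expect. Not taken from CDF code.
ATTRIBUTE_MAP = {
    "Executable": "command",
    "Arguments": "args",
    "NotifyUser": "email",
}


def translate_request(grid_request):
    """Turn a class-ad-like dict into the (assumed) submitter option dict.

    Attributes without a known mapping are dropped; the job command and the
    mail address for the completion notice are treated as mandatory.
    """
    options = {}
    for key, value in grid_request.items():
        if key in ATTRIBUTE_MAP:
            options[ATTRIBUTE_MAP[key]] = value
    missing = {"command", "email"} - options.keys()
    if missing:
        raise ValueError("missing required options: %s" % sorted(missing))
    return options


def encode_message(options, tarball):
    """Pack the options and the tarball bytes into one message (assumed format)."""
    header = json.dumps(options, sort_keys=True).encode("utf-8")
    return struct.pack("!II", len(header), len(tarball)) + header + tarball


def decode_message(message):
    """Inverse of encode_message, as a submitter-side process could apply it."""
    header_len, tar_len = struct.unpack("!II", message[:8])
    header = message[8:8 + header_len]
    tarball = message[8 + header_len:8 + header_len + tar_len]
    return json.loads(header.decode("utf-8")), tarball


def send_to_submitter(host, port, message, timeout=30.0):
    """Open a tcp connection to the gateway and send one message."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(message)
```

With this split, the submitter itself would stay unchanged and only the translation table would need to follow whatever the grid side sends. Authentication is deliberately left out of the sketch, since that is the open question discussed earlier.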