From stefano.belforte@ts.infn.it Wed Mar 20 13:47:49 2002
Date: Mon, 11 Mar 2002 23:11:56 +0100
From: Stefano Belforte <stefano.belforte@ts.infn.it>
To: Jesus Marco <marco@ifca.unican.es>,
     Francesco Prelz <Francesco.Prelz@mi.infn.it>,
     "Todd Huffman (CDF/ATLAS)" <t.huffman1@physics.ox.ac.uk>,
     Lamberto Luminari <lamberto.luminari@roma1.infn.it>,
     r.stdenis@physics.gla.ac.uk, David Waters <dwaters@hep.ucl.ac.uk>,
     Franco.Semeria@bo.infn.it, roberto.carosi@pi.infn.it, mazzanti@bo.infn.it,
     giovanni.busetto@pd.infn.it, Ivan Vila <vila@fnal.gov>,
     Alberto Ruiz <ruiz@ifca.unican.es>, rodrigo <rodrigo@ifca.unican.es>,
     rmarco@ifca.unican.es,
     Thomas Mueller <mullerth@ekp.physik.uni-karlsruhe.de>,
     Holger Marten <holger.marten@hik.fzk.de>,
     Guenter Quast <quast@ekp.physik.uni-karlsruhe.de>,
     Patrick Schemitz <schemitz@ekp.physik.uni-karlsruhe.de>,
     Flavia Donno <Flavia.Donno@pi.infn.it>,
     Igor Sfiligoi <Igor.Sfiligoi@lnf.infn.it>,
     Antonio Sidoti tel. +39+0461 88 1525 <sidoti@science.unitn.it>
Subject: dinner with Prelz

A few words about what I would like to talk about
with Francesco Prelz tomorrow at dinner.
I announced a background info document, but am scaling
down ambitions.

The question is which d-grid tool we can use on which
time scale to integrate our computing centers. Starting
with FNAL of course.

CDF is setting up a farm-based batch analysis facility
at FNAL, called the Central Analysis Facility (or CAF).
The plan started as "submit from everywhere",
where everywhere means any Linux box in the world that
has CDF code installed.
Practical considerations are already downscoping it to
"log on to some central Linux machine at FNAL and submit from
there". Part of the problem is that the present user
interface to submit jobs is not "extremely portable", part
is that kerberos authentication is needed and the FNAL-specific
kerberos installation has to be there.
Some details are in
http://listserv.fnal.gov/scripts/wa.exe?A2=ind0202d&L=cdf_caf&D=0&O=A&P=7359
Note that a local kerberos5 installation is not required
to log into FNAL computers interactively, as people can use
the cryptocard for that, and several people prefer that to
installing kerberos.

I think this means that the new CDF CAF will not automatically
provide tools for remote access to it, nor tools that can be
cloned for remote access across non-FNAL sites (namely D-I-E-GB).

The interest in "remote access to the CAF" is driven by
the expected proliferation of CAFs (a bit of an oxymoron !):
there will be big computing facilities for CDF in our countries,
as in Korea and maybe elsewhere. There is also the possibility
of many small university-owned CAFs, likely somewhere at
Fermilab, or at home. Some universities have expressed interest
in pooling their local resources, some have indicated a desire to
install significant hardware at FNAL, and it is currently being
debated whether that will be a separate facility (e.g. in the CDF
building) or part of the "central CAF" at the FNAL Computing Center.


Somehow dGRID may provide the needed glue. We would need
-user authentication
-job submission
The first is basically a question of whether we can/want to use
kerberos everywhere, or will want to access computer centers
that only accept different mechanisms, so that a k5-certificates
i/f will be needed. FNAL is working on it, I do not know the
details. We need to make clear what is needed, how much work
it would be and then who can do it. The biggest question for me
is how far the present globus certificates scheme is from
being acceptable in real systems (i.e. largish computer
centers) rather than small test beds. Is any even marginally
security-concerned sys.admin. going to trust these certificates ?

Job submission is a bit simpler. The present CDF scheme is that
the user sends a tarball with the executable and a few "options" in
the usual class-ads spirit, but simpler and of course with a
different implementation. There will be no automatic return of the
job output to the user: data will stay for a couple of days
somewhere they can be picked up from, and then be deleted.
The user will only get a mail. The reason for this is security:
we could not find a scheme for giving the batch process the
authorisation to write to the local user directory "safely".
This latter constraint may be a temporary problem, because it
is mainly related to the fact that all jobs on the CDF CAF have
to be run as the same user, which may change. But while each CDF
member will have an account and thus a username on FNAL machines,
this may not be true on all the "CDF regional centers".
We could then assume that the users will keep picking up their
output themselves, at least in the beginning.
The biggest question there for me is just technical: what does
it mean in practice to have an interface between e.g. the dGrid
job submission UI and the CDF CAF "submitter process" ? The CDF
submitter is a task that runs on the CAF gateway computer and
listens for connections on a given tcp port. Would it need
to be changed to understand "grid messages", or should an i/f
to it be set up on each remote node that translates local
condorG or whatever submission requests to something that
our submitter understands ?
Current draft description of the submitter etc. is in
http://mit.fnal.gov/~fkw/caf/aceDocs.ps
also available in html
http://mit.fnal.gov/~fkw/caf/aceDocs/
especially:
http://mit.fnal.gov/~fkw/caf/aceDocs/node8.html
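
To make the second option concrete (an i/f on the remote node that
translates a grid-style request into something the submitter
understands), here is a minimal sketch in Python. Everything in it
is an assumption for illustration only: the one-JSON-object-per-line
wire format, the field names (executable_tarball, owner, options,
email), the "ACCEPTED" reply and the FakeSubmitter stand-in are all
hypothetical. The real submitter protocol is whatever the aceDocs
above describe, and authentication is left out entirely.

```python
import json
import socket
import socketserver
import threading


def translate_request(grid_request):
    """Map a class-ad-like request (a dict) to a HYPOTHETICAL CAF message."""
    required = ("executable_tarball", "owner")
    missing = [key for key in required if key not in grid_request]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return {
        "tarball": grid_request["executable_tarball"],
        "user": grid_request["owner"],
        "options": dict(grid_request.get("options", {})),
        # Output is not shipped back; the user only gets a mail.
        "notify": grid_request.get("email", ""),
    }


def submit(message, host, port, timeout=5.0):
    """Send one message to the submitter's tcp port, return its one-line reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall((json.dumps(message) + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reply:
            return reply.readline().strip()


class FakeSubmitter(socketserver.StreamRequestHandler):
    """Stand-in for the CAF submitter, only so the sketch runs locally."""

    def handle(self):
        job = json.loads(self.rfile.readline().decode("utf-8"))
        self.wfile.write(("ACCEPTED " + job["user"] + "\n").encode("utf-8"))


def demo():
    """Start the fake submitter on a free local port and submit one job."""
    server = socketserver.TCPServer(("127.0.0.1", 0), FakeSubmitter)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        request = {
            "executable_tarball": "myjob.tgz",  # hypothetical file name
            "owner": "someuser",                # hypothetical user name
            "options": {"nevents": 1000},
            "email": "someuser@example.org",
        }
        return submit(translate_request(request), host, port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The point of the split is that translate_request() is the only part
that knows about the grid side, and submit() the only part that knows
about the tcp port, so the submitter itself would not need to change.
The first option (teaching the submitter "grid messages") would move
the translation to the gateway instead.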


					Stefano