jan thursday
9:00-9:30 agenda, logistics, CDF management
boundary conditions and goals, FKW list
glide-in: status, work in progress, next step
VO, VOMS, proxy, DAG: status of submission to LCG
Need an administrative interface. VOX?
Subir will investigate with Cesini/Ciaschini the mass insertion of all CDF users via a script.
11:00-11:30 break
WMS in LCG, DAG, job clusters sandbox, Condor-C with AS
morning wrapup
CNAF glide in
a. Will submit the glide-in daemon with a generic CDF principal, so all jobs will run with the same Unix UID.
Eventually there will be a Globus-developed tool that uses the user’s proxy transferred to the glide-in
(for CDF this is the proxy generated from the k5 ticket; for CMS pilot jobs it will be something else)
to ask the local operating system for a new UID (gridmap style) under which to run the user executable.
Subir will discuss this with Luca Dell’Agnello (Tier1 head) and/or Francesco Giacomini
(developer of pilot jobs for the new EGEE RB/CE).
b. Need to investigate the local directory for grid jobs (whether it is NFS mounted),
uniqueness of the local directory, and disk quota.
c. Subir: turn CNAF-CAF into a purely glide-in farm
d. Alpha test user jobs by end of February on test CAF
e. Beta test in March on test CAF
f. Get UCSD involved and try there as well; they will try it and work on the UID switching
g. Need a head node with 8GB
h. Transition completed by April 1st
i. In parallel, production CNAF farm upgrade in 2 steps:
Move to latest Condor release and implement groups (2 groups)
Move to SLCern
Francesco and Donatella take over the SAM station at CNAF and CDF data replica in
job submission to LCG
hope that the same stuff submits to OSG as well
Need to rewrite submitter
+ etc..
(CAF portal to GRID)
submitter (Subir + Armando + Francesco)
tarball in SE
an SE dedicated to CDF at CNAF which is reliable
thin sandbox with CafExe
CafExe to LCG environment, have it define $CDFSOFT
so that users can do source $CDFSOFT/cdf2.shrc
transfer of cdfcaf keytab
Delli Paoli)
to tag as CDF-MC capable the grid farms that have CDF software installed, e.g. that mount /afs/infn.it (Francesco)
Investigate usage of SL-provided Kerberos to write output to k5 nodes (Stefano)
In the meanwhile the JDL needs a list of allowed CEs
Mailer: need the status of jobs, have to poll the LB service? Can’t wait for the output
sandbox. This is a byproduct of the “monitor project”. We need a
complete rewrite of the server-side part of “CafJobs”
so that something runs on the head node, keeps a list of
pending/running/deleted jobs etc., and can do kill, for all jobs in the CDF VO,
e.g. by talking directly to the LB service, not using
edg-job-status crap. It has to be some Python
procedure that can be called and finds the status of all jobs, possibly with
options for a more or less detailed report. The implementer will see whether it is better to
use GridIce or LB or both.
This piece of software will also be responsible for getting (if needed), opening and parsing
output sandboxes for completed sections/jobs.
This needs a dedicated person asap.
Daniel Jeans will take this first.
This will be the basis of CafWWW and CafMon (jobs).
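A minimal sketch of what such a Python procedure could look like, to fix ideas only. Everything here is an assumption: the class name JobTracker, the state names, and the two backend callables are hypothetical placeholders for whatever ends up talking to LB and/or GridIce; no real LB or GridIce call is shown.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical state names; the real LB/GridIce states are not specified in these notes.
STATES = ("pending", "running", "done", "deleted")


class JobTracker:
    """Keeps the last known state of every tracked job (e.g. all jobs of the VO)."""

    def __init__(self, query_backend: Callable[[str], str],
                 cancel_backend: Callable[[str], None]):
        # Both callables are placeholders for the real LB and/or GridIce access layer.
        self._query = query_backend
        self._cancel = cancel_backend
        self._jobs: Dict[str, str] = {}

    def add(self, job_id: str) -> None:
        self._jobs[job_id] = "pending"

    def poll(self) -> None:
        """Refresh the state of every job that is still pending or running."""
        for job_id, state in list(self._jobs.items()):
            if state in ("pending", "running"):
                new_state = self._query(job_id)
                if new_state not in STATES:
                    raise ValueError(f"unknown state {new_state!r} for {job_id}")
                self._jobs[job_id] = new_state

    def kill(self, job_id: str) -> None:
        """Ask the backend to cancel the job and remember it as deleted."""
        self._cancel(job_id)
        self._jobs[job_id] = "deleted"

    def report(self, detailed: bool = False) -> dict:
        """Per-state counts, or the full job -> state map when detailed is True."""
        if detailed:
            return dict(self._jobs)
        return dict(Counter(self._jobs.values()))


# Usage with a fake in-memory backend standing in for the real service.
fake_states = {"job-1": "running", "job-2": "done"}
tracker = JobTracker(lambda job_id: fake_states[job_id], lambda job_id: None)
for job in fake_states:
    tracker.add(job)
tracker.poll()
print(tracker.report())
print(tracker.report(detailed=True))
```

Keeping the backend behind two plain callables leaves the GridIce-or-LB choice open to the implementer; fetching and parsing output sandboxes is not covered by this sketch.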
Need CafKill, CafHold, CafRelease on GRID
CafGui (Subir+Armando)
must be able to select the Grid site, or provide a white/black list. More
generally, users should be able to add class-ads to the
JDL file used when submitting to the RB.
Maybe only from CafSubmit.
Shih-Chieh + Igor
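A rough sketch of how a white/black list and user-supplied class-ads could be folded into the JDL text, under stated assumptions. The helper names site_requirements and render_jdl are hypothetical; the sketch assumes the CE identifier is visible to the Requirements expression as other.GlueCEUniqueID, which should be checked against the RB in use; the CE name in the example is made up.

```python
from typing import Dict, Iterable


def site_requirements(whitelist: Iterable[str] = (),
                      blacklist: Iterable[str] = ()) -> str:
    """Build a ClassAd-style boolean expression selecting the allowed CEs."""
    # Assumption: the CE identifier is exposed as other.GlueCEUniqueID.
    allow = " || ".join(f'other.GlueCEUniqueID == "{ce}"' for ce in whitelist)
    deny = " && ".join(f'other.GlueCEUniqueID != "{ce}"' for ce in blacklist)
    parts = [f"({part})" for part in (allow, deny) if part]
    return " && ".join(parts) if parts else "true"


def render_jdl(base: Dict[str, str], user_ads: Dict[str, str],
               requirements: str) -> str:
    """Merge submitter defaults with user-supplied attributes into JDL text.

    Values are written verbatim, so string values must already carry quotes.
    """
    attrs = dict(base)
    attrs.update(user_ads)  # user attributes override the submitter defaults
    attrs["Requirements"] = requirements  # site selection always wins
    return "\n".join(f"{key} = {value};" for key, value in attrs.items())


# Example: exclude one (made-up) CE and let the user add a retry count.
jdl = render_jdl(
    base={"Executable": '"CafExe"'},
    user_ads={"RetryCount": "2"},
    requirements=site_requirements(blacklist=["ce.example.org:2119/jobmanager-pbs-cdf"]),
)
print(jdl)
```

Here the site selection overwrites any Requirements the user supplies; whether users should instead be allowed to extend it is a policy choice left open in the notes.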
FKW will coordinate a bit
with Lucia Silvestris about job bookkeeping and
CAF-like Finite State Machine monitoring etc., JAM usage in CMS etc.
CafWWW for GridCAF (Daniel + Subir) and merge all dCAF’s (Igor);
need to improve/rewrite the analyse pages (Igor)
WMS in Grid3/SamGrid, GLuE inter nos in preparation for Conrad
16:00-17:00 GIS etc. Conrad Stenberg on Video
17:00-17:30 break
more WMS
wrap up of WMS
jan: friday
CafWWW status and summary of specs, explanation of what we have to GridICE developers
Presentation of GridICE and JAM/BOSS as bookkeeper
10:30-11:00 break
job monitoring/accounting needs in Grid, XML servers/parsers, CondorCAF accounting,
local and remote accounting, monitoring/accounting, global CDF job ID?
We need a GridIce probe for CondorCAF; then we could use GridIce on all the dCAF’s. Ignazio.
CafMon and GridShell
JAM for job debugging presentation
plan for job debugging, integration of JAM/GridShell
We will wait.
This is not urgent.
JAM for asynchronous job monitoring à la CafMon
will be discussed and developed further in the LCG context, especially the security issues
UCSD will complete and test their MI-based system
JAM as a bookkeeping tool for users is not interesting for us now
We would like to use some logging mechanism to keep track of opened input files
(from Armando’s DHlog) as the job is running rather than at process end,
and will first explore the usage of the tool that the FNAL DB people developed to
monitor DB load. Igor will talk to Jim Kowalkowski once at FNAL.
wrap up on monitoring tools and needs
16:30-17:00 break
tying submission and monitor, bookkeeping of job id's, sites, etc., new specs for
etc. in dCaf's, glide-in, grid
spare for overflow of former discussions
wrap up of the day
2 priorities
1. write a gridCaf web interface that fetches info from GridIce and turns it into the same
format as CondorCaf, so it goes into the usual pages
2. write a global monitoring page that links together all the dCaf’s, including the GRID one,
and links to the individual ones, including Grid, for details.
One low priority: build a dedicated GridIce server which monitors the dCAF’s
as if they were Grid clusters; this needs a GridIce probe
for CondorCAF and Globus MDS, so as to instantiate a
GRIS. Also need probes to monitor SAM, CAF daemons
etc. It is good for someone in CDF to learn how to write a probe for GridIce. Ignazio.
Schedule forecast:
Grid-aware submitter:
All needed functionalities identified and tested: end of February
A CAF-like submitter working: end of March
Monitor for Grid: same time schedule
Rewrite in better form and real system for users: end of May
1st attempt at CafWWW for grid by end of May
By June 1st CDF users can submit MC jobs to LCG using CafGui and get mailer output.
By September 1st CafWWW and complete CafMon. Full CAF-like environment for users.
Start using Grid for good and see how reliable etc. it is.
Global MC queue for current dCAF’s:
Have a MC “fake” CAF in the list
When the user picks it the GUI does:
Option 1: send all sections to a real CAF of GUI choice
Option 2: send some sections here, some sections there.
We decide for 2.
2 implementations:
a) MC “CAF” is a new kind of submitter
very general, but a
b) wrapper around CafSubmit
We decide CafMon will have an option to get the number of total/free/used
slots, and the same on the future global CafWWW, and users
will do their own job splitting/wrapping. Igor will make the information available.
Splitting of analysis jobs is not in our scope for the time being.
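As an illustration of the user-side splitting of MC sections, here is a small sketch that shares sections among CAFs in proportion to their free slots. The function name split_sections, the CAF labels and the slot counts are all made up for the example; how the free-slot numbers are actually read from CafMon is not defined in these notes.

```python
from typing import Dict


def split_sections(n_sections: int, free_slots: Dict[str, int]) -> Dict[str, int]:
    """Share n_sections among CAFs in proportion to their free slots."""
    total_free = sum(free_slots.values())
    if total_free <= 0:
        raise ValueError("no free slots reported by any CAF")
    # Integer share first, then hand out the leftovers by largest remainder.
    shares = {caf: n_sections * free // total_free
              for caf, free in free_slots.items()}
    leftover = n_sections - sum(shares.values())
    by_remainder = sorted(
        free_slots,
        key=lambda caf: (n_sections * free_slots[caf]) % total_free,
        reverse=True,
    )
    for caf in by_remainder[:leftover]:
        shares[caf] += 1
    return shares


# Illustrative numbers only: 100 MC sections over three CAFs.
plan = split_sections(100, {"cnaf": 30, "ucsd": 10, "fnal": 20})
print(plan)
```

The largest-remainder step makes the shares always add up to the requested number of sections; a user wrapper would then call CafSubmit once per CAF with its share.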
Stefano will send instructions to Donatella,
Francesco and Paola on splitting datasets and Francesco will make a nice tool
if needed, or at least put instructions somewhere.