13 Jan, Thursday

 

9:00-9:30 agenda, logistics, CDF management reorganization

9:30-10:00 boundary conditions and goals, FKW list

10:00-10:30 glide-in: status, work in progress, next step

10:30-11:00 VO, VOMS, proxy, DAG: status of submission to LCG

          Need an administrative i/f. VOX?

          Subir will investigate with Cesini/Ciaschini the mass insertion of all CDF users via a script.

11:00-11:30   break

11:30-12:30 WMS in LCG, DAG, job clusters sandbox, Condor-C with AS

 

 

14:00-14:30 morning wrap up

1) CNAF glide-in

a.    The glide-in daemons will be submitted with a generic CDF principal, so they will all run under the same Unix UID. Eventually there will be a Globus-developed tool that takes the user’s proxy transferred to the glide-in (for CDF this is the proxy generated from the k5 ticket; for CMS pilot jobs it will be something else) and asks the local operating system for a new UID (gridmap style) under which to run the user executable. Subir will discuss this with Luca Dell’Agnello (Tier1 head) and/or Francesco Giacomini (developer of pilot jobs for the new EGEE RB/CE).

b.    Need to investigate the local directory for grid jobs (whether it is NFS-mounted), the uniqueness of the local directory, and disk quotas.

c.    Subir: turn CNAF-CAF into a purely glide-in farm

d.    Alpha test with user jobs by end of February on the test CAF

e.    Beta test in March on the test CAF

f.    Get UCSD involved and try there as well; they will try it and work on the UID switching

g.    Need a head node with 8GB

h.    Transition completed by April 1st

i.    In parallel, upgrade the production CNAF farm in 2 steps:

                    i.   Move to the latest Condor release and implement groups (2 groups)

                    ii.  Move to SL CERN

                    iii. Francesco and Donatella take over the SAM station at CNAF and the CDF data replica in Italy

2) Job submission to LCG

Hope that the same code submits to OSG as well.

     Need to rewrite the submitter etc.

     (CAF portal to GRID)

                   Submitter/Mailer

                             New submitter (Subir + Armando + Francesco)

                             Store tarball in SE

                                       Need a reliable SE dedicated to CDF at CNAF

                             Create thin sandbox with CafExe

                              Tailor CafExe to the LCG environment; have it define $CDFSOFT so that the user can do source $CDFSOFT/cdf2.shrc

                             Secure transfer of cdfcaf keytab file

                                      (Francesco Delli Paoli)

                              Need to tag as CDF-MC capable the grid farms that have the CDF software installed, e.g. those that mount /afs/infn.it (Francesco)

                  

          Investigate the use of the SL-provided Kerberos to write output to k5 nodes (Stefano)

 

                              In the meantime the JDL needs a list of allowed CEs (see the sketch below)
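
A minimal sketch of the JDL the new submitter could generate, assuming the CDF-MC capable farms are advertised via a GLUE software tag and the allowed CEs are kept in a simple list; the CE names, the "CDF-MC" tag string, the SE URL and the CafExe argument convention below are placeholders, not agreed values:

    # Sketch of JDL generation for the LCG submitter. All site names, the
    # "CDF-MC" tag and the SE tarball URL are placeholders.

    ALLOWED_CES = [
        "ce1.example.infn.it:2119/jobmanager-lcgpbs-cdf",   # hypothetical CE
        "ce2.example.ac.uk:2119/jobmanager-lcgpbs-cdf",     # hypothetical CE
    ]

    JDL_TEMPLATE = (
        'Executable          = "CafExe";\n'
        'Arguments           = "%(tarball)s %(section)d";\n'
        'StdOutput           = "section.out";\n'
        'StdError            = "section.err";\n'
        'InputSandbox        = {"CafExe"};\n'
        'OutputSandbox       = {"section.out", "section.err"};\n'
        'VirtualOrganisation = "cdf";\n'
        'Requirements        = (%(software)s) && (%(ces)s);\n'
    )

    def make_jdl(tarball_url, section):
        """Return the JDL text for one CAF section (thin sandbox: only CafExe
        is shipped; the user tarball is fetched from the SE at run time)."""
        ces = " || ".join('(other.GlueCEUniqueID == "%s")' % ce
                          for ce in ALLOWED_CES)
        software = ('Member("CDF-MC", '
                    'other.GlueHostApplicationSoftwareRunTimeEnvironment)')
        return JDL_TEMPLATE % {"tarball": tarball_url, "section": section,
                               "software": software, "ces": ces}

    print(make_jdl("srm://se.example.infn.it/cdf/user_job.tgz", 1))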

Mailer: we need the status of the job; do we have to poll the LB service? We cannot wait for the output sandbox. This is a byproduct of the “monitor project”. We need a complete rewrite of the server-side part of “CafJobs” so that something runs on the head node, keeps the list of pending/running/deleted jobs etc., and can kill any job in the CDF VO, e.g. by talking directly to the LB service rather than going through edg-job-status. It should be a Python procedure that can be called to find the status of all jobs, possibly with options for a more or less detailed report. The implementer will decide whether it is better to use GridICE, the LB, or both.

This piece of software will also be responsible for fetching (if needed), opening and parsing the output sandboxes of completed sections/jobs.
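
A rough shape for this head-node piece, in Python as suggested above; query_lb_status() and cancel_grid_job() are hypothetical stand-ins for whatever LB/WMS calls the implementer finally chooses:

    # Sketch of the head-node bookkeeping service described above.
    # query_lb_status() and cancel_grid_job() are placeholders for the
    # real LB / WMS calls, still to be chosen.

    def query_lb_status(job_id):
        # Placeholder: the real version would talk to the LB service
        # (or GridICE); here everything just looks "running".
        return "running"

    def cancel_grid_job(job_id):
        # Placeholder for the real job-cancel call.
        pass

    class CafJobBook(object):
        """Keeps the state of all CDF VO jobs known to this head node."""

        def __init__(self):
            self.jobs = {}                    # grid job id -> state string

        def register(self, job_id):
            self.jobs[job_id] = "pending"

        def refresh(self):
            """Poll the LB for every known job and update its state."""
            for job_id in list(self.jobs):
                self.jobs[job_id] = query_lb_status(job_id)

        def report(self, detailed=False):
            """Short or detailed status report, as discussed above."""
            if detailed:
                return "\n".join("%s %s" % item for item in self.jobs.items())
            counts = {}
            for state in self.jobs.values():
                counts[state] = counts.get(state, 0) + 1
            return ", ".join("%d %s" % (n, s) for s, n in counts.items())

        def kill(self, job_id):
            cancel_grid_job(job_id)
            self.jobs[job_id] = "deleted"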

 

This needs a dedicated person ASAP.

Daniel Jeans will take this first.

 

This will be the basis of CafWWW and CafMon (jobs).

 

Need CafKill, CafHold, CafRelease on GRID (Francesco)

 

                   CafGui (Subir+Armando)

                              The user must be able to select a Grid site, or provide a white/black list. More generally, users should be able to add class-ads to the JDL file used when submitting to the RB.

Maybe only from CafSubmit.

 

                   CafMon

                             Shih-Chieh + Igor

                              FKW will coordinate a bit with Lucia Silvestris about job bookkeeping, CAF-like Finite State Machine monitoring, JAM usage in CMS, etc.

 

                   CafWWW

                              Create CafWWW for GridCAF (Daniel + Subir) and merge all the dCAFs (Igor); need to improve/rewrite the analysis pages (Igor?)

 

 

 

14:30-15:30 WMS in Grid3/SamGrid, UK experience

15:30-16:00 GLUE, among ourselves, in preparation for Conrad

 

16:00-17:00 GIS etc. Conrad Stenberg on Video

17:00-17:30   break

17:30-18:30 more WMS

18:30-19:00 wrap up of WMS

 

 

14 Jan, Friday

 

9:00-10:00 CafWWW status and summary of specs, explanation

           of what we have to GridICE developers

10:00-10:30 Presentation of GridICE and JAM/BOSS as bookkeeper

10:30-11:00   break

11:00-12:30 job monitoring/accounting needs in Grid,

           XML servers/parsers, condorCAF monitoring,

           group accounting, local and remote accounting,

           global monitoring/accounting, global CDF job ID ?

 

          We need a GridICE probe for CondorCAF; then we could use GridICE on all the dCAFs. Ignazio.
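
A possible starting point for such a probe, assuming it only has to report slot counts for the farm; the key=value output at the end is just a placeholder, since the format GridICE actually expects still has to be checked with its developers:

    # Sketch of a CondorCAF probe: count total/used/free batch slots by
    # parsing condor_status output. The print format is a placeholder;
    # the format required by GridICE must be checked with its developers.

    import subprocess

    def condor_slot_counts():
        out = subprocess.Popen(
            ["condor_status", "-format", "%s\n", "State"],
            stdout=subprocess.PIPE).communicate()[0].decode()
        states = [line.strip() for line in out.splitlines() if line.strip()]
        total = len(states)
        used = sum(1 for s in states if s == "Claimed")
        return total, used, total - used

    if __name__ == "__main__":
        total, used, free = condor_slot_counts()
        print("TotalSlots=%d UsedSlots=%d FreeSlots=%d" % (total, used, free))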

 

14:00-14:30 CafMon and GridShell presentation

14:30-15:00 JAM for job debugging presentation

15:00-16:00 plan for job debugging, integration of JAM/GridShell

          We will wait.

          This is not urgent.

          JAM for asynchronous job monitoring à la CafMon will be discussed and developed further in the LCG context, especially the security issues

          UCSD will complete and test their MI based system

          The UK is also working on a similar thing in the CMS context.

 

 JAM as a bookkeeping tool is not interesting for our users right now

 

 We would like to use a logging mechanism to keep track of opened input files (from Armando’s DHlog) while the job is running rather than at process end, and we will first explore the tool that the FNAL DB people developed to monitor DB load. Igor will talk to Jim Kowalkowski once at FNAL.

 

16:00-16:30 wrap up on monitoring tools and needs

16:30-17:00   break

17:00-18:00 tying together submission and monitoring, bookkeeping

           of job IDs, sites, etc., new specs for

           submitter/mailer etc. in dCAFs, glide-in, grid

18:00-18:30 spare for overflow of former discussions

18:30-19:00 wrap up of the day

2 priorities:

          1. Write a GridCAF web i/f that fetches info from GridICE and turns it into the same format as CondorCAF, so that it goes into the usual pages (see the sketch below).

          2. Write a global monitoring page that links together all the dCAFs, including the Grid one, with links to the individual ones (Grid included) for details.

One low priority: build a dedicated GridICE server which monitors the dCAFs as if they were Grid clusters; this needs a GridICE probe for CondorCAF and Globus MDS, so as to instantiate a GRIS. We also need probes to monitor SAM, the CAF daemons, etc. It would be good for someone in CDF to learn how to write a probe for GridICE. Ignazio.
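
For priority 1, the web i/f could be a thin translation layer along these lines; fetch_gridice_jobs() and all the field names are guesses, to be replaced by the real GridICE query and the real CondorCAF record layout:

    # Sketch of priority 1: fetch job info from GridICE and reshape it
    # into the record layout the existing CondorCAF pages consume.
    # fetch_gridice_jobs() and every field name are placeholders.

    def fetch_gridice_jobs():
        # Placeholder: would query the GridICE server for CDF VO jobs.
        return [{"id": "job-001", "user": "someuser",
                 "site": "cnaf", "status": "Running"}]

    def to_condorcaf_record(job):
        """Map one GridICE entry onto the (assumed) CondorCAF fields."""
        return {"JobId": job["id"], "Owner": job["user"],
                "Farm": job["site"], "State": job["status"]}

    if __name__ == "__main__":
        for job in fetch_gridice_jobs():
            print(to_condorcaf_record(job))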

 

Schedule forecast:

 

Grid-aware submitter:

          All needed functionalities identified and tested: end of February

          A CAF-like submitter working: end of March

 

Monitor for Grid: same time schedule

 

Rewrite in better form as a real system for users: end of May

 

1st attempt at CafWWW for grid by end of May

 

By June 1st CDF users can submit MC jobs to LCG using CafGui and get mailer output.

 

By September 1st CafWWW and complete CafMon. Full CAF-like environment for users.

Start using the Grid for real and see how reliable it is, etc.

 

 

Global MC queue for the current dCAFs:

          Have a MC “fake” CAF in the list

          When the user picks it, the GUI does one of:

          Option 1: send all sections to a real CAF of the GUI's choice

          Option 2: send some sections here, some sections there.

          We decide on option 2.

          2 implementation options:

          a) MC “CAF” is a new kind of submitter

          very general, but a pain

          b) wrapper around CafSubmit

          We decide that CafMon will have an option to return the number of total/free/used slots (and the same on the future global CafWWW), and users will do their own job splitting/wrapping (see the sketch below). Igor will make the information available.
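
A sketch of the splitting/wrapping a user could then do, assuming the free-slot counts can be obtained from CafMon or the global CafWWW; free_slots() and the CafSubmit command line shown are placeholders, only meant to illustrate the idea:

    # Sketch of user-side job splitting across dCAFs, proportional to the
    # free slots. free_slots() stands in for the future CafMon/CafWWW
    # query, and the CafSubmit options are indicative only.

    def free_slots():
        # Placeholder for the CafMon slot-count option discussed above.
        return {"cnafcaf": 120, "fermicaf": 300, "sdsccaf": 60}

    def split_sections(n_sections, slots):
        """Assign sections 1..n_sections to farms in proportion to their
        free slots; any remainder goes to the last farm in the list."""
        total = sum(slots.values())
        farms = sorted(slots, key=slots.get, reverse=True)
        assignment, next_section = {}, 1
        for i, farm in enumerate(farms):
            if i == len(farms) - 1:
                n = n_sections - next_section + 1
            else:
                n = n_sections * slots[farm] // total
            assignment[farm] = list(range(next_section, next_section + n))
            next_section += n
        return assignment

    if __name__ == "__main__":
        for farm, sections in split_sections(100, free_slots()).items():
            if sections:
                # Indicative only: the real CafSubmit syntax may differ.
                print("CafSubmit --farm=%s --start=%d --end=%d ..."
                      % (farm, sections[0], sections[-1]))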

 

Splitting of analysis jobs is not in our scope for the time being. Stefano will send instructions to Donatella, Francesco and Paola on splitting datasets, and Francesco will make a nice tool if needed, or at least put the instructions somewhere.
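
A minimal illustration of such a dataset-splitting helper, assuming the dataset is simply handled as a list of file names (the real recipe will come from Stefano's instructions):

    # Minimal sketch: split a dataset (here just a list of files) into N
    # roughly equal chunks, one per CAF section. Purely illustrative.

    def split_dataset(files, n_sections):
        """Return n_sections lists of files, as evenly sized as possible."""
        chunks = [[] for _ in range(n_sections)]
        for i, f in enumerate(files):
            chunks[i % n_sections].append(f)
        return chunks

    if __name__ == "__main__":
        files = ["file_%03d.root" % i for i in range(10)]
        for section, chunk in enumerate(split_dataset(files, 3)):
            print("section %d: %s" % (section + 1, chunk))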