Minutes of the Euro-CDF-Grid meeting, 22 April 2002
(courtesy of Rick St. Denis, with a few after-the-meeting additions by
Stefano Belforte)

Attendees:
  at FNAL: Flavia Donno, Rick St. Denis, Gabriele Garzoglio, Stan Thompson
  on TV:   Stefano Belforte (from Trieste), Antonio Sidoti (from Trento)

DataTAG    Flavia Donno
=======================

EDG tools for the grid:
  o LDAP is populated.
  o CNAF has the Resource Broker.
  o User certification is running: Antonio is coordinating this, and it
    allows jobs to be submitted.

A test bed for CDF is being set up by the DataTAG initiative using EDG
software tools. The test bed uses an RB (Resource Broker) at CNAF and
will have CE/WN/SE/UI (Computing Element, Worker Node, Storage Element,
User Interface; see Antonio's slides for a collection of EDG acronyms).
The CE, SE and WN for CDF are not there yet. Work is in progress to set
up 1 CE + 2 WN in Trieste and 1 CE + 1 UI + 2 WN in Bologna. These
sites did not have an EDG installation before and are learning how to
do it. In the meanwhile EDG has made available one CE in Torino and one
UI at CNAF from the INFN-GRID testbed to the people (Antonio and
Stefano) working on this.

Flavia has installed a UI at FNAL (ncdf29.fnal.gov, a desktop in the
trailers) that uses a DOE-signed certificate to declare itself to EDG.
Job submission has only been tried using personal certificates issued
by the INFN CA.

It is not possible to exclude access to the database. So far Flavia has
tried to send jobs to nodes without AFS; as soon as the CE has AFS, the
CDF distribution of code will be available. It is worth noting that
MySQL can send its information by table. As the CDF database is
organized, this will not limit the information sent by much, because
all calibrations for a given detector that were ever taken are in a
single table. However, this could be a requirement for future design (a
sketch of per-table shipping appears at the end of these minutes).

One test CDF MC job ran successfully in this way, submitted from either
Fermilab or Torino and executing at CNAF, with output retrieved to any
of those places. The needed .so files were tarred with the executable,
and the exe ran on generic RH6.1 nodes with no access to CDF offline (a
sketch of such a submission appears after the Condor overview below).
Details are in:
  http://www.ts.infn.it/~belforte/offline/grid/edg-test-log.html
  http://www.ts.infn.it/~belforte/offline/grid/edg-troubles.html

Stefano had two questions:
  1. Does a job need to send out data?
  2. Can this be done without database information, using Monte Carlo?

AFS is used because Stefano has set up the distribution of CDF code
over AFS. A CDF replica catalog is set up since EDG uses one. The
prototype is in LDAP; the next step is to move to a commercial database
by the end of July. The (C++) API will be the same, but underneath it
is likely to be based on SOAP.

In Italy there is work on an interface of Kerberos to Globus. In the
long term the plan is to do sam/batch at FNAL. Packaging issues will be
handled in collaboration with the SAM team.

CDF Virtual Organizations    Antonio Sidoti
===========================================

See transparencies.

Condor    Stan Thompson
=======================

Work on Condor is being done in a collaboration between Ruth (PPDG) and
the University of Wisconsin. Condor-G extends Condor to use Globus and
the grid. Condor is a kind of mini-grid. Much of it is like the WP1
(Work Package 1) software, but there are differences:

  o Decisions are made by its scheduler (in D0-speak, the negotiator)
    at a later stage than in WP1. This gives more flexibility and the
    ability to reschedule.
  o The D0 negotiator maps to the Condor-G scheduler.
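As referenced in the DataTAG section above, here is a minimal sketch of
how such a test job could be bundled and handed to the EDG UI. The
edg-job-submit command and the JDL attributes (Executable, StdOutput,
StdError, InputSandbox, OutputSandbox) are real EDG ingredients, but
every file and library name, the wrapper script, and the idea of
driving it from Python are illustrative assumptions, not a record of
the actual test.

    # Sketch of the EDG submission path from the DataTAG section: tar
    # the executable with the .so files it needs, write a JDL file,
    # and hand it to the EDG User Interface.  All names are invented.
    import os
    import tarfile

    def make_bundle(exe, libs, bundle="cdfmc.tar"):
        # Tar the exe together with its shared libraries so the job
        # can run on a generic RH6.1 node with no CDF offline install.
        tar = tarfile.open(bundle, "w")
        for path in [exe] + libs:
            tar.add(path)
        tar.close()
        return bundle

    def write_jdl(jdl="cdfmc.jdl"):
        # Standard EDG JDL attributes; the sandbox lists name the
        # files shipped with the job and the files brought back.
        lines = [
            'Executable    = "run_cdfmc.sh";',
            'StdOutput     = "cdfmc.out";',
            'StdError      = "cdfmc.err";',
            'InputSandbox  = {"run_cdfmc.sh", "cdfmc.tar"};',
            'OutputSandbox = {"cdfmc.out", "cdfmc.err"};',
        ]
        open(jdl, "w").write("\n".join(lines) + "\n")
        return jdl

    if __name__ == "__main__":
        make_bundle("cdfmc.exe", ["libFoo.so", "libBar.so"])
        # edg-job-submit is the real EDG UI command; on success it
        # prints a job identifier used later to retrieve the output.
        os.system("edg-job-submit " + write_jdl())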
There is a daemon at each remote site that supplies information. The D0
negotiator is in an investigation stage and not in production.

Condor-G uses ClassAds ("class ads"), which are like job ads and are
used to allocate resources; the job ad is an instantiation of a class
ad. Responses to the matching of resources are tri-state: true, false
or undefined (a toy sketch of this matching appears after the CDF-SAM
section below). Plug-in functions are being added to Condor-G to allow
the negotiator to work on classes that are very large, such as
collections of files. These will also allow return values that
characterize the collections, such as the percentage matching, to be
returned.

Condor-G has a feature called glide-in: a Condor-G shell is created so
that, for example, it can be placed in a 4-hour job slot and then run
other Condor jobs. This allows a unification of language. However, D0
wants to move away from this. The problem is that 1000 hours might be
allocated to a job, and the usage of the machine by these jobs will not
be controllable by the local system administrator. It would be
desirable for the local administrator to have some control. One could
also have a uniform interface defined for installations to ensure that
Condor can function effectively. These issues will be considered in the
development of the D0 (and CDF?) grid, but on a longer time scale than
the D0 grid envisions for its release to production; it is hoped that
it can go into production within a year. Hence, they would not consider
glide-in immediately.

D0 Grid    Gabriele
-------------------

The main part of Condor is used in collaboration with D0 to enable
distribution of resources. It is built on SAM, the data handling system
for D0. SAM is being used and is in production, but it lacks job
distribution management: jobs currently only run on a local node. Work
is going on to decouple the job submission layer in SAM; the desire is
to use SAM as a data handling system on the grid. Further information
may be found on the web page: www-d0.fnal.gov/computing/grid.

There are three main aspects to the task: job handling, data handling
and monitoring. SAM is the data handling service. The interface to job
submission is done by EDG and the D0 grid at present. Monitoring and
information services are being done by Imperial College, Lancaster, the
University of Texas and Fermilab.

Distribution of SAM code is being done by UPD and UPS. Distribution
systems for the grid are currently based on RPM, and there is work
being done to understand the interfacing to the usage of Pacman. System
administrators like RPM distribution if they will be held responsible
for the code installation.

CDF-SAM    Rick St. Denis
=========================

The pre-pilot project has shown that SAM can be used in a CDF
environment, and a number of issues that were raised have either been
dealt with or are being handled in a new project, the "SAM for CDF"
project, worked on by a broader group of people than the SAM pre-pilot
project. This project draws upon effort in the UK as well as the
Fermilab Computing Division (ODS-DBA, ISS, SAM) and the CDF computing
group. The aspects of this project include:

  o Physical infrastructure to support Enstore access from remote sites
  o Database configuration to allow the integration of the CDF data
    catalogue information into SAM, and to move to a single schema
  o Clients for communication of AC++ with SAM
  o Installation configuration
  o Administration and monitoring
  o Deployment and customer service
  o Using SAM "in anger" (for analysis)
  o Coordination

There was not enough time to cover all these points in detail.
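Referring back to the ClassAd notes in the Condor section: here is the
toy sketch promised there. It is plain Python, not the real Condor
ClassAd library or its expression syntax, and every attribute name and
machine ad is invented for illustration; it only shows the idea that a
requirement evaluates to true, false, or undefined when a referenced
attribute is absent from the ad.

    # Toy model of the tri-state ClassAd matching described in the
    # Condor section: a requirement evaluates to true, false, or
    # "undefined" when an attribute it references is missing from the
    # ad.  Illustration only; not the real Condor ClassAd library.

    def requires_memory(machine_ad, minimum):
        mem = machine_ad.get("Memory")
        if mem is None:
            return None          # undefined: the ad is silent on Memory
        return mem >= minimum    # true or false

    job_ad = {"Owner": "cdfuser", "MinMemory": 256}  # invented attributes

    machines = [
        {"Name": "node1", "Memory": 512},   # should match
        {"Name": "node2", "Memory": 128},   # should fail
        {"Name": "node3"},                  # Memory undefined
    ]

    for m in machines:
        result = requires_memory(m, job_ad["MinMemory"])
        state = {True: "true", False: "false", None: "undefined"}[result]
        print(m["Name"], "->", state)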
A document describing this project is in an advanced stage and the project is moving rapidly. Meetings are held daily by video for 15 minutes at 9am FNAL time to discuss the progress on each of the items described above.
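As a footnote to the database point in the DataTAG section (MySQL can
send its information by table): a single calibration table can be
shipped as one unit with mysqldump, whose "database table" argument
form restricts the dump to one named table. The host, database and
table names below are invented for illustration and are not CDF's
actual configuration.

    # Sketch of MySQL's ability to ship information "by table", as
    # noted in the DataTAG section.  All names here are illustrative
    # assumptions, not CDF's real database layout.
    import subprocess

    def dump_table(host, user, database, table, outfile):
        # mysqldump restricted to one table: mysqldump [opts] db table
        out = open(outfile, "w")
        subprocess.check_call(
            ["mysqldump", "-h", host, "-u", user, database, table],
            stdout=out)
        out.close()

    # Hypothetical example: all calibrations ever taken for a detector
    # sit in one table, so shipping "by table" still ships all of them.
    dump_table("dbhost.example.edu", "reader", "cdf_calib",
               "svx_calibrations", "svx_calibrations.sql")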