Minutes of the Euro-CDF-Grid meeting, 13 March 2002.

Participants:
  Italy: Stefano Belforte, Lamberto Luminari, Antonio Sidoti, Igor Sfiligoi
  UK: Rick St.Denis, Ian MacArthur, Todd Huffman, David Waters, Paul Crosby, Kevin Watkins, plus others whose names Stefano did not catch
  Spain: Jesus Marco
  Fermilab: Ruth Pordes, Vicky White

These minutes are based on my (SB) memory and notes, helped by notes taken online by Rick St.Denis. Any problem with the minutes is my fault, not Rick's.

The meeting started the night before with a discussion between CDF-Italy (Stefano, Lamberto, Antonio, Igor), Francesco Prelz (WP1 [resource scheduling] manager for EuropeanDataGRID, aka EDG-WP1) and Flavia Donno (formerly in the EDG testbed project, now dataTAG [data TransAtlantic Grid] manager for subproject WP4.3-4 [middleware and application integration]; dataTAG aims at integrating European grids with US ones). It was agreed that CDF will join the dataTAG test bed. Flavia is the dataTAG contact for CDF. Details are appended later on in this document.

The picture emerging after the meeting is of CDF starting to use the EuroDataGRID toolkit for interoperability across European institutions/laboratories, starting with software configured as SAM stations to deal with data distribution, and with a "hand picked" job destination site. I.e.: I submit a job from Trieste, decide myself whether it runs at the Italian Tier 1, St.Ander, RAL or ..., and data will be copied to the worker node if/as needed by SAM. It is up to Belforte's intelligence to pick the destination so as to minimise the chance that all data have to come from overseas. This activity will start in the DataTAG framework/testbed, since that is directed to European-US interoperability anyhow.

Later evolution will be (not necessarily in this order):
  . adding FNAL (i.e. kerberos/certificates)
  . automatic resource usage optimisation (a global broker decides where to run, and whether to copy data or move jobs etc.)
  . gridification of data replica/catalog management (SAM evolution or modification, distributed databases etc.)

Wednesday bottom lines are:

1- we agree on a set of goals:
   o use grid beyond what is being done now in UK
   o make it easier to connect from our countries to FNAL
   o share resources to provide a common framework
   o open the way to exploiting the coming high speed European networks to build a common (disk resident) data set with faster access than the FNAL main repository
2- we did not address to what extent we can use each other's hardware and how to monetize this (if needed), but will go ahead and make it possible first. This is a different problem in different countries though, as e.g. in UK hardware comes "for GRID" and so is meant to be shared to begin with, while in Italy it comes "for CDF" based on the Italian group's needs.
3- we will not attempt a full blown gridification now, but will keep working along the CDF-UK model of one group exploring some aspect, trying to use it in a realistic analysis environment and then pulling other groups in if/as fit.
4- in this spirit UK will carry ahead the SAM effort, focussing on SAM as a data distribution tool.
5- Italy will explore job submission and user authentication/authorisation.
6- Spain will join tests later on, while a more significant contribution depends on getting more manpower (trying to do this now).
7- German people could not attend the meeting, but agreed offline on the main spirit and Germany is in this loop as well. The Karlsruhe group does not have manpower to contribute to the work now, but as their computing center is going to be part of the GRID framework anyhow they will join later on as local manpower allows.
8- we will meet again:
   o the week after April 15, once CDF has made its decision on SAM, we will have a videoconference (Stefano/Rick set it up)
   o around the end of May, on the side of the CDF collaboration meeting, we will meet at Fermilab.
Now more details on what was said/decided:

DETAILS ON WED MEETING
==========================

Computing status in Italy (Stefano):
- no significant hardware now
- will have it next year; next year national CDF computing facility at Bologna. Hardware at FNAL.
- piece of new CDF CAF paid for by INFN
- too much to bother with grid now for Italy

Round of Rick's questions:
  who organizes the computing
  who supports it
  who funds it

Answers:

Italy
- no clear-cut line between experiment and IT department.
- Bologna: computer professionals. Service of INFN to the experiments: buy and connect hardware, up to having the operating system installed and the computers running. How much help with configuration is a matter for negotiation.
  Policy agreed with LHC experiments. CNAF will be Tier 1. This is for all HEP experiments: operators, system managers etc.
  Experiments get some positions from INFN or grid to have people there who are a connection to experiment-specific software. ATLAS and CMS have them, but CDF does not yet, and it is not clear if it will.
  These are centered around virtual organizations (VO). Europe-wide VO. For ATLAS there will be a model where each country contributes a certain amount of resources. These resources belong to the VO.
  In the end, policies for resource usage/sharing are largely uncharted territory. BaBar has a top-down agreement where SLAC/France/INFN each put in some computing resources for everybody's usage, monetised at central funding agency level; CDF has a much less structured approach to computing.

Spain
- St.Ander is the only FNAL group, so a unique case.
- The same group is involved in both CDF and CMS.
- Participates in the CrossGRID European Union initiative (Spain, Germany, Poland), an analogue of DataGRID with high hopes for interoperability.
- Proposed CrossGRID to Brussels and this was approved. Started March 1. University plus CSIC (research institution in Spain) thought grid was a good idea. Have initial hardware as a cluster of IBM servers for grid. Part for CrossGRID testbed. Part for astrophysics.
  Most for CDF (if we can set up something) and CMS. Relative priority between CMS and CDF is up to the St.Ander group to decide. Install condor, grid etc. Datagrid testbed site. The Spanish certification authority is centered there.
  Software development effort is centered around parallel analysis on this cluster. MPI-style parallelisation of ROOT, e.g., is being pursued in collaboration with Karlsruhe.
  Also proposed to the funding agency to consider data processing (in CDF lingo it would mean e.g. retracking of secondary data sets).
  People paid for 3 years. They handle management of the system, installation of grid and CDF software. 4 people.
  Different emphasis between CrossGRID and DataGRID: CrossGRID can be run interactively, DataGRID is batch.
  Goal for MPI usage: 1 TB to analyze in 5 min. Need 50-100 PCs for interactive resources.

Then Rick gave a great presentation of the SAM effort in UK; pointers to slides are in Stefano's grid page:
http://www.ts.infn.it/~belforte/offline/grid/index_grid.html

Also Paul Crosby showed slides about the SAM input module for AC++. The AC++ executable will be inserted within a SAM job wrapper (script) that will start fetching remote files as needed and then launch the AC++ exe, which at each file opening will wait for the needed file to be on disk.

Everybody was bought in by Rick's terrific sales speech; we will all try SAM stations. We are concerned about the vital role played by the DB server at FNAL, but will keep going assuming it is an infinitely loaded but infinitely powerful and always available resource.

The ensuing discussion, including Vicky and Ruth, made us define the goals and framework outlined at top.

We will use the CDF_GRID mail distribution list set up by Frank Wurthwein for this and all future messages.

Overall coordination of this effort will keep being a collaborative process.
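The SAM job wrapper described by Paul Crosby above can be pictured with a minimal Python sketch. It only illustrates the pattern "start fetching in the background, wait at each file opening"; it is not SAM or AC++ code. The names prefetch, wait_for_file and run_job are hypothetical, and a local file copy stands in for the remote transfer.

```python
import shutil
import threading
import time
from pathlib import Path


def prefetch(remote_files, cache_dir):
    """Copy each 'remote' file into cache_dir (stand-in for a SAM transfer)."""
    for src in remote_files:
        dst = Path(cache_dir) / Path(src).name
        tmp = dst.with_name(dst.name + ".part")
        shutil.copyfile(src, tmp)   # transfer under a temporary name
        tmp.rename(dst)             # publish only complete files


def wait_for_file(path, poll=0.05, timeout=30.0):
    """Block until path exists on disk, as the executable would at each open."""
    deadline = time.monotonic() + timeout
    while not Path(path).exists():
        if time.monotonic() > deadline:
            raise TimeoutError(f"{path} did not arrive")
        time.sleep(poll)
    return Path(path)


def run_job(remote_files, cache_dir, process):
    """Wrapper: start fetching in the background, then process files in order."""
    fetcher = threading.Thread(target=prefetch, args=(remote_files, cache_dir))
    fetcher.start()
    results = []
    for src in remote_files:
        local = wait_for_file(Path(cache_dir) / Path(src).name)
        results.append(process(local))
    fetcher.join()
    return results
```

The temporary ".part" name is a design choice of this sketch: the analysis step never sees a half-copied file, because a file only appears under its final name once the copy is complete.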
========================================== CDF PARTICIPATION IN DATATAG =======================================

Italy will start by first exploring the execution of CDF software on dataTAG testbed machines (configured with the EuroDataGrid toolkit), and then remote job submission and the resource broker (i.e. EDG-WP1 tools), using the EDG authentication/authorisation scheme (i.e. Globus certificates). Interoperability of certificates and kerberos was discussed a bit, also with Don Petravick (FNAL), but as much work is going on on that we will not try to use this at the beginning. We will start by exploring EDG software for remote job submission (scheduling, output retrieval etc.) and make no attempt to use grid tools for data management.

Work on this will start around April 15 and will go through the following steps:

1. run CDF MC on a dataTAG machine (no data access required)
   goals:
     verify environment (RH version, CDF software, authentication)
     start using grid Virtual Organization services for CDF (in my understanding this means authorising users via a list of certificates collected in a VO, using LDAP tools, and testing there the interoperability of certificates issued by different Authorities, esp. EU vs. US)
     learn user interface, scheduler, output retrieval ...
     verify if/how LCFG is useful/needed
2. run CDF analysis on local data files placed by hand
   goals:
     very simple extension of the above; mainly address convenience of usage and the running of a more complex application than MC, e.g. with database access to FNAL
3. (could be done in parallel with 1.+2.) set up a CDF SAM station in Italy and make it work for simple analysis tasks
4. run CDF as a remote SAM station on the test bed, so data will be copied from FNAL as needed; this simulates a possible realistic way of using EDG tools to integrate European CDF sites
5. a dataTAG test station will also be set up at FNAL, to explore allowing Europeans on the road to use home resources "easily" and to test usability of DOE certificates.
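The VO-based authorisation mentioned under step 1 can be illustrated with a small Python sketch: a VO is treated as a list of certificate subject names, and a user is authorised if the presented certificate comes from an accepted Authority and its subject is on that list. This is a toy model under stated assumptions, not EDG or Globus code. The Certificate and VirtualOrganization classes, the trusted_issuers set and all the names in the example are hypothetical; a real setup would read the member list from the VO LDAP server and verify certificate signatures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Certificate:
    subject: str   # distinguished name of the user
    issuer: str    # distinguished name of the issuing Authority


@dataclass
class VirtualOrganization:
    name: str
    members: set = field(default_factory=set)          # authorised subject DNs
    trusted_issuers: set = field(default_factory=set)  # accepted Authorities

    def authorise(self, cert):
        """Accept a certificate from a trusted Authority whose subject is a member."""
        return cert.issuer in self.trusted_issuers and cert.subject in self.members


# Hypothetical DNs, for illustration only.
EU_CA = "/C=IT/O=Example EU CA"
US_CA = "/C=US/O=Example US CA"

cdf = VirtualOrganization(
    name="cdf",
    members={"/C=IT/O=Example/CN=User One", "/C=US/O=Example/CN=User Two"},
    trusted_issuers={EU_CA, US_CA},
)

eu_user = Certificate("/C=IT/O=Example/CN=User One", EU_CA)
us_user = Certificate("/C=US/O=Example/CN=User Two", US_CA)
stranger = Certificate("/C=IT/O=Example/CN=Nobody", EU_CA)

for cert in (eu_user, us_user, stranger):
    print(cert.subject, "->", "accepted" if cdf.authorise(cert) else "rejected")
```

In this picture, the EU vs. US interoperability test amounts to checking that both Authorities are in the accepted set, so that members are authorised whichever Authority signed their certificate.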
We are all aware that in the long term the unique central SAM DB server may be a bottleneck (no SAM job can run if the connection to the FNAL DB server is missing), but assume this will be solved later, and hopefully by somebody else, as SAM and the many GRIDs merge.

From the CDF point of view:
  Antonio will run the tests on Italian test bed nodes first.
  Then users from UK and Spain will also try: Jesus, David.
  Antonio will also get a DOE certificate and simulate a US user.
  Igor will investigate ways to make a CDF-exe tarball that carries all shareables and environment, so that it runs on "any grid node" without need of custom system configuration (no LCFG e.g.).

A more detailed list of the steps needed to actually carry this out, esp. from the point of view of the work needed from dataTAG, is in the attached mail by Flavia.

Subject: Re: CDF-EuroGrid March 12/13 meeting times
Date: Wed, 13 Mar 2002 15:12:05 +0100
From: flavia
Organization: CERN
To: Stefano Belforte
CC: Jesus Marco, Francesco Prelz, "Todd Huffman (CDF/ATLAS)", Lamberto Luminari,
    r.stdenis@physics.gla.ac.uk, David Waters, Franco.Semeria@bo.infn.it,
    roberto.carosi@pi.infn.it, mazzanti@bo.infn.it, giovanni.busetto@pd.infn.it,
    Ivan Vila, Alberto Ruiz, rodrigo, rmarco@ifca.unican.es, Thomas Mueller,
    Holger Marten, Guenter Quast, Patrick Schemitz, Igor Sfiligoi,
    "Antonio Sidoti tel. +39+0461 88 1525", Antonia.Ghiselli@cnaf.infn.it,
    Cristina.Vistoli@cnaf.infn.it, petravick@fnal.gov, Luciano.Gaido@to.infn.it,
    Alessandro.Italiano@cnaf.infn.it, Andrea.Chierici@cnaf.infn.it,
    Roberto.Cecchini@fi.infn.it, Flavia.Donno@pi.infn.it,
    Luca.DellAgnello@cnaf.infn.it, Alessandro.Cavalli@cnaf.infn.it

Hello all.

As promised I send you a short report of the list of actions/outcome of the meeting held at CERN between CDF, F. Prelz (EDG/WP1 manager) and myself (DataTAG task manager for task 4.4 - Interoperability between EU and US GRID testbeds). This plan would be confirmed as valid or adjusted after internal discussion in DataTAG.

1. Antonio Sidoti is the contact person between CDF-Grid and DataTAG.
2. Antonio Sidoti will be invited to the DataGrid tutorial session that will be held at CNAF in Bologna the week after Easter (2-5 April 2002). DataTAG will make sure that a user session is organized (as well as administrator sessions) and that Antonio is informed/invited.
3. Flavia will send Antonio examples of job submission to the DataGrid testbed and preliminary documentation (DONE!)
4. DataTAG will set up an LDAP VO (LDAP Virtual Organization) server for CDF collaborators. CDF/Antonio will take care of populating this server with the necessary entries. (Alessandro Italiano, Andrea Chierici, Luciano Gaido, could you please identify the resources to realize that and set up the server? Thanks a lot. Could you please let me know when this can be done?)
   Antonio, docs on doing that can be found at:
   http://www.globus.org/security/overview.html
   http://marianne.in2p3.fr/datagrid/documentation/ldap-doc.pdf
   http://server11.infn.it/testbed-grid/doc/infn-author.pdf
5. CDF/Antonio will ask the INFN CDF community to request valid certificates from the INFN CA (http://security.fi.infn.it/CA/).
6. DataTAG will install an AFS client on the testbed at CNAF (CE and WN). Alessandro/Andrea, could you please investigate what is involved in doing so? Also, DataTAG resources should be tagged with the RunTimeEnvironment tag DATATAG. Those with AFS installed should be tagged with the CDF RunTimeEnvironment tag (Alessandro Cavalli, could you please help with this?)
7. In the first 2 weeks of April, CDF will try to run its simulation job on the DataTAG testbed to identify usage issues and the feasibility of a distributed computing solution based on the EDG Middleware.
8. As a second step, other CDF sites in Europe will join: Spain? UK? What is the timescale? DataTAG people at CNAF will register new DataTAG resources for CDF in the Information Index.
   Alessandro Cavalli, could you investigate the possibility of creating a DataTAG Information Index completely separated from the one used by DataGRID at CNAF?
9. A CDF Replica Catalogue and a Storage Element will be set up on the DataTAG testbed to do simple experiments involving output registration in the GRID Catalogue for further processing (sites involved: CNAF, Trieste?)
10. By the end of April 2002, an EDG User Interface (portal to the EU EDG Grid) will be set up at FNAL on Italian dedicated hardware, and possibly on the CDF central facility, to test job submission on the European Grid from FNAL.
11. DataTAG (Roberto Cecchini?) will coordinate the task of accepting EsnetCA signed certificates in the DataTAG testbed.
12. Luca Dell'Agnello (DataTAG/INFN) is investigating the possibility of interfacing Kerberos5 tickets with the Globus Security Infrastructure (GSI). At FNAL Don Petravick is coordinating a similar effort. This is a step needed if we want to set up a CDF Grid Testbed at FNAL.
13. A more long term plan foresees the setup of an EDG Grid Testbed with FNAL facilities (FBS/SAM) fully integrated....

Thanks a lot for your time, patience and cooperation.

Flavia