Problems (and solutions) in using EDG toolkit for CDF
Python
A glitch in the UI installation on ncdf29.fnal.gov prevented all
EDG programs from running; they failed with messages about Python
API incompatibility. Solved by making sure that python resolves to
/opt/edg/python/bin/python and not /usr/bin/python. EDG needs
Python v2.1.
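A quick check of which interpreter the shell will pick up can be
sketched as follows. The function name check_edg_python is made up for
illustration; only the two paths come from the note above.

```shell
# Sketch only: check_edg_python is a hypothetical helper, not part of EDG.
# The expected path is the one quoted in the note above.
EDG_PYTHON=/opt/edg/python/bin/python

check_edg_python() {
    # $1 (optional): interpreter path that "python" should resolve to
    expected=${1:-$EDG_PYTHON}
    # command -v prints the path the shell would execute for "python"
    found=$(command -v python 2>/dev/null || true)
    if [ "$found" = "$expected" ]; then
        echo "ok: python is $found"
        return 0
    fi
    echo "warning: python is '${found:-not found}', expected $expected" >&2
    return 1
}

# If the check fails, putting the EDG directory first in PATH is one fix:
#   PATH=/opt/edg/python/bin:$PATH; export PATH
```

Run check_edg_python with no argument on the UI machine before trying
any dg-job-* command.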
Output
To retrieve job output into a given sub-directory instead of /tmp, use
dg-job-get-output jobID -dir myoutput. The myoutput directory must
already exist.
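Since the target directory has to exist beforehand, the two steps can
be wrapped together. This is a minimal sketch: fetch_job_output and the
DG_JOB_GET_OUTPUT override variable are hypothetical names, and only the
dg-job-get-output call with -dir is taken from the note above.

```shell
# Sketch only: fetch_job_output is a hypothetical wrapper, not an EDG command.
fetch_job_output() {
    # $1: job identifier; $2 (optional): output directory
    jobid=$1
    outdir=${2:-myoutput}
    # The directory must exist before dg-job-get-output is called
    mkdir -p "$outdir" || return 1
    # DG_JOB_GET_OUTPUT is an assumed override hook, handy for dry runs
    ${DG_JOB_GET_OUTPUT:-dg-job-get-output} "$jobid" -dir "$outdir"
}
```

Setting DG_JOB_GET_OUTPUT=echo prints the command line instead of
running it, which is useful on a machine without the EDG UI.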
Sandbox
Could not use a relative path in outputsandbox (e.g. results/pippo.txt).
Could not find documentation about this.
The only examples, in http://marianne.in2p3.fr/datagrid/documentation/,
use bare file names, and that page has no link to JDL or
similar docs.
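One possible workaround, which is an assumption and has not been
verified on the testbed, is to have the job script copy its results
into the top-level working directory before it exits, so that the
outputsandbox only needs bare file names. The helper name
collect_outputs is hypothetical.

```shell
# Sketch only: collect_outputs is a hypothetical helper, not part of EDG.
collect_outputs() {
    # Copy each given file into the current directory under its base name
    for path in "$@"; do
        if [ ! -f "$path" ]; then
            echo "missing output file: $path" >&2
            return 1
        fi
        cp "$path" "./$(basename "$path")" || return 1
    done
}

# At the end of the job script, for example:
#   collect_outputs results/pippo.txt
```

The outputsandbox would then list pippo.txt rather than
results/pippo.txt.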
Waiting in queue
No way to tell why a job is submitted to testbed001.cnaf and waits there.
Are the other CEs really busy? How do I know the WN status behind the CE?
Is my job waiting because the WNs are busy, or broken, or...?
Submit
Submitting a realistic job with a 100~200 MB executable takes a
long time (6 min from Torinto to CNAF). What is used to copy the sandbox?
AFS
No idea how to find a CE/WN with AFS. Flavia says "ask CMS people".
dg-job-status
When the job is waiting in the CE queue there is no way to know why,
or to guess how long it will stay there. There is also no way to know
why it was submitted to that CE instead of another one that may have
had free WNs.
Missing CDF offline
It is possible to run a standard CDF job on a normal grid node
that does not have CDF offline installed, as long as neither DB data
nor any configuration file is needed. Better to test first (simply run
without doing setup cdfsoft2) that the output is correct. Missing
configuration files may be things like input to MC generators etc.
There will be some DB connection timeouts, but they should be harmless.
Notification
How do I know when the job is done? dg-job-submit -help says to submit
with the -n stefanol.belforte@ts.infn.it option. Tried it and it seems
to work, but messages about execution starting are at least 20 min late,
and nothing had been sent about completion 4 hours after the job was
done and the output retrieved.
Submit
How to submit a job on a given CE:
dg-job-submit hello.jdl -resource grid001.pd.infn.it:2119/jobmanager-lsf-grid01pd
cf. the talk by Luciano Gaido at Villa Gualino, Feb 2002,
http://server11.infn.it/testbed-grid/meetings/gualino.pdf, page 54.
Submit
I noticed that all my jobs are queued to the CNAF testbed CE, even if
I already have jobs waiting there and have no indication that the other
possible
Certificates
The automatic procedure /opt/edg/bin/pkcs12-extract
to extract the pem files does not work. Have to do (on Linux !!):
mkdir ~/.globus
cd ~/.globus
copy the certificate p12 file here, i.e. the certificate in p12 format
as obtained from Export in the Netscape browser
make sure this file is readable only by the user
openssl pkcs12 -nocerts -in certificatename.p12 -out userkey.pem
openssl pkcs12 -clcerts -nokeys -in certificatename.p12 -out usercert.pem
chmod 600 userkey.pem
The certificate password (chosen when exporting it from Netscape) will be
needed. Also, the first openssl command asks for a PEM pass phrase.
That is another password, which protects the private key in userkey.pem
and will be needed every time that file is used to access the grid.
Choose it wisely.
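The manual steps above can be collected in one small function. This is
a sketch under assumptions: extract_grid_cert is a hypothetical name,
the optional second argument is added for convenience, and the openssl
calls are the same two listed above. Both calls prompt interactively
for the passwords.

```shell
# Sketch only: extract_grid_cert is a hypothetical wrapper around the
# openssl commands listed above; it is not an EDG or Globus tool.
extract_grid_cert() {
    # $1: exported p12 file; $2 (optional): target directory
    p12=$1
    dir=${2:-$HOME/.globus}
    if [ ! -f "$p12" ]; then
        echo "no such p12 file: $p12" >&2
        return 1
    fi
    mkdir -p "$dir" || return 1
    name=$(basename "$p12")
    # Copy the p12 file unless a file of that name is already there
    [ -f "$dir/$name" ] || cp "$p12" "$dir/" || return 1
    # Make the p12 copy readable only by the user
    chmod 600 "$dir/$name"
    # Private key: asks for the export password, then a new PEM pass phrase
    openssl pkcs12 -nocerts -in "$dir/$name" -out "$dir/userkey.pem" || return 1
    # Certificate only: asks for the export password again
    openssl pkcs12 -clcerts -nokeys -in "$dir/$name" -out "$dir/usercert.pem" || return 1
    chmod 600 "$dir/userkey.pem"
}

# Example:
#   extract_grid_cert certificatename.p12
```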
Stefano Belforte
Last modified: Wed Apr 24 16:45:05 MET DST