SamUpload
A proposal for a tool that stores users' data into SAM.
It is built around the sam store command,
which handles storing a single file into enstore.
Specifications
- granularity at user's level, not groups
- implement quota for users
- upload into enstore by default
- allow data to be fetched from outside a sam station
- allow for data to be imported into a SAM cache other than enstore
- simple to use, easy, failsafe
- intended to be used for small data sets, e.g. ntuples
- limit data in each upload to O(10GB)
- data files in each upload are described simply by a dataset name
- no provision for other metadata; the only information stored in SAM is
- data set name
- upload date
- data set size
- number of files
- list of file names with sizes and CRCs
- user's name (32 characters)
- user's description/comments (some arbitrary length)
- a SAM archive of physics data sets (skimmed samples, MC etc.) needs
solid metadata information and careful checking, and will have to be
done separately
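The information listed above is small enough to sketch as a single record. The following is only an illustration, not part of the proposal: the class and field names are hypothetical, "O(10GB)" is read as a hard limit of 10 * 10**9 bytes, and zlib.crc32 stands in for whatever CRC SAM actually records.

```python
import zlib
from dataclasses import dataclass, field
from datetime import date

MAX_UPLOAD_BYTES = 10 * 10**9   # assumed reading of "O(10GB)"
MAX_USER_NAME_LEN = 32          # from the specification


@dataclass
class FileEntry:
    name: str
    size: int   # bytes
    crc: int    # checksum; the CRC flavour is an assumption


@dataclass
class UploadRecord:
    dataset_name: str
    user_name: str
    description: str
    upload_date: date
    files: list = field(default_factory=list)

    @property
    def size(self):
        """Data set size: sum of the file sizes."""
        return sum(f.size for f in self.files)

    @property
    def n_files(self):
        return len(self.files)

    def problems(self):
        """Return a list of human-readable problems; empty means OK."""
        found = []
        if len(self.user_name) > MAX_USER_NAME_LEN:
            found.append("user name longer than 32 characters")
        if not self.files:
            found.append("no files in upload")
        if self.size > MAX_UPLOAD_BYTES:
            found.append("upload exceeds size limit")
        return found


def crc_of(data: bytes) -> int:
    """CRC32 of a byte string, as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF
```

Keeping the size and file count as derived properties means they cannot disagree with the file list.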
Usage Scenarios
- User puts data on ICAF area at FNAL
- User logs in to local desktop and
- gets k5 ticket
- starts SamUploadGui
- gives name of directory on ICAF area
- receives the list of files, sizes, and quota before/after upload
- activates upload
- waits 60 sec maximum for confirmation of scheduled transfer
- receives confirmation on SamUploadGui, if still open, when data are
in enstore/SAM, and by e-mail in any case
- When the user activates SamUploadGui
- it authenticates the user with the SamUploader daemon running on fcdfsam.
There will be a way to specify a different location (e.g. SamUploader running
at CNAF), using the same method as the CAF submitter
- SamUploader writes a mutex lock file to make sure the user is not
running two sessions at the same time; if it cannot, it fails. There will be a
way for the user to force cleanup of aborted sessions by requesting
cancellation of this lock. SamUploader will check the lock
at every step, and fail if it is missing.
- SamUploader does sanity checks on dset name and description:
characters are OK, length is OK, dset name is unique, etc.
- SamUploadGui has a way to get from SamUploader the list of
dsets defined for this user and display it
- SamUploadGui locates user's icaf area on CAF in the same way as icaf_gftp.
- SamUploadGui gets the ICAF sub-directory from form and sends it to SamUploader
- SamUploader uses kerberized rsh (or ssh, e.g. at CNAF) to do
ls and du on the icaf area and retrieve sizes
- SamUploader checks user's quota (max/current) in a local ASCII
file on fcdfsam (eventually it will be in the SAM DB?), maintained
by hand editing to begin with.
- If all is OK, SamUploader copies data from ICAF to fcdfsam (progress bar?).
SamUploader gives files new "unique" names by adding the user's name
and e.g. a progressive dset ID (stored together with quota). Something
like belforte_12_bpipi-0.root, belforte_12_bpipi-1.root etc., so
that we can have wurthwein_12_bpipi-12.root and belforte can
just remake bpipi-*.root and store them again.
- SamUploader tells the user this was OK and prepares a script that calls
sam store for each file
- SamUploader spawns the script
- when the script completes, SamUploader tells the GUI (if the GUI
is still there).
- The script runs inside a wrapping script (maybe something more
robust is needed; I (Stefano) do not know how to do this)
so that the exit status is
captured and e-mailed to the user, whether good or failed. The
wrapper also updates the quota, creates the SAM data set and removes
the lock. To start with, we can
charge the user only for datasets that made it into enstore OK. In case
of partial success, where only some files were written to enstore, we do not
record them in SAM, and forget about the extra files (unless there
is a way to delete them from enstore).
- User quits the GUI
- User cleans up ICAF him/herself as needed
- User can use the SAM dataset editor or browser to verify that the
data set is there and get its content, and can verify file location
with sam locate
- User/peac can now use SAM to process the files
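Three of the SamUploader steps above (the per-user lock, the "unique" file names, and the quota check) can be sketched as follows. This is a minimal illustration under stated assumptions: all function names are hypothetical, the lock is assumed to be one file per user in a directory on the SamUploader host, and the ASCII quota file is assumed to hold one "user max_bytes used_bytes next_dset_id" line per user, a layout the proposal does not specify.

```python
import os


def acquire_lock(lock_dir, user):
    """Create the per-user lock file atomically; return its path or None."""
    path = os.path.join(lock_dir, user + ".lock")
    try:
        # O_EXCL makes creation fail if the file already exists,
        # so two sessions cannot both obtain the lock.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return path


def lock_held(path):
    """Check made at every step: the session fails if the lock is gone."""
    return os.path.exists(path)


def release_lock(path):
    """Remove the lock (normal completion or user-forced cleanup)."""
    if os.path.exists(path):
        os.remove(path)


def unique_names(user, dset_id, file_names):
    """Prefix each file name with the user name and the dataset ID."""
    return [f"{user}_{dset_id}_{name}" for name in file_names]


def read_quota(text, user):
    """Parse the assumed 'user max_bytes used_bytes next_dset_id' lines."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == user:
            return int(fields[1]), int(fields[2]), int(fields[3])
    raise KeyError(user)


def fits_quota(max_bytes, used_bytes, upload_bytes):
    """True if the upload would keep the user within quota."""
    return used_bytes + upload_bytes <= max_bytes
```

For example, unique_names("belforte", 12, ["bpipi-0.root"]) gives ["belforte_12_bpipi-0.root"], matching the naming scheme described above. A forced cleanup of an aborted session amounts to calling release_lock on the stale file, after which the next lock_held check in the old session fails.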
Conceptual Design
For Fermilab/CAF:
- We need to write:
- SamUploadGui: looks very much like CafGui, uses the same
.cafrc file to get the mapping from site name to host/ports.
We will add a few entries to .cafrc if/as needed.
- SamUploader: looks a bit like submitter, or maybe like peac's gm,
since it talks to SAM, spawns a child to deal with each request from
the GUI, is daemonized etc. It will be written in python and talk to SAM by
spawning shell sam commands, or using the SAM python interface
- SamUploader has several tasks that should be modularized to
a set of python modules/scripts that it invokes
- SamUploader will have the usual .cfg file to define all
site specific things
- SamStoreTheseFiles script: a wrapper around the
sam store
command that runs on fcdfsam to store users' files. It deals
with the python script needed for metadata (--descrip) and with the
pnfs path (--dest). It handles sam errors etc.
- At non-FNAL locations, e.g. CNAF, we do not use sam store;
instead we will move files to some local SAM cache disk and do a
sam add location.
We need to find a way to store the CRC in this case.
- SamDefineDatasetForTheseFiles: once the files are in enstore, it
checks that they are there (sam locate? this is damn slow)
and then does the needed magic with sam create dataset and
sam create definition (see doc)
- We need to have
- One directory for every user in pnfs in enstore, like
/pnfs/cdf/sam/belforte with possibility to create
subdirs 001, 002 etc.
- Dmitri's magic that creates and uses automatically a new
subdir when the file count reaches the pnfs performance limit
- A way to store a file CRC even if the file is not in enstore
- A way to make a local SAM cache using a local dCache that
has no tape backend. Hopefully it is enough to write a
file there using pnfs in e.g. /pnfs/cnaf.infn.it/a/b/c/file
and then do a sam add location with location
dcap://cdfdca.cnaf.infn.it:dcap://cdfdca.cnaf.infn.it:25125/pnfs/cnaf.infn.it/a/b/c/file or something similar
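Two pieces of the design above lend themselves to a short sketch: choosing the numbered per-user subdirectory (001, 002, ...) and running one store command per file while recording which files succeeded. Everything here is illustrative: the function names are hypothetical, the file-count limit is a parameter rather than the real pnfs limit, and the actual sam store command line is deliberately left to a caller-supplied make_command function, since its exact syntax is not given here.

```python
import os
import subprocess


def pick_subdir(user_root, max_files):
    """Return the numbered subdir (001, 002, ...) to write into next."""
    os.makedirs(user_root, exist_ok=True)
    numbered = sorted(
        (d for d in os.listdir(user_root)
         if d.isdecimal() and os.path.isdir(os.path.join(user_root, d))),
        key=int,
    )
    if numbered:
        current = os.path.join(user_root, numbered[-1])
        if len(os.listdir(current)) < max_files:
            return current          # still room in the latest subdir
        next_index = int(numbered[-1]) + 1
    else:
        next_index = 1
    new_dir = os.path.join(user_root, f"{next_index:03d}")
    os.mkdir(new_dir)
    return new_dir


def store_all(files, make_command):
    """Run one store command per file; return (stored, failed) lists."""
    stored, failed = [], []
    for path in files:
        # make_command(path) returns the argument list to execute.
        result = subprocess.run(make_command(path))
        (stored if result.returncode == 0 else failed).append(path)
    return stored, failed


def should_charge_quota(stored, failed):
    """Charge only when every file of the data set was stored."""
    return bool(stored) and not failed
```

Returning the stored and failed lists separately lets the wrapper e-mail the outcome and decide whether to define the data set, in line with the rule that only complete uploads are recorded and charged.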
Stefano Belforte
Last modified: Tue Dec 2 11:46:11 CET 2003