SamUpload

A proposal for a tool that stores users' data into SAM. It is built around the sam store command, which handles a single data storage operation into enstore.

Specifications

  1. granularity at user's level, not groups
  2. implement quota for users
  3. upload into enstore by default
  4. allow data to be fetched from outside a sam station
  5. allow for data to be imported into a SAM Cache other than enstore
  6. simple to use, easy, failsafe
  7. intended to be used for small data sets, ntuples e.g.
  8. limit data in each upload to O(10GB)
  9. data files in each upload are described simply by a dataset name
  10. no provision for other metadata, only information stored in SAM is
  11. a SAM archive of physics data sets (skimmed samples, MC etc.) needs solid metadata information and careful checking, and will have to be done separately
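
The size and quota rules above could be checked before anything is sent to SAM. The following is a minimal sketch only: the names UploadRequest and check_upload are hypothetical, "O(10GB)" is read as a hard 10 GB ceiling, and the per-user quota value is assumed to come from some site configuration that this proposal does not define.

```python
from dataclasses import dataclass

GB = 10**9
# Assumption: "O(10GB)" is treated here as a hard ceiling of 10 GB per upload.
MAX_UPLOAD_BYTES = 10 * GB


@dataclass
class UploadRequest:
    """One upload: a user, a dataset name, and the files to store (hypothetical type)."""
    user: str
    dataset: str
    file_sizes: dict  # file name -> size in bytes


def check_upload(request, used_bytes, quota_bytes):
    """Return a list of problems with the request; an empty list means it may proceed.

    used_bytes is what the user has already stored, quota_bytes is the
    per-user limit (where these numbers come from is left open).
    """
    problems = []
    total = sum(request.file_sizes.values())
    if not request.dataset:
        problems.append("missing dataset name")
    if not request.file_sizes:
        problems.append("no files in upload")
    if total > MAX_UPLOAD_BYTES:
        problems.append("upload of %d bytes exceeds the per-upload limit" % total)
    if used_bytes + total > quota_bytes:
        problems.append("upload would exceed the quota of user %s" % request.user)
    return problems


if __name__ == "__main__":
    req = UploadRequest("belforte", "my_ntuples", {"a.root": 2 * GB, "b.root": 3 * GB})
    print(check_upload(req, used_bytes=0, quota_bytes=50 * GB))
```

Returning a list of problems rather than raising on the first one would let a GUI show the user everything that needs fixing in one go.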

Usage Scenarios

  1. User puts data on ICAF area at FNAL
  2. User logs in local desktop and
  3. When users activate SamUploadGui
  4. User quits the GUI
  5. User cleans up ICAF him/herself as needed
  6. User can use the SAM dataset editor or browser to verify that the data set is there and get its content, and can verify file location with sam locate
  7. User/peac can now use SAM to process the files

Conceptual Design

For Fermilab/CAF:
  1. We need to write:
    1. SamUploadGUI: looks very much like CafGui, uses the same .cafrc file to get the mapping from site name to host/ports. We will add a few entries to .cafrc if/as needed.
    2. SamUploader: looks a bit like submitter, or maybe like peac's gm, since it talks to SAM, spawns a child to deal with each request from the GUI, is daemonized, etc. It will be written in python and talk to SAM by spawning shell sam commands, or by using the SAM python i/f
    3. SamUploader has several tasks that should be modularized into a set of python modules/scripts that it invokes
    4. SamUploader will have the usual .cfg file to define all site-specific things
    5. SamStoreTheseFiles script: a wrapper around the sam store command that runs on fcdfsam to store users' files. It deals with the python script needed for metadata (--descrip) and with the pnfs path (--dest). It handles sam errors etc.
      • At non-FNAL locations, e.g. CNAF, we do not use sam store; instead we will move files to some local SAM cache disk and do a sam add location. We need to find a way to store the CRC in this case.
      • SamDefineDatasetForTheseFiles: once the files are in enstore, it checks that they are there (sam locate? this is damn slow) and then does the needed magic with sam create dataset and sam create definition (see doc)
  2. We need to have
    1. One directory for every user in pnfs in enstore, like /pnfs/cdf/sam/belforte with possibility to create subdirs 001, 002 etc.
    2. Dmitri's magic that automatically creates and uses a new subdir when the file count reaches the pnfs performance limit
    3. A way to store a file CRC even if the file is not in enstore
    4. A way to make a local SAM cache using a local dCache that has no tape backend. Hopefully it is enough to write a file there using pnfs in e.g. /pnfs/cnaf.infn.it/a/b/c/file and then do a sam add location with location dcap://cdfdca.cnaf.infn.it:dcap://cdfdca.cnaf.infn.it:25125/pnfs/cnaf.infn.it/a/b/c/file or something similar
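
Two of the requirements above, the numbered per-user subdirectories and a CRC for files that never pass through enstore, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the per-directory file limit is a parameter because the actual pnfs limit is not given, and Adler-32 is only a stand-in for whatever checksum SAM actually expects.

```python
import posixpath
import zlib

PNFS_BASE = "/pnfs/cdf/sam"  # base path taken from the example above


def subdir_name(index):
    """Format a subdirectory index as 001, 002, ..."""
    return "%03d" % index


def choose_subdir(file_counts, max_files):
    """Pick the subdirectory for the next file.

    file_counts maps existing subdir names ("001", "002", ...) to their
    current file count. The newest subdir is reused until it holds
    max_files files, then the next number is returned.
    """
    counts = {int(name): n for name, n in file_counts.items() if name.isdigit()}
    if not counts:
        return subdir_name(1)
    latest = max(counts)
    if counts[latest] < max_files:
        return subdir_name(latest)
    return subdir_name(latest + 1)


def dest_path(user, subdir):
    """Build the per-user destination directory, e.g. /pnfs/cdf/sam/belforte/001."""
    return posixpath.join(PNFS_BASE, user, subdir)


def file_crc(path, chunk_size=1 << 20):
    """Compute an Adler-32 checksum of a local file, reading it in chunks.

    Assumption: Adler-32 is used purely as an example checksum.
    """
    crc = 1  # standard Adler-32 starting value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.adler32(chunk, crc)
    return crc & 0xFFFFFFFF


if __name__ == "__main__":
    sub = choose_subdir({"001": 2000, "002": 150}, max_files=2000)
    print(dest_path("belforte", sub))
```

Computing the checksum locally, before the file is moved to a cache disk, would give a value to record alongside the location even when enstore is never involved.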

Stefano Belforte
Last modified: Tue Dec 2 11:46:11 CET 2003