Data Store Share Use Cases

Segmented Storage and Shares

The store of a Data Store Server (DSS) can be distributed among several storage devices. This segmentation of the store makes it possible to add new storage when needed.

Each segment of the store is called a share. Shares are identified by numbers. They correspond to directories in the store directory, which is specified by the property storeroot-dir in service.properties of DSS. If the store and a share are on different storage devices, the share directory is a symbolic link to that storage. This assumes that all storage is mounted on the machine running DSS.
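
The mapping between shares and directories can be illustrated with a short Python sketch. It is not part of DSS; the function name list_shares is hypothetical, and the sketch assumes that every sub-folder of the store with a purely numeric name is a share.

```python
import os

def list_shares(store_root):
    """Return sorted (share_id, is_symlink) pairs for the numbered folders in store_root."""
    shares = []
    for name in os.listdir(store_root):
        path = os.path.join(store_root, name)
        # Assumption: a share is any sub-folder with a purely numeric name.
        # os.path.isdir follows symbolic links, so a link to storage mounted
        # elsewhere counts as a share as well.
        if name.isascii() and name.isdigit() and os.path.isdir(path):
            shares.append((int(name), os.path.islink(path)))
    return sorted(shares)
```

For a store with a plain folder 1 and a symbolic link 2, the function returns [(1, False), (2, True)].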

The incoming directory of a drop-box has to be on a storage device which also contains a share. On start-up, DSS determines the corresponding share for each incoming directory: it scans all sub-folders of the store and picks the first one into which it can move a file from the incoming directory. During data set registration, all data sets are moved into the share found by this scan. Such shares are called incoming shares. All other shares are called extension shares.
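
The scan described above can be approximated by the following Python sketch. It is not the DSS implementation; the function name find_incoming_share and the probe file are hypothetical. The sketch assumes that shares are the numerically named sub-folders, that they are tried in ascending order, and that "can move a file" means a plain rename succeeds, which is normally only the case when both directories are on the same file system.

```python
import os
import uuid

def find_incoming_share(incoming_dir, store_root):
    """Return the number of the first share that accepts a moved file, or None."""
    names = sorted(
        (n for n in os.listdir(store_root)
         if n.isascii() and n.isdigit()
         and os.path.isdir(os.path.join(store_root, n))),
        key=int,  # assumption: shares are tried in ascending numeric order
    )
    probe_name = "probe-" + uuid.uuid4().hex
    probe = os.path.join(incoming_dir, probe_name)
    for name in names:
        # Create an empty probe file in the incoming directory.
        with open(probe, "w"):
            pass
        target = os.path.join(store_root, name, probe_name)
        try:
            # os.rename raises OSError if the file cannot be moved,
            # for example because the target is on another file system.
            os.rename(probe, target)
        except OSError:
            os.remove(probe)
            continue
        os.remove(target)
        return int(name)
    return None
```

If no share accepts the probe file, the function returns None. This corresponds to an incoming directory on a storage device that contains no share.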

No Segmentation

Initially, distributed storage isn't needed because there is not much data yet. This situation is characterized as follows:

  • There are one or more drop boxes. Their incoming directories are all on the same disk (say /disk1).
  • The store is also on /disk1.
  • There is one share (share 1) which is a sub-folder of the store (not a symbolic link). This sub-folder is created automatically. This is the incoming share. All data sets will be stored in share 1.

Use Case: Filling up disk space

This is a very typical use case where a segmented store is needed:

  • Data sets are gradually filling up disk space.
  • Before the disk is full additional disk space has to be installed.

This use case can be handled as follows:

  1. Add the new storage and mount it (say at /disk2).
  2. In the store folder, create a symbolic link named '2' that points to the new storage. This link is the new share.
  3. Add to service.properties of DSS:
    service.properties
    # Post registration task added
    maintenance-plugins = <other maintenance plugins>, post-registration
    post-registration.class = ch.systemsx.cisd.etlserver.postregistration.PostRegistrationMaintenanceTask
    # It runs every 5 minutes
    post-registration.interval = 300
    post-registration.cleanup-tasks-folder = cleanup-tasks
    # It does post-registration for every data set registered after some date
    post-registration.ignore-data-sets-before-date = <some date e.g. 2011-10-28>
    # File which stores the last post-registered data set
    post-registration.last-seen-data-set-file = ${storeroot-dir}/last-seen-data-set.txt
    # Post-registration shuffles data sets from share 1 to 2.
    post-registration.post-registration-tasks = eager-shuffling
    post-registration.eager-shuffling.class = ch.systemsx.cisd.etlserver.postregistration.EagerShufflingTask
    post-registration.eager-shuffling.share-finder.class = ch.systemsx.cisd.etlserver.plugins.SimpleShufflingShareFinder
    # Shuffling is done until share 2 is full, i.e. until its free space is less than the specified minimum.
    post-registration.eager-shuffling.share-finder.minimum-free-space-in-MB = 1024
    # Admin will get an e-mail if free space on share 2 drops below a specified limit.
    # The e-mail address is specified in etc/log.xml
    post-registration.eager-shuffling.free-space-limit-in-MB-triggering-notification = 10240
    # If share 2 is full post-registration should be stopped.
    post-registration.eager-shuffling.stop-on-no-share-found = true
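
How the two free-space thresholds in this configuration relate to each other can be sketched in Python. The sketch only restates the comments above and is not the logic of SimpleShufflingShareFinder or EagerShufflingTask; the names free_space_in_mb and share_status and the status strings are hypothetical.

```python
import shutil

MB = 1024 * 1024

def free_space_in_mb(share_path):
    """Free space of the file system containing share_path, in whole MB."""
    return shutil.disk_usage(share_path).free // MB

def share_status(free_mb, minimum_free_mb=1024, notification_limit_mb=10240):
    """Classify a share by its free space, using the two limits configured above."""
    if free_mb < minimum_free_mb:
        # Less free space than minimum-free-space-in-MB: the share counts as full.
        return "full"
    if free_mb < notification_limit_mb:
        # Below free-space-limit-in-MB-triggering-notification: still usable,
        # but the admin is notified.
        return "notify"
    return "ok"
```

With the values above, a share with 5000 MB of free space still accepts data sets but triggers a notification, while a share with 500 MB of free space counts as full.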
    

Each time disk space becomes low, the first two steps should be repeated with an incremented share number.
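
Repeating step 2 with an incremented share number could be scripted as in the following Python sketch. The function name add_share is hypothetical, and the sketch assumes that the new storage is already mounted and that the next share number is one higher than the highest existing one.

```python
import os

def add_share(store_root, storage_path):
    """Link storage_path into store_root under the next share number and return it."""
    if not os.path.isdir(storage_path):
        raise ValueError(f"{storage_path} is not an existing directory")
    # Assumption: existing shares are the entries with purely numeric names.
    existing = [int(n) for n in os.listdir(store_root)
                if n.isascii() and n.isdigit()]
    share_id = max(existing, default=0) + 1
    # The share is a symbolic link in the store folder pointing to the new storage.
    os.symlink(storage_path, os.path.join(store_root, str(share_id)))
    return share_id
```

For a store that only contains share 1, calling add_share with the store folder and /disk2 creates the link 2 and returns 2.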
