Introduction

The Excel import service reads xls definitions for both types and entities and send them to openBIS. It is the replacement of the old master data scripts adding support for the creation of openBIS entities.

The goals are:

For common users an import format with the following features to avoid the shortcomings of the old format:

  • Recognisable labels as column names.
  • Multi-type imports.
  • Parents/Children creation and linking on a single import.

For advanced users like consultants and plugin developers a tool that allows to specify on an Excel sheet:

  • Metadata model.
  • Basic entity structures used for navigation.

Modes

To support different use cases the import service supports the next modes, specifying one of them is mandatory.

  • UPDATE IF EXISTS: This one should be the default mode to use to make incremental updates.
  • IGNORE EXISTING: This mode should be used when the intention is to ignore updates. Existing entities will be ignored. That way is possible to avoid unintentionally updating entities and at the same time adding new ones.
  • FAIL IF EXISTS: This mode should be used when the intention is to fail if anything is found. That way is possible to avoid making any unintentional changes.

Organising Definition Files

All data can be arranged according to the needs of the user, in any number of files and any number of worksheets. All files have to be in one directory.

The names of the files and worksheets are ignored by the service, the user is advised to use descriptive names that they can quickly remember/refer to later.

If there are dependencies between files they should be submitted together or an error will be shown.

Example:

We want to define vocabularies and sample types with properties using these vocabularies. We can arrange our files in several ways:

  1. put vocabulary and sample types in separate files named i.e vocabulary.xls and sample_types.xlsx respectively
  2. put vocabulary and sample types in different worksheets in the same xls file
  3. put everything in one worksheet in the same file

Organising Definitions

Type definitions:

The order of type definitions is not important for the Excel import service, with exception of Vocabularies, those need to be placed before the property types that use them.

Entity definitions:

Type definitions for the entities should already exist in the database at the time when entities are registered. Generally Entity definitions are placed at the end.

Text cell formatting (colours, fonts, font style, text decorations)

All types of formatting are permitted, and users are encouraged to use them to make their excel files more readable. Adding any non text element (table, clipart) will cause the import to fail.

(A valid, but not easily readable, example)

Definition, rows and sheet formatting

  • A valid sheet has to start with definition on the first row.
  • Each definition has to be separated by one empty row.
  • Two or more consecutive empty rows mark the end of the definitions.
  • Empty spaces at the beginning or end of headers are silently eliminated.


If any content is placed after two consecutive empty rows it will result in an error. This is to alert the user and avoid silently ignoring content.

Header rows NEED TO BE a valid attribute of the entity or entity type, property label or property code.

Any unintended header will result in an error. This is to avoid possible misspellings and avoid silently ignoring content.

Entity Types Definitions

All entity types can be created. There are differences due to the nature of the defined elements themselves.

Vocabulary and Vocabulary Term

Vocabulary

HeadersMandatory
VersionYes
CodeYes
DescriptionYes

Vocabulary Term

HeadersMandatory
VersionYes
CodeYes
LabelYes
DescriptionYes

Example

VOCABULARY_TYPE


VersionCodeDescription
1$STORAGE.STORAGE_VALIDATION_LEVELValidation Level
VersionCodeLabelDescription
1RACKRack Validation
1BOXBox Validation
1BOX_POSITIONBox Position Validation

Experiment Type

HeadersMandatory
VersionYes
CodeYes
DescriptionYes
Validation scriptYes
Ontology IdNo
Ontology VersionNo
Ontology Annotation IdNo

Example

EXPERIMENT_TYPE









VersionCodeDescriptionValidation script






2DEFAULT_EXPERIMENT
date_range_validation.py






Sample Type

HeadersMandatory
VersionYes
CodeYes
DescriptionYes
Auto generate codesYes
Validation scriptYes
Generate code prefixYes
Ontology IdNo
Ontology VersionNo
Ontology Annotation IdNo

Example

SAMPLE_TYPE




VersionCodeDescriptionAuto generate codesValidation scriptGenerated code prefix
2STORAGE_POSITION
TRUEstorage_position_validation.pySTO

Dataset Type

HeadersMandatory
VersionYes
CodeYes
DescriptionYes
Validation scriptYes
Ontology IdNo
Ontology VersionNo
Ontology Annotation IdNo

Example

DATASET_TYPE


VersionCodeDescriptionValidation script
1RAW_DATA

Property Type

A property type can exist unassigned to an entity type or assigned to an entity type.

HeadersMandatory AssignedMandatory Unassigned
VersionYesYes
CodeYesYes
MandatoryNoYes
Show in edit viewsNoYes
SectionNoYes
Property labelYesYes
Data typeYesYes
Vocabulary codeYesYes
DescriptionYesYes
MetadataNoNo
Dynamic scriptNoNo
Ontology IdNoNo
Ontology VersionNoNo
Ontology Annotation IdNoNo


A property type requires a data type to be defined, valid data types are.

Data typeDescription
INTEGER
REAL
VARCHARText of any length but displayed as a single line field.
MULTILINE_VARCHARText of any length but displayed as a multi line field.
HYPERLINK
BOOLEAN
CONTROLLEDVOCABULARY
XML
TIMESTAMP
DATE
SAMPLESample of any type.
SAMPLE:<SAMPLE_TYPE>Sample of the indicated type.


Example Unassigned Property

In this case, the property is registered without being assigned to a type, and  the block of property types uses the PROPERTY_TYPE block.

PROPERTY_TYPE







VersionCodeMandatoryShow in edit viewsSectionProperty labelData typeVocabulary codeDescription
1$WELL.COLOR_ENCODED_ANNOTATIONFALSETRUE
Color AnnotationCONTROLLEDVOCABULARY$WELL.COLOR_ENCODED_ANNOTATIONSColor Annotation for plate wells
1ANNOTATION.SYSTEM.COMMENTSFALSETRUE
CommentsVARCHAR
Comments
1ANNOTATION.REQUEST.QUANTITY_OF_ITEMSFALSETRUE
Quantity of ItemsINTEGER
Quantity of Items
2$BARCODEFALSEFALSE
Custom BarcodeVARCHAR
Custom Barcode

Example Assigned

In this case the property types are assigned to a sample type and the block of property types belong to the entity type block (SAMPLE_TYPE in this case).

SAMPLE_TYPE









VersionCodeDescriptionAuto generate codesValidation scriptGenerated code prefix




1ENTRY
TRUE
ENTRY




VersionCodeMandatoryShow in edit viewsSectionProperty labelData typeVocabulary codeDescriptionMetadataDynamic script
1$NAMEFALSETRUEGeneral infoNameVARCHAR
Name

1$SHOW_IN_PROJECT_OVERVIEWFALSETRUEGeneral infoShow in project overviewBOOLEAN
Show in project overview page

1$DOCUMENTFALSETRUEGeneral infoDocumentMULTILINE_VARCHAR
Document{ "custom_widget" : "Word Processor" }
1$ANNOTATIONS_STATEFALSEFALSE
Annotations StateXML
Annotations State

Entity Type Validation Script and Property Type Dynamic Script

Scripts have to reside in .py files in the scripts directory within the folder that contains the Excel files.

Within scripts, files can be organised in any suitable setup:

In order to refer to a validation or dynamic script (e.g. storage_position_validation.py below), the relative path (from the scripts directory) to the file has to be provided in the relevant column. See the example columns below.

Example

SAMPLE_TYPE









VersionCodeDescriptionAuto generate codesValidation scriptƒgreGenerated code prefix




2STORAGE_POSITION
TRUEstorage_position_validation.pySTO




VersionCodeMandatoryShow in edit viewsSectionProperty labelData typeVocabulary codeDescriptionMetadataDynamic script
1$STORAGE_POSITION.STORAGE_CODEFALSETRUEPhysical StorageStorage CodeVARCHAR
Storage Code

1$STORAGE_POSITION.STORAGE_RACK_ROWFALSETRUEPhysical StorageStorage Rack RowINTEGER
Number of Rows

1$STORAGE_POSITION.STORAGE_RACK_COLUMNFALSETRUEPhysical StorageStorage Rack ColumnINTEGER
Number of Columns

1$STORAGE_POSITION.STORAGE_BOX_NAMEFALSETRUEPhysical StorageStorage Box NameVARCHAR
Box Name

1$STORAGE_POSITION.STORAGE_BOX_SIZEFALSETRUEPhysical StorageStorage Box SizeCONTROLLEDVOCABULARY$STORAGE_POSITION.STORAGE_BOX_SIZEBox Size

1$STORAGE_POSITION.STORAGE_BOX_POSITIONFALSETRUEPhysical StorageStorage Box PositionVARCHAR
Box Position

1$STORAGE_POSITION.STORAGE_USERFALSETRUEPhysical StorageStorage User IdVARCHAR
Storage User Id

1$XMLCOMMENTSFALSEFALSE
CommentsXML
Comments log

1$ANNOTATIONS_STATEFALSEFALSE
Annotations StateXML
Annotations State



Entity Types Versioning

General Usage 

Version is a mandatory field for entity types, it just starts at 1; and during updating a type definition is expected to increment it; otherwise, the system will ignore the changes.

Explanation

Additionally, use the keyword FORCE to reinstall the type even if is present and has been deleted.

The system keeps track of what versions of entities have been installed storing this information, so in the future when one updates their types, the version specified in the spreadsheet is checked against the stored version.

For every TYPE found in the Excel sheet the next algorithm is performed:

IF ENTITY OR (TYPE.Version > STORED_VERSION) OR (TYPE.Version == FORCE): // If is a new version
	IF ITEM NOT EXISTS in openBIS:
		CREATE ITEM                             
	ELSE: // Doesn't exist branch
		IF FAIL_IF_EXISTS:
			THROW EXCEPTION
		IF UPDATE_IF_EXISTS:
			UPDATE ITEM
		ELSE IF IGNORE_EXISTING:
     		PASS // Ignore as requested
ELSE:
	PASS // Ignore object that have not been updated


Entity Definitions

Most entities can be created, excluding DataSets. There are differences due to the nature of the defined elements themselves.

General Rules:

  • Header order is arbitrary.
  • When referring to another entity only Identifiers are allowed. Sample Variables are the only exception.
  • Vocabulary values in property value rows can be referred to by either the vocabulary term code or the vocabulary term label.


If a mandatory header is missing it results in an error.

Repeated headers will result in an error, in case a Property shares Label with an Attribute is encouraged to use the property code instead.

Space

HeadersMandatory
CodeYes
DescriptionYes

Example

SPACE
CodeDescription
ELN_SETTINGSELN Settings
DEFAULT_LAB_NOTEBOOKDefault Lab Notebook
METHODSFolder for methods
MATERIALSFolder for th materials
STOCK_CATALOGFolder for the catalog
STOCK_ORDERSFolder for orders
PUBLICATIONSFolder for publications

Project

HeadersMandatory
IdentifierYes on UPDATES, ignored on INSERT
CodeYes
SpaceYes
DescriptionYes

Example

PROJECT


IdentifierCodeDescriptionSpace
/DEFAULT_LAB_NOTEBOOK/DEFAULT_PROJECTDEFAULT_PROJECTDefault ProjectDEFAULT_LAB_NOTEBOOK
/METHODS/PROTOCOLSPROTOCOLSProtocolsMETHODS
/STOCK_CATALOG/PRODUCTSPRODUCTSProductsSTOCK_CATALOG
/STOCK_CATALOG/SUPPLIERSSUPPLIERSSuppliersSTOCK_CATALOG
/STOCK_CATALOG/REQUESTSREQUESTSRequestsSTOCK_CATALOG
/STOCK_ORDERS/ORDERSORDERSOrdersSTOCK_ORDERS
/ELN_SETTINGS/TEMPLATESTEMPLATESTemplatesELN_SETTINGS
/PUBLICATIONS/PUBLIC_REPOSITORIESPUBLIC_REPOSITORIESPublic RepositoriesPUBLICATIONS

Experiment

HeadersMandatory
IdentifierYes on UPDATES, ignored on INSERT
CodeYes
ProjectYes
Property CodeNo
Property LabelNo

Example

EXPERIMENT



Experiment type



COLLECTION



IdentifierCodeProjectNameDefault object type
/METHODS/PROTOCOLS/GENERAL_PROTOCOLSGENERAL_PROTOCOLS/METHODS/PROTOCOLSGeneral ProtocolsGENERAL_PROTOCOL
/STOCK_CATALOG/PRODUCTS/PRODUCT_COLLECTIONPRODUCT_COLLECTION/STOCK_CATALOG/PRODUCTSProduct CollectionPRODUCT
/STOCK_CATALOG/SUPPLIERS/SUPPLIER_COLLECTIONSUPPLIER_COLLECTION/STOCK_CATALOG/SUPPLIERSSupplier CollectionSUPPLIER
/STOCK_CATALOG/REQUESTS/REQUEST_COLLECTIONREQUEST_COLLECTION/STOCK_CATALOG/REQUESTSRequest CollectionREQUEST
/STOCK_ORDERS/ORDERS/ORDER_COLLECTIONORDER_COLLECTION/STOCK_ORDERS/ORDERSOrder CollectionORDER
/ELN_SETTINGS/TEMPLATES/TEMPLATES_COLLECTIONTEMPLATES_COLLECTION/ELN_SETTINGS/TEMPLATESTemplate Collection
/PUBLICATIONS/PUBLIC_REPOSITORIES/PUBLICATIONS_COLLECTIONPUBLICATIONS_COLLECTION/PUBLICATIONS/PUBLIC_REPOSITORIESPublications CollectionPUBLICATION

Sample

HeadersMandatory
$No
IdentifierYes on UPDATES, ignored on INSERT
CodeNo
ProjectNo
ExperimentNo
Auto generate codeNo
ParentsNo
ChildrenNo
Property CodeNo
Property LabelNo

Example

SAMPLE





Sample type





ORDER





$IdentifierCodeSpaceProjectExperimentOrder Status

/ELN_SETTINGS/TEMPLATES/ORDER_TEMPLATEORDER_TEMPLATEELN_SETTINGS/ELN_SETTINGS/TEMPLATES/ELN_SETTINGS/TEMPLATES/TEMPLATES_COLLECTIONNot yet ordered

Defining Parent and Children in Samples

Parent and child columns can be used to define relations between samples. Samples can be addressed by:

  1. $ : Variables, only really useful during batch inserts for samples with autogenerated codes since Identifiers can't be known. Variables SHOULD start with $.
  2. Identifiers


Parents and children SHOULD be separated by an end of line, each sample should be in its own line.


SAMPLE







Sample type







ORDER







$ParentsChildrenIdentifierCodeSpaceProjectExperimentOrder Status



/ELN_SETTINGS/TEMPLATES/ORDER_TEMPLATE_AORDER_TEMPLATEELN_SETTINGS/ELN_SETTINGS/TEMPLATES/ELN_SETTINGS/TEMPLATES/TEMPLATES_COLLECTIONNot yet ordered
$B



/ELN_SETTINGS/TEMPLATES/ORDER_TEMPLATE_BORDER_TEMPLATEELN_SETTINGS/ELN_SETTINGS/TEMPLATES/ELN_SETTINGS/TEMPLATES/TEMPLATES_COLLECTIONNot yet ordered

/ELN_SETTINGS/TEMPLATES/ORDER_TEMPLATE_A

$B

/ELN_SETTINGS/TEMPLATES/ORDER_TEMPLATE_D/ELN_SETTINGS/TEMPLATES/ORDER_TEMPLATE_CORDER_TEMPLATEELN_SETTINGS/ELN_SETTINGS/TEMPLATES/ELN_SETTINGS/TEMPLATES/TEMPLATES_COLLECTIONNot yet ordered



/ELN_SETTINGS/TEMPLATES/ORDER_TEMPLATE_DORDER_TEMPLATEELN_SETTINGS/ELN_SETTINGS/TEMPLATES/ELN_SETTINGS/TEMPLATES/TEMPLATES_COLLECTIONNot yet ordered

Properties and Sample Variables

As a general rule, properties would only accept data of the specified type.

Sample properties would typically require an Identifier to be given but a variable '$' could be used instead for a sample declared at any point of the document, including cyclical dependencies. This is useful for scenarios where Sample codes are autogenerated and can't be known in advance.

Entity Versioning

They don't have versioning, only entity types have versioning.

Master Data as a Core Plugin

The master data plugin is an AS core plugin.


Directory structure (important) :

Use standard initialize-master-data.py handle as it is ingested by openbis on startup. Excel files should be organised in master-data directory in the same plugin and scripts should be contained in scripts directory under master-data.

Contents of initialize-master-data.py:

from ch.ethz.sis.openbis.generic.server.asapi.v3 import ApplicationServerApi
from ch.systemsx.cisd.openbis.generic.server import CommonServiceProvider
from ch.ethz.sis.openbis.generic.asapi.v3.dto.service.id import CustomASServiceCode
from ch.ethz.sis.openbis.generic.asapi.v3.dto.service import CustomASServiceExecutionOptions
from ch.systemsx.cisd.openbis.generic.server.jython.api.v1.impl import MasterDataRegistrationHelper
import sys

helper = MasterDataRegistrationHelper(sys.path)
api = CommonServiceProvider.getApplicationContext().getBean(ApplicationServerApi.INTERNAL_SERVICE_NAME)
sessionToken = api.loginAsSystem()
props = CustomASServiceExecutionOptions().withParameter('xls', helper.listXlsByteArrays()) \
    .withParameter('xls_name', 'ELN-LIMS-LIFE-SCIENCES').withParameter('update_mode', 'UPDATE_IF_EXISTS') \
    .withParameter('scripts', helper.getAllScripts())
result = api.executeCustomASService(sessionToken, CustomASServiceCode("xls-import-api"), props)


There are following parameters to fill (Easiest is to use MasterDataRegistrationHelper to evaluate parameter values):

  • 'xls': Array of excel files. It can be easily acquired by calling helper.listXlsByteArrays or listCsvByteArrays.
  • 'xls_name' - Name for the batch, it is used by versioning system.
  • 'update_mode' - See "Modes" section.
  • 'scripts' - if you have any scripts in your data, provide them here. It is easiest to get it with MasterDataRegistrationHelper getAllScripts function.

'results' object is a summary of what has been created.

Example

For an complete up to date example, please check the eln-lims-life-sciences plugin that ships with the installer or on the official Git repository:

https://sissource.ethz.ch/sispub/openbis/-/tree/master/openbis_standard_technologies/dist/core-plugins/eln-lims-life-sciences/1/as

Or download the complete plugin using the next link:

https://sissource.ethz.ch/sispub/openbis/-/archive/master/openbis-master.zip?path=openbis_standard_technologies/dist/core-plugins/eln-lims-life-sciences

Known Limitations

  • Property type assignments to entity types cannot be updated since the current V3 API does not support this functionality. This means that a change in the order of assignments or group names during an update will be ignored.



  • No labels