openBIS for Illumina NGS comes with predefined master data that follows this schema:
To get data into openBIS we have written drop boxes. Each drop box expects the data from the sequencer in a specific layout; data placed there is then processed and registered in openBIS.
Check our Subversion repository for the latest source code of the drop boxes:
Drop Boxes for openBIS Illumina NGS
create-flowcell-hiseq
110715_SN792_0054_BC035RACXX/
    runParameters.xml
    RunInfo.xml
Parses the two Illumina-provided files 'runParameters.xml' and 'RunInfo.xml',
creates one sample of type 'ILLUMINA_FLOW_CELL' and sets its properties
from those two XML files. Additionally, the number of lanes is read out and
one contained sample of type 'ILLUMINA_FLOW_LANE' is created per lane.
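As a rough illustration of the lane read-out, the sketch below parses a trimmed RunInfo.xml with the Python standard library. It is not the drop box code itself: the element and attribute names ('Run', 'Flowcell', 'FlowcellLayout', 'LaneCount') are assumptions about the HiSeq RunInfo.xml layout, the XML content is made up for this example, and no openBIS API is called.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative RunInfo.xml content (not taken from a real run).
RUN_INFO_XML = """<RunInfo>
  <Run Id="110715_SN792_0054_BC035RACXX" Number="54">
    <Flowcell>C035RACXX</Flowcell>
    <Instrument>SN792</Instrument>
    <FlowcellLayout LaneCount="8" SurfaceCount="2" SwathCount="3" TileCount="16" />
  </Run>
</RunInfo>
"""


def read_flow_cell(run_info_xml):
    """Return the flow cell ID and the list of lane numbers found in RunInfo.xml text."""
    run = ET.fromstring(run_info_xml).find("Run")
    flow_cell = run.findtext("Flowcell")
    # Assumed attribute: LaneCount on the FlowcellLayout element.
    lane_count = int(run.find("FlowcellLayout").get("LaneCount"))
    return flow_cell, list(range(1, lane_count + 1))


flow_cell, lanes = read_flow_cell(RUN_INFO_XML)
print(flow_cell, lanes)
```

In the drop box, each lane number would correspond to one contained 'ILLUMINA_FLOW_LANE' sample of the flow cell.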
read-rta-timestamp
110715_SN792_0054_BC035RACXX/
    RTAComplete.txt
Reads the time stamp of the file 'RTAComplete.txt' and sets the property
'SEQUENCER_FINISHED', which marks a finished sequencer run, to that
time stamp.
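Reading such a time stamp could look roughly like the sketch below. It assumes that "time stamp" means the file's modification time; the function name and the output format are hypothetical, and writing the value to the 'SEQUENCER_FINISHED' property is left out.

```python
import os
from datetime import datetime, timezone


def rta_finished_time(run_folder):
    """Return the modification time of RTAComplete.txt, formatted in UTC."""
    marker = os.path.join(run_folder, "RTAComplete.txt")
    mtime = os.path.getmtime(marker)  # seconds since the epoch
    finished = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return finished.strftime("%Y-%m-%d %H:%M:%S")
```

UTC is used here only to keep the result independent of the server's local time zone; a real drop box may prefer local time.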
register-basecall-stats
C035RACXX_1/
    Basecall_Stats_C035RACXX/
    DemultiplexedBustardSummary.xml
    DemultiplexedBustardConfig.xml
    Makefile
    DemultiplexConfig.xml
    SampleSheet.mk
    support.txt
Registers an incoming directory as a 'BASECALL_STATS' data set in openBIS.
The name of the directory is used to search for the matching sample.
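A minimal sketch of how a directory name such as 'C035RACXX_1' can be split into the parts needed for the sample search. It assumes the name has the form flow cell ID, underscore, lane number; the function name is hypothetical and the actual openBIS lookup is not shown.

```python
def split_lane_folder(folder_name):
    """Split a name like 'C035RACXX_1' into the flow cell ID and the lane number."""
    # rsplit on the last underscore, so underscores inside the ID would be kept.
    flow_cell, lane = folder_name.rstrip("/").rsplit("_", 1)
    return flow_cell, int(lane)


print(split_lane_folder("C035RACXX_1/"))
```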
register-flowcell
130208_SN792_0204_BD1W0VACXX/
    Basecalling_Netcopy_complete_Read2.txt
    Basecalling_Netcopy_complete.txt
    Config/
    Data/
    First_Base_Report.htm
    ImageAnalysis_Netcopy_complete_Read1.txt
    ImageAnalysis_Netcopy_complete_Read2.txt
    ImageAnalysis_Netcopy_complete.txt
    InterOp/
    Logs/
    PeriodicSaveRates/
    Recipe/
    RTAComplete.txt
    RunInfo.xml
    runParameters.xml
    Thumbnail_Images/
Registers an incoming directory as a data set in openBIS. The name of the directory is used to
search for the matching sample.
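The run folder name itself carries the information needed for such a search. The sketch below splits it into its fields, assuming the HiSeq naming pattern date, instrument, run number, and a flow cell field whose leading 'A' or 'B' is the flow cell position. The pattern and the function name are assumptions, and which field serves as the sample code is not specified here.

```python
import re

# Assumed run folder pattern: <YYMMDD>_<instrument>_<run number>_<position><flow cell ID>
RUN_FOLDER = re.compile(
    r"^(?P<date>\d{6})_(?P<instrument>[^_]+)_(?P<run>\d+)"
    r"_(?P<position>[AB])(?P<flow_cell>[^_]+)$"
)


def parse_run_folder(name):
    """Return the fields of a run folder name as a dict, or raise ValueError."""
    match = RUN_FOLDER.match(name.rstrip("/"))
    if match is None:
        raise ValueError("unexpected run folder name: %r" % name)
    return match.groupdict()


print(parse_run_folder("130208_SN792_0204_BD1W0VACXX/"))
```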
register-lane-hiseq
Project_130208_SN792_0204_BD1W0VACXX_2/
    Sample_BSSE_QGF_LIBRARY_239_130208_SN792_0204_BD1W0VACXX_2/
        BSSE_QGF_LIBRARY_239_130208_SN792_0204_BD1W0VACXX_2_GCCAAT_L002_R1_001.fastq.gz
    Sample_BSSE_QGF_LIBRARY_240_130208_SN792_0204_BD1W0VACXX_2/
        BSSE_QGF_LIBRARY_240_130208_SN792_0204_BD1W0VACXX_2_CAGATC_L002_R1_001.fastq.gz
    Sample_BSSE_QGF_LIBRARY_241_130208_SN792_0204_BD1W0VACXX_2/
        BSSE_QGF_LIBRARY_241_130208_SN792_0204_BD1W0VACXX_2_ACTTGA_L002_R1_001.fastq.gz
        BSSE_QGF_LIBRARY_241_130208_SN792_0204_BD1W0VACXX_2_ACTTGA_L002_R1_002.fastq.gz
    Sample_BSSE_QGF_LIBRARY_304_130208_SN792_0204_BD1W0VACXX_2/
        BSSE_QGF_LIBRARY_304_130208_SN792_0204_BD1W0VACXX_2_ACAGTG_L002_R1_001.fastq.gz
    Sample_lane2/
        BSSE_QGF_POOL_243_D1W0VACXX_lane2_Undetermined_L002_R1_001.fastq.gz
Processes each flow lane of a sequencing run and attaches the FASTQ files to
the corresponding library samples.
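To illustrate how FASTQ files can be matched to their samples, the sketch below groups file names by the sample part of the name. It assumes CASAVA-style names of the form sample, index, 'L' plus lane, 'R' plus read, and a three-digit file counter. The regular expression and function name are hypothetical, and mapping the sample part to an openBIS library sample is site-specific and not shown.

```python
import re
from collections import defaultdict

# Assumed pattern: <sample>_<index>_L<lane>_R<read>_<counter>.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<index>[ACGTN-]+|NoIndex|Undetermined)"
    r"_L(?P<lane>\d{3})_R(?P<read>\d)_(?P<counter>\d{3})\.fastq\.gz$"
)


def group_by_sample(file_names):
    """Group FASTQ file names by the sample part of their name."""
    groups = defaultdict(list)
    for name in file_names:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        groups[match.group("sample")].append(name)
    return dict(groups)


files = [
    "BSSE_QGF_LIBRARY_241_130208_SN792_0204_BD1W0VACXX_2_ACTTGA_L002_R1_001.fastq.gz",
    "BSSE_QGF_LIBRARY_241_130208_SN792_0204_BD1W0VACXX_2_ACTTGA_L002_R1_002.fastq.gz",
    "BSSE_QGF_POOL_243_D1W0VACXX_lane2_Undetermined_L002_R1_001.fastq.gz",
]
groups = group_by_sample(files)
for sample, names in groups.items():
    print(sample, len(names))
```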
register-runstatistics
130208_SN792_0204_BD1W0VACXX/
    Data/
        reports/
        Status_Files/
        Status.htm
Registers an incoming directory as a 'RUNINFO' data set in openBIS. The name of
the directory is used to search for the matching sample.
register-unaligned
130208_SN792_0204_BD1W0VACXX/
    Unaligned_1/
        Basecall_Stats_D1W0VACXX/
        DemultiplexConfig.xml
        DemultiplexedBustardConfig.xml
        DemultiplexedBustardSummary.xml
        Makefile
        nohup.out
        Project_130208_SN792_0204_BD1W0VACXX_1/
        SampleSheet.mk
        support.txt
        Temp
        Undetermined_indices
    Unaligned_2/
    Unaligned_3/
    Unaligned_4/
    Unaligned_5/
    Unaligned_6/
    Unaligned_7/
    Unaligned_8/
Splits this complex data set into two separate data sets and
moves the corresponding files into the drop boxes for those data sets.