The openBIS for Illumina NGS comes with predefined master data with the following schema:

To get data into openBIS we have written drop boxes. Each drop box expects the sequencer data in a specific layout; data placed there is then processed and registered in openBIS.

Check our Subversion repository for the latest source code of the drop boxes:

http://svncisd.ethz.ch/repos/cisd/deep_sequencing_unit/trunk/sourceTest/core-plugins/illumina-qgf/2/dss/drop-boxes/

Drop Boxes for openBIS Illumina NGS

create-flowcell-hiseq
Recommended Data Set Structure
110715_SN792_0054_BC035RACXX/
	runParameters.xml
	RunInfo.xml

Parses the two Illumina-provided files 'runParameters.xml' and 'RunInfo.xml',
creates one sample of type 'ILLUMINA_FLOW_CELL' and sets its properties
from those two XML files. Additionally, the number of lanes is read out and
one contained sample of type 'ILLUMINA_FLOW_LANE' is created per lane.
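
The lane handling described above can be sketched in a few lines of Python. This is only an illustration, not the drop box code itself: the XML snippet is a trimmed, made-up stand-in for a real 'RunInfo.xml', the function name is hypothetical, and the "<flow cell>:<lane>" code scheme for the contained samples is an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative stand-in for RunInfo.xml; a real file has more elements.
RUN_INFO_XML = """<RunInfo>
  <Run Id="110715_SN792_0054_BC035RACXX" Number="54">
    <Flowcell>C035RACXX</Flowcell>
    <Instrument>SN792</Instrument>
    <FlowcellLayout LaneCount="8" SurfaceCount="2" SwathCount="3" TileCount="16" />
  </Run>
</RunInfo>"""


def flow_lane_codes(run_info_xml):
    """Return the flow cell name and one (assumed) sample code per lane."""
    run = ET.fromstring(run_info_xml).find("Run")
    flow_cell = run.findtext("Flowcell")
    lane_count = int(run.find("FlowcellLayout").get("LaneCount"))
    # Assumed naming scheme for contained samples: "<flow cell>:<lane number>"
    lanes = ["%s:%d" % (flow_cell, n) for n in range(1, lane_count + 1)]
    return flow_cell, lanes


flow_cell, lanes = flow_lane_codes(RUN_INFO_XML)
print(flow_cell, lanes)
```

In the real drop box, each of these codes would be registered as an 'ILLUMINA_FLOW_LANE' sample contained in the 'ILLUMINA_FLOW_CELL' sample.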


read-rta-timestamp
Recommended Data Set Structure
110715_SN792_0054_BC035RACXX/
	RTAComplete.txt

Reads the time stamp of the file 'RTAComplete.txt' and stores it in the
property 'SEQUENCER_FINISHED', which records when the sequencer
finished.
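
As a rough sketch of this step, the file's modification time can be read with the Python standard library. The function name and the output format are assumptions made for illustration; the demonstration runs on a temporary folder rather than a real run folder.

```python
import os
import tempfile
from datetime import datetime


def rta_finished_timestamp(run_folder):
    """Return the modification time of RTAComplete.txt as a formatted string."""
    marker = os.path.join(run_folder, "RTAComplete.txt")
    mtime = os.path.getmtime(marker)  # seconds since the epoch
    # Assumed format; the real property may expect a different one.
    return datetime.fromtimestamp(mtime).strftime("%Y-%m-%d %H:%M:%S")


# Demonstration on a throwaway folder with a fixed modification time.
demo_folder = tempfile.mkdtemp()
demo_marker = os.path.join(demo_folder, "RTAComplete.txt")
with open(demo_marker, "w"):
    pass
os.utime(demo_marker, (1310730000, 1310730000))
print(rta_finished_timestamp(demo_folder))  # local time, depends on time zone
```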


register-basecall-stats
Recommended Data Set Structure
C035RACXX_1/
	Basecall_Stats_C035RACXX/
	DemultiplexedBustardSummary.xml
	DemultiplexedBustardConfig.xml
	Makefile
	DemultiplexConfig.xml
	SampleSheet.mk
	support.txt

Registers an incoming directory as a 'BASECALL_STATS' data set in openBIS.
The name of the directory is used to search for the matching sample.
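
How the directory name might be turned into a search key is sketched below. The split into flow cell name and lane number follows the example folder 'C035RACXX_1'; the function name and the "<flow cell>:<lane>" sample code are assumptions, not taken from the drop box source.

```python
def split_flow_lane_folder(folder_name):
    """Split a folder name such as 'C035RACXX_1' into ('C035RACXX', 1)."""
    flow_cell, sep, lane = folder_name.rpartition("_")
    if not sep or not flow_cell or not lane.isdigit():
        raise ValueError("unexpected folder name: %r" % folder_name)
    return flow_cell, int(lane)


flow_cell, lane = split_flow_lane_folder("C035RACXX_1")
# Assumed code of the flow lane sample to search for in openBIS.
sample_code = "%s:%d" % (flow_cell, lane)
print(sample_code)
```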


register-flowcell
Recommended Data Set Structure
130208_SN792_0204_BD1W0VACXX/
	Basecalling_Netcopy_complete_Read2.txt
	Basecalling_Netcopy_complete.txt
	Config/
	Data/
	First_Base_Report.htm
	ImageAnalysis_Netcopy_complete_Read1.txt
	ImageAnalysis_Netcopy_complete_Read2.txt
	ImageAnalysis_Netcopy_complete.txt
	InterOp/
	Logs/
	PeriodicSaveRates/
	Recipe/
	RTAComplete.txt
	RunInfo.xml
	runParameters.xml
	Thumbnail_Images/

Registers an incoming directory as a data set in openBIS. The name of the directory is used to
search for the matching sample.
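
The run folder name carries enough information to find the flow cell sample. The sketch below assumes the layout "<date>_<instrument>_<run number>_<position><flow cell>", inferred from the example names on this page (for instance 'BD1W0VACXX' next to 'Basecall_Stats_D1W0VACXX'). The regular expression and all names in the code are illustrative.

```python
import re
from collections import namedtuple
from datetime import datetime

RunFolder = namedtuple(
    "RunFolder", "run_date instrument run_number position flow_cell")

# Assumed layout: YYMMDD_<instrument>_<4-digit run number>_<A|B><flow cell>
RUN_FOLDER_RE = re.compile(
    r"^(\d{6})_([A-Za-z0-9]+)_(\d{4})_([AB])([A-Za-z0-9]+)$")


def parse_run_folder(name):
    """Break a run folder name into its parts."""
    match = RUN_FOLDER_RE.match(name)
    if match is None:
        raise ValueError("not a run folder name: %r" % name)
    date, instrument, number, position, flow_cell = match.groups()
    run_date = datetime.strptime(date, "%y%m%d").date()
    return RunFolder(run_date, instrument, int(number), position, flow_cell)


run = parse_run_folder("130208_SN792_0204_BD1W0VACXX")
print(run.flow_cell)  # D1W0VACXX
```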


register-lane-hiseq
Recommended Data Set Structure
Project_130208_SN792_0204_BD1W0VACXX_2/
	Sample_BSSE_QGF_LIBRARY_239_130208_SN792_0204_BD1W0VACXX_2/
		BSSE_QGF_LIBRARY_239_130208_SN792_0204_BD1W0VACXX_2_GCCAAT_L002_R1_001.fastq.gz
	Sample_BSSE_QGF_LIBRARY_240_130208_SN792_0204_BD1W0VACXX_2/
		BSSE_QGF_LIBRARY_240_130208_SN792_0204_BD1W0VACXX_2_CAGATC_L002_R1_001.fastq.gz
	Sample_BSSE_QGF_LIBRARY_241_130208_SN792_0204_BD1W0VACXX_2/
		BSSE_QGF_LIBRARY_241_130208_SN792_0204_BD1W0VACXX_2_ACTTGA_L002_R1_001.fastq.gz
		BSSE_QGF_LIBRARY_241_130208_SN792_0204_BD1W0VACXX_2_ACTTGA_L002_R1_002.fastq.gz
	Sample_BSSE_QGF_LIBRARY_304_130208_SN792_0204_BD1W0VACXX_2/
		BSSE_QGF_LIBRARY_304_130208_SN792_0204_BD1W0VACXX_2_ACAGTG_L002_R1_001.fastq.gz
	Sample_lane2/
		BSSE_QGF_POOL_243_D1W0VACXX_lane2_Undetermined_L002_R1_001.fastq.gz

Processes each flow lane of a sequencing run and attaches the FASTQ files to
the corresponding library samples.
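
Matching FASTQ files to library samples could look roughly like the sketch below. It assumes the file names follow the pattern visible in the listing above ("<sample>_<index>_L<lane>_R<read>_<chunk>.fastq.gz") and that the library sample code is the leading 'BSSE_QGF_LIBRARY_<number>' part. Both regular expressions and the function name are illustrative, not the drop box implementation.

```python
import re
from collections import defaultdict

FASTQ_RE = re.compile(
    r"^(?P<sample>.+)_(?P<index>[ACGTN]+|Undetermined)"
    r"_L(?P<lane>\d{3})_R(?P<read>\d)_(?P<chunk>\d{3})\.fastq\.gz$")
LIBRARY_RE = re.compile(r"^(BSSE_QGF_LIBRARY_\d+)_")


def library_of(fastq_name):
    """Return (library code or None, index, lane) for a FASTQ file name."""
    match = FASTQ_RE.match(fastq_name)
    if match is None:
        raise ValueError("unexpected FASTQ name: %r" % fastq_name)
    library = LIBRARY_RE.match(match.group("sample"))
    code = library.group(1) if library else None  # None: undetermined reads
    return code, match.group("index"), int(match.group("lane"))


files = [
    "BSSE_QGF_LIBRARY_239_130208_SN792_0204_BD1W0VACXX_2_GCCAAT_L002_R1_001.fastq.gz",
    "BSSE_QGF_LIBRARY_241_130208_SN792_0204_BD1W0VACXX_2_ACTTGA_L002_R1_001.fastq.gz",
    "BSSE_QGF_LIBRARY_241_130208_SN792_0204_BD1W0VACXX_2_ACTTGA_L002_R1_002.fastq.gz",
    "BSSE_QGF_POOL_243_D1W0VACXX_lane2_Undetermined_L002_R1_001.fastq.gz",
]

# Group the files by the library sample they would be attached to.
by_library = defaultdict(list)
for name in files:
    code, index, lane = library_of(name)
    by_library[code].append(name)
```

Files with undetermined indices end up under the key None here; how the real drop box treats them is not covered by this sketch.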


register-runstatistics
Recommended Data Set Structure
130208_SN792_0204_BD1W0VACXX/
	Data/
		reports/
		Status_Files/
		Status.htm

Registers an incoming directory as a 'RUNINFO' data set in openBIS. The name of
the directory is used to search for the matching sample.


register-unaligned
Recommended Data Set Structure
130208_SN792_0204_BD1W0VACXX/
	Unaligned_1/
		Basecall_Stats_D1W0VACXX/
		DemultiplexConfig.xml
		DemultiplexedBustardConfig.xml
		DemultiplexedBustardSummary.xml
		Makefile
		nohup.out
		Project_130208_SN792_0204_BD1W0VACXX_1/
		SampleSheet.mk
		support.txt
		Temp
		Undetermined_indices
	Unaligned_2/
	Unaligned_3/
	Unaligned_4/
	Unaligned_5/
	Unaligned_6/
	Unaligned_7/
	Unaligned_8/

Splits this composite data set into two separate data sets and
moves the corresponding files into the drop boxes that handle them.
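
The split is not spelled out here. Judging by the structures listed for the other drop boxes, one plausible reading is that the 'Project_*' folder goes to register-lane-hiseq and the base call statistics files go to register-basecall-stats. The sketch below only classifies the entries of one 'Unaligned_N' folder under that assumption and does not move anything; the function name and the file set are illustrative.

```python
# Files listed under register-basecall-stats on this page (assumed split rule).
BASECALL_STATS_FILES = {
    "DemultiplexConfig.xml",
    "DemultiplexedBustardConfig.xml",
    "DemultiplexedBustardSummary.xml",
    "Makefile",
    "SampleSheet.mk",
    "support.txt",
}


def split_unaligned(entries):
    """Partition folder entries into (stats, lane data, everything else)."""
    stats, lanes, other = [], [], []
    for entry in sorted(entries):
        if entry.startswith("Basecall_Stats_") or entry in BASECALL_STATS_FILES:
            stats.append(entry)
        elif entry.startswith("Project_"):
            lanes.append(entry)
        else:
            other.append(entry)
    return stats, lanes, other


unaligned_1 = [
    "Basecall_Stats_D1W0VACXX", "DemultiplexConfig.xml",
    "DemultiplexedBustardConfig.xml", "DemultiplexedBustardSummary.xml",
    "Makefile", "nohup.out", "Project_130208_SN792_0204_BD1W0VACXX_1",
    "SampleSheet.mk", "support.txt", "Temp", "Undetermined_indices",
]
stats, lanes, other = split_unaligned(unaligned_1)
```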
