Submitting Data
This site allows users to deposit their data into the NCBI Probe Database. Our public submission tools are currently under development; however, we would be happy to assist users in depositing large-scale data or individual probes and associated experimental data.
For simple submissions, please download the corresponding templates below and send the completed templates to probe-admin@ncbi.nlm.nih.gov.
Note: submission formats and procedures may change in the future in response to submitters' feedback and software development.
Because of the great variability of existing probe types and the complexity of modern technologies, it is impossible to develop simple, exhaustive templates for all of them. Therefore, if you cannot find the right template or have any questions, please contact probe-admin@ncbi.nlm.nih.gov for additional templates and explanations.
Submission overview
Required files
- "[some name]_gen_info" - file with contact information and information common to all probes in the submission (text or document format)
- "[some name]_seq_info" - file with data for individual probes (tab-delimited or spreadsheet format)
Mandatory fields for all types of probes
- fields marked by an asterisk (*) in the contact and general information file
- #TRACKING - probe's unique identifier
- #PROBE_NAME - the probe's name as it appears in the probe's title
- #PROBE_TYPE - registered probe types are listed on the Probe home page:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=probe
- #DESIGN_ACCESSION - accession of the source or target sequence on which the probe design was based, in GenBank (preferred), DDBJ, or EMBL
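As a rough illustration, the sketch below checks a draft submission for the four mandatory #-fields before it is emailed. It assumes, without confirmation from this page, that the first row of the tab-delimited *_seq_info file holds the field names, and that fields common to all probes may instead be given once in the *_gen_info file. The function name `missing_mandatory_fields` and the sample values are hypothetical, and the asterisk-marked fields of the general information file are not checked.

```python
import csv
import io

# The four mandatory #-fields listed on this page.
MANDATORY_FIELDS = ("#TRACKING", "#PROBE_NAME", "#PROBE_TYPE", "#DESIGN_ACCESSION")


def missing_mandatory_fields(seq_info_text, gen_info_fields=()):
    """Return mandatory fields found in neither file, in the order listed above.

    Assumption: the first row of the tab-delimited *_seq_info file holds the
    field names; gen_info_fields holds #-fields given once in *_gen_info.
    """
    reader = csv.reader(io.StringIO(seq_info_text), delimiter="\t")
    header = next(reader, [])  # an empty file yields no columns
    present = {name.strip() for name in header} | set(gen_info_fields)
    return [field for field in MANDATORY_FIELDS if field not in present]


if __name__ == "__main__":
    # Hypothetical draft: #PROBE_TYPE is common to all probes, so it is in gen_info.
    draft = "#TRACKING\t#PROBE_NAME\t#DESIGN_ACCESSION\nlab-001\tprobe A\tNM_000461.4\n"
    print(missing_mandatory_fields(draft, gen_info_fields={"#PROBE_TYPE"}))  # []
    print(missing_mandatory_fields(draft))  # ['#PROBE_TYPE']
```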
Optional fields
All other fields are optional. However, we expect a reasonable number of them to be filled in.
There are two types of optional fields:
- fields that are defined in the database, such as #TGT_ACC (target accession), #VALIDATION, #TGT_TAXID, and others
- fields that do not have a dedicated place in the database, for example #PROTEIN_KNOCKDOWN%; such fields are concatenated and displayed in the "Result" section or another appropriate section of the probe's report page
Basic rules
- sequences should be written in the 5' to 3' direction
- no fancy formatting in spreadsheets (uniform font, no borders, no colors other than the default black)
- all fields in the "general information" file (except for contact information) can be moved to the data file and provided individually for each probe
- fields such as #MARKER_ALIASES, #PMID, #PRODUCT_SIZE, etc. can accept multiple values separated by semicolons
- any number of arbitrary fields can be included in the submission, for example #ALLELE_NUMBER, #AMPLIFIES_IN_ORG, etc.; we reserve the right to archive and display these fields at our discretion; please explain the meaning of these fields in the email text or the *_gen_info file
- fill empty cells with the word 'NaN'
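The basic rules above can be sketched in Python as a writer for a *_seq_info file. This is a minimal sketch, not an NCBI tool: it assumes one probe per row under a header row of #-field names, joins multi-value fields with semicolons, and writes NaN into empty cells. The helper names `format_cell` and `write_seq_info` and the sample probes are hypothetical; sequence orientation (5' to 3') and spreadsheet formatting are left to the submitter.

```python
import csv
import io


def format_cell(value):
    """Render one cell: lists/tuples are joined with ';', empty values become 'NaN'."""
    if isinstance(value, (list, tuple)):
        value = ";".join(str(v) for v in value)  # multi-value fields, e.g. #MARKER_ALIASES
    if value is None or str(value).strip() == "":
        return "NaN"  # rule: fill empty cells with the word NaN
    return str(value)


def write_seq_info(probes, columns, out):
    """Write one tab-delimited row per probe dict under a header of #-fields."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for probe in probes:
        writer.writerow([format_cell(probe.get(column)) for column in columns])


if __name__ == "__main__":
    columns = ["#TRACKING", "#PROBE_NAME", "#MARKER_ALIASES"]
    probes = [  # hypothetical example probes
        {"#TRACKING": "lab-001", "#PROBE_NAME": "probe A", "#MARKER_ALIASES": ["alias1", "alias2"]},
        {"#TRACKING": "lab-002", "#PROBE_NAME": "probe B"},  # no aliases -> NaN
    ]
    buffer = io.StringIO()
    write_seq_info(probes, columns, buffer)
    print(buffer.getvalue(), end="")
```

To produce the file itself, the same function can write to `open("mylab_seq_info.txt", "w", newline="")`, where the hypothetical `mylab` stands for the "[some name]" prefix.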
Additional comments
- a probe entry can contain more than one sequence
- a sequence can have several features; obvious features, such as universal primers, can be briefly described in the methodology text; however, some features do not have a fixed position and differ from probe to probe (for example, a "variation" feature); please contact probe-admin@ncbi.nlm.nih.gov for a template that includes the features of the sequences in your submission
- the #VALIDATION field accepts the values "comp fail", "comp success", "wet lab fail", and "wet lab success"
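The fixed set of #VALIDATION values lends itself to a simple pre-submission check. The sketch below assumes exact, case-sensitive matching and treats a missing column, an empty cell, or the NaN placeholder as "not provided", since #VALIDATION is optional; neither assumption is stated on this page, and `bad_validation_values` is a hypothetical helper.

```python
import csv
import io

# Accepted #VALIDATION values, as listed on this page.
ALLOWED_VALIDATION = frozenset({"comp fail", "comp success", "wet lab fail", "wet lab success"})


def bad_validation_values(seq_info_text):
    """Return (#TRACKING, value) pairs whose #VALIDATION value is not accepted.

    Assumptions: matching is exact and case-sensitive, and a missing column,
    an empty cell, or the NaN placeholder counts as "not provided".
    """
    reader = csv.DictReader(io.StringIO(seq_info_text), delimiter="\t")
    problems = []
    for row in reader:
        value = (row.get("#VALIDATION") or "").strip()
        if value in ("", "NaN"):
            continue  # optional field left empty
        if value not in ALLOWED_VALIDATION:
            problems.append((row.get("#TRACKING"), value))
    return problems


if __name__ == "__main__":
    draft = "#TRACKING\t#VALIDATION\np1\twet lab success\np2\tvalidated\np3\tNaN\n"
    print(bad_validation_values(draft))  # [('p2', 'validated')]
```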
The NCBI dbSTS and UniSTS databases
- dbSTS is a division of GenBank for archiving genetic markers that contain #AMPLICON sequences; please
visit the dbSTS submission page for more information:
Submitting sequences to dbSTS
- UniSTS is a database of genetic markers associated with information such as genetic map and genomic position;
please visit the UniSTS submission page for more information:
Submitting Map Data to UniSTS
- an interface that will allow simultaneous submission of data to dbSTS, UniSTS, and Probe is currently under development
Available templates
Template | General info file (.doc) | Sequence info file (.txt) | Comment
siRNA | siRNA_gen_info.doc | siRNA_seq_info.txt |
- for #TARGET_ACC, provide the RefSeq (RefSeq Project) accession of any mRNA transcript of the targeted gene (for example, NM_000461.4), preferably with the version
- if #TARGET_ACC is provided, #ORGANISM_TAXID and #GENE_ID become unnecessary
- if sequences cannot be disclosed but mapping to the target gene is desired, please contact probe-admin@ncbi.nlm.nih.gov for options
- #SENSE_OH_* and #ANTISENSE_OH_* mean "sense overhang" and "antisense overhang", respectively
shRNA | shRNA_gen_info.doc | shRNA_seq_info.txt |
- for #TARGET_ACC, provide the RefSeq (RefSeq Project) accession of any mRNA transcript of the targeted gene (for example, NM_000461.4), preferably with the version
- if #TARGET_ACC is provided, #ORGANISM_TAXID and #GENE_ID become unnecessary
- if sequences cannot be disclosed but mapping to the target gene is desired, please contact probe-admin@ncbi.nlm.nih.gov for options
- #HAIRPIN_SEQ is the sequence of the "straightened out" hairpin; it usually contains the sense and antisense sequences separated by a spacer (loop); example: Pr008816793; note: for the software, position numbering starts from 0, i.e., in this example #SENSE_POS1 and #SENSE_POS2 should be submitted as "0" and "20", and #ANTISENSE_POS1 and #ANTISENSE_POS2 as "32" and "52", respectively
dsRNA or esiRNA | dsRNA_gen_info.doc | dsRNA_seq_info.txt |
- for #TARGET_ACC, provide the RefSeq (RefSeq Project) accession of any mRNA transcript of the targeted gene (for example, NM_000461.4), preferably with the version
- #DESIGN_ACC is the GenBank accession of the sequence that was used for primer design
- if #TARGET_ACC or #DESIGN_ACC is known, #ORGANISM_TAXID is not necessary
- #PRIMER_F2_* and #PRIMER_R2_* fields are for information about nested primers (if applicable)
antisense or morpholino | antisense_gen_info.doc | antisense_seq_info.txt |
- for #TARGET_ACC, provide the RefSeq (RefSeq Project) accession of any mRNA transcript of the targeted gene (for example, NM_000461.4), preferably with the version
- if #TARGET_ACC is provided, #ORGANISM_TAXID and #GENE_ID become unnecessary
- if sequences cannot be disclosed but mapping to the target gene is desired, please contact probe-admin@ncbi.nlm.nih.gov for options
Primer set | PrimerSet_gen_info.doc | PrimerSet_seq_info.txt |
- #PRIMER_F2_* and #PRIMER_R2_* fields are for information about nested primers (if applicable)
- please c
TaqMan | Taqman_gen_info.doc | Taqman_seq_info.txt |
- #PRIMER_F2_* and #PRIMER_R2_* fields are for information about nested primers (if applicable)
genetic marker: STS, SSR, RFLP, AFLP, SSLP, SSCP | marker_gen_info.doc | marker_seq_info.txt |
- if your markers are associated with sequenced PCR products (#AMPLICON field) and/or mapping data (map, chromosome/linkage group, map position, etc.), please contact probe-admin@ncbi.nlm.nih.gov for options (you might want to submit the sequenced amplicons to GenBank and the mapping data to UniSTS)
FISH (Fluorescence In Situ Hybridization) | FISH_gen_info.doc | FISH_seq_info.txt |
- the #TARGET_TAXID and #NON_TARGET_TAXID fields can accept multiple values delimited by semicolons
- please provide any other additional fields as necessary
- please use the #COMMENT field to provide any additional information about this probe
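Two of the template notes above can be checked mechanically before submission: that #TARGET_ACC looks like a RefSeq mRNA accession such as NM_000461.4, ideally with a version, and that hairpin positions use the 0-based numbering described for the shRNA template. The sketch below assumes that RefSeq mRNA accessions carry the NM_ or XM_ prefix and that both ends of each #SENSE_POS*/#ANTISENSE_POS* range are inclusive, so converting from 1-based numbering simply subtracts one; the function names `check_target_acc` and `to_zero_based` are hypothetical.

```python
import re

# Assumption: RefSeq mRNA accessions use the NM_ or XM_ prefix followed by
# digits, with an optional ".version" suffix (e.g. NM_000461.4).
REFSEQ_MRNA = re.compile(r"(NM|XM)_\d+(?:\.(\d+))?")


def check_target_acc(accession):
    """Return (looks_like_refseq_mrna, has_version) for a #TARGET_ACC value."""
    match = REFSEQ_MRNA.fullmatch(accession.strip())
    if match is None:
        return False, False
    return True, match.group(2) is not None  # group 2 is the version number


def to_zero_based(start, end):
    """Convert a 1-based position range to 0-based numbering.

    Assumption: both ends are inclusive, so each position shifts down by one.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end in 1-based numbering")
    return start - 1, end - 1


if __name__ == "__main__":
    print(check_target_acc("NM_000461.4"))  # (True, True)
    print(check_target_acc("NM_000461"))    # (True, False): version preferred
    print(to_zero_based(1, 21), to_zero_based(33, 53))  # (0, 20) (32, 52)
```

Under the inclusive assumption, a sense strand at 1-based positions 1 to 21 and an antisense strand at 33 to 53 would give the "0"/"20" and "32"/"52" values quoted for the shRNA example.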