Databanks available at the Belgian EMBNet Node (BEN)
Search the databanks using
the Sequence Retrieval System (SRS)
- EMBL / EMBLNEW
The European Molecular Biology Laboratory databank is meant to
contain all DNA and RNA sequences that have ever been published,
including sequences from patents. It is maintained at the European
Bioinformatics Institute, an outstation of the EMBL at Hinxton
(England). The last major release (51) contains 1438050 sequences
with a total of more than 931 × 10^6 bases. The table EMBL contains
the last release; the table EMBLNEW contains sequences that were
added since the last release and that BEN retrieves each night
from the EBI site.
- GENBANK / GENBANKNEW
GenBank is maintained at the National Center for Biotechnology
Information (NCBI, Bethesda, Maryland). NCBI is the American
counterpart of EMBL. An agreement exists between EMBL, GenBank and
the DNA Data Bank of Japan managers to exchange submitted
sequences and to assign them the same accession number. At the
BEN site we need to keep only the small set of GenBank sequences
(a few thousand) that are not present in EMBL.
From EMBL/EMBLNEW and GENBANK/GENBANKNEW, we build our
non-redundant nucleic acid databank for the GCG software every day.
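The non-redundancy rule described above (keep a GenBank entry only when its accession number is absent from EMBL) can be sketched as a simple set difference. This is only an illustration under stated assumptions, not the procedure actually run at BEN: the function name and the toy accession numbers are hypothetical.

```python
def genbank_only(embl_accessions, genbank_accessions):
    """Return the GenBank accession numbers that have no counterpart in EMBL.

    EMBL, GenBank and DDBJ give an exchanged sequence the same accession
    number, so comparing accession numbers is enough to spot duplicates.
    The order of the GenBank list is preserved.
    """
    seen = set(embl_accessions)
    return [acc for acc in genbank_accessions if acc not in seen]


# Toy data: hypothetical accession numbers, for illustration only.
embl = ["X00001", "X00002", "M10000"]
genbank = ["M10000", "U20000", "X00002", "U20001"]
print(genbank_only(embl, genbank))  # ['U20000', 'U20001']
```

Using a set for the EMBL side keeps each lookup cheap even when the EMBL table holds more than a million entries.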
- SWISSPROT / SWISSNEW
SWISS-PROT is maintained at the University of Geneva under the
direction of Amos Bairoch, in collaboration with the EBI. It is a
general protein sequence databank, containing translations of open
reading frames identified in DNA as well as results from genuine
protein sequencing. The sequences are accompanied by a carefully
edited and very valuable documentation, including many
cross-references to other databanks. BEN retrieves new sequences
each week from the ExPASy site in Geneva.
- PIR -
The Protein Information Resource is another general protein
databank, maintained at the National Biomedical Research
Foundation (Georgetown, Washington), in collaboration with the
Munich Information Center for Protein Sequences (MIPS, Germany).
At the BEN site we keep only the set of PIR sequences that are not
found in SWISS-PROT (the exclusion is based on cross-references in
the databank). We have, however, arranged for entries retrieved
from SWISSPROT/SWISSNEW at the BEN site to contain a hyperlink
pointing to the corresponding PIR entry at the SRS server of the
EBI, so that interested users can access it easily.
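The cross-reference-based exclusion described above for PIR can be sketched as follows. The sketch assumes that SWISS-PROT entries cite PIR on DR lines of the form "DR   PIR; accession; name." and that PIR entries are identified by that accession; the function names and the sample entry are hypothetical and do not reproduce the scripts used at BEN.

```python
def pir_ids_referenced(swissprot_text):
    """Collect the PIR accession numbers cited on DR lines of SWISS-PROT text."""
    referenced = set()
    for line in swissprot_text.splitlines():
        if not line.startswith("DR"):
            continue
        # "DR   PIR; A31764; CCHU." -> ["PIR", "A31764", "CCHU"]
        fields = [f.strip() for f in line[2:].strip().rstrip(".").split(";")]
        if len(fields) >= 2 and fields[0] == "PIR":
            referenced.add(fields[1])
    return referenced


def pir_not_in_swissprot(pir_ids, swissprot_text):
    """Keep only the PIR identifiers that no SWISS-PROT entry cross-references."""
    covered = pir_ids_referenced(swissprot_text)
    return [pid for pid in pir_ids if pid not in covered]


# Hypothetical sample entry and identifiers, for illustration only.
sample = """ID   EXAMPLE_HUMAN  STANDARD;      PRT;   104 AA.
DR   EMBL; M22877; AAA35732.1; -.
DR   PIR; A31764; CCHU.
//
"""
print(pir_not_in_swissprot(["A31764", "S99999"], sample))
```

Only the DR lines are inspected, so the sequences themselves never need to be compared.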
- GENPEPT / GENPEPTNEW -
translations of open reading frames identified in GenBank
sequences. At the BEN site, a set that is non-redundant with
SWISS-PROT and PIR is updated weekly.
From SWISSPROT/SWISSNEW, PIR and GENPEPT/GENPEPTNEW, we build
every week our non-redundant protein databank for the GCG software.
- VECTOR -
vector-ig is a databank of vector sequences, including plasmids,
phagemids, phasmids, cosmids, phages, and YACs.
- AIDSBASE -
Aidsbase contains sequences from HIV and SIV and a number of
related sequences. It is maintained at Los Alamos National
Laboratory (New Mexico).
- PDB -
The Protein Data Bank contains 3D-structures (atomic co-ordinates)
of proteins, nucleic acids and complex carbohydrates, as
determined by X-ray crystallography or NMR. It is maintained at
Brookhaven National Laboratory (New York). At the BEN site, the
PDB is updated weekly. Under GCG, two databanks of sequences (one
with proteins and one with nucleic acids) extracted from the PDB
are maintained. The user can perform a fasta similarity search and
then use SRS to retrieve the structures of sequences related to the
query sequence.
NRL_3D is a databank with protein sequences extracted from PDB
entries, created at NBRF.
- HSSP -
Homology-derived Secondary Structure of Proteins contains multiple
sequence alignments of proteins from the PDB with related proteins
from SWISS-PROT. It is maintained by C. Sander and R. Schneider
at the EMBL/EBI.
- PROSITE / PROSITEDOC -
a databank with the conserved domains of protein families. Most
domains are represented as patterns, some as profiles and still
others as sets of rules. Each PROSITE entry contains
cross-references to matching SWISS-PROT sequences, including the
false positives and false negatives. It is maintained by A.
Bairoch. The patterns from PROSITE are also accessible under GCG
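To illustrate what a PROSITE pattern expresses, the sketch below translates the common pattern syntax into a Python regular expression. It is a simplified illustration, not the GCG or PROSITE implementation: it assumes elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for forbidden residues, "(n)" or "(n,m)" for repeats, and "<" / ">" as terminal anchors. The function name and the test sequence are hypothetical, and profiles and rules are not covered.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residues, count = match.groups()
        if residues == "x":
            core = "."                   # any residue
        elif residues.startswith("{") and residues.endswith("}"):
            core = "[^" + residues[1:-1] + "]"   # forbidden residues
        else:
            core = residues              # single letter or [ABC] class
        if count:
            core += "{" + count + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


# A zinc-finger-style pattern, used here only as an example.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(regex)

# Hypothetical toy sequence built to contain one match.
toy_sequence = "AACAACAAALAAAAAAAAHAAAHAA"
print(re.search(regex, toy_sequence) is not None)
```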
- BLOCKS -
Blocks contains gapless multiple sequence alignments corresponding
to the PROSITE entries. It is maintained by S. Henikoff and
J. Henikoff at the Fred Hutchinson Cancer Research Center (Seattle,
Washington).
- PRODOM -
ProDom contains multiple sequence alignments of related proteins
from SWISS-PROT. It is maintained by E. Sonnhammer at the Sanger
Institute (Hinxton, England) and D. Kahn at the Institut National
de la Recherche Agronomique (Toulouse, France).
EC contains a list of the enzymes to which an Enzyme Commission
number has been attributed. It is maintained by A. Bairoch. With
SRS, it is possible to search enzymes by reagent, reaction product
- REBASE -
contains restriction enzymes. It is maintained by Dr. R.J. Roberts
at New England BioLabs. With SRS, it is possible to search for
restriction enzymes specific for a particular site. REBASE is also
accessible under GCG, where it is used to map DNA. (Information
about the commercial suppliers of the restriction enzymes is
available on-line with the GCG command :
- TFSITE / TFFACTOR -
TRANSFAC is a databank of eukaryotic transcription factors. It is
maintained by the Gesellschaft für Biotechnologische
Forschung (Braunschweig, Germany). At the BEN site only part of
the databank is installed: the table TFSITE contains information
about the binding sites, including references to EMBL sequences
that contain them; the table TFFACTOR contains information about
the factors themselves, including references to SWISS-PROT and PIR
entries with the sequence. BEN has also installed a pattern file,
transfac.dat, to be used with GCG programs such as map.
- EPD -
The Eukaryotic Promoter Database contains references to EMBL
sequences where RNA polymerase II transcription start sites have
been identified. It
is maintained by Philipp Bucher at the Institut Suisse de
Recherches Experimentales sur le Cancer (Lausanne, Switzerland).
At the BEN site there is also a sequence databank under GCG
containing the 600 bases around each start site.
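Extracting a fixed window around a transcription start site, as done for the GCG sequence databank mentioned above, can be sketched as below. The text only states that 600 bases are kept; the split into 499 upstream bases, the start site itself and 100 downstream bases is an assumption, as are the function and parameter names. Only the forward strand is handled.

```python
def promoter_window(sequence, tss_index, upstream=499, downstream=100):
    """Return the part of `sequence` surrounding a transcription start site.

    tss_index is the 0-based position of the start site. The default
    window (499 upstream + start site + 100 downstream = 600 bases) is an
    assumption made for this sketch. The window is truncated when the
    start site lies too close to either end of the sequence.
    """
    start = max(0, tss_index - upstream)
    end = min(len(sequence), tss_index + downstream + 1)
    return sequence[start:end]


# Hypothetical 1000-base sequence with a start site at position 500.
toy_sequence = "ACGT" * 250
window = promoter_window(toy_sequence, 500)
print(len(window))  # 600
```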
- CPGISLE -
CpGIsle contains
information about CpG islands, including references to EMBL
sequences where they have been identified. It is maintained at the
Norwegian EMBnet Node.
- ECDC -
The E. coli
database collection is a databank with information about the
Escherichia coli K12 genome. It is maintained at the Justus-Liebig
University (Giessen, Germany). At the BEN site, only the table
with the individual genes has been installed. With SRS, it is
possible to search for the genes that have been located at a certain
position on the E. coli genetic map.
- SEQANALREF /
SEQANALREF is a databank of literature abstracts in the domain of
sequence analysis. It is maintained by A. Bairoch.
Last updated: 5 September 1997.
created by :Fred