<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
  xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>BMC Bioinformatics</title>
    <link>http://barf.jcowboy.org</link>
    <description>BMC Bioinformatics recent publications</description>
    <language>en-us</language>
    <image>
      <url>http://barf.jcowboy.org/pubmed.gif</url>
      <title>the data for this feed is provided by PubMed</title>
      <link>http://barf.jcowboy.org</link>
    </image>
    <item>
      <title>Discovery and assembly of repeat family pseudomolecules from sparse genomic sequence data using the Assisted Automated Assembler of Repeat Families (AAARF) algorithm.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18474116</link>
      <description>Publication Date: 2008 May 13 PMID: 18474116&lt;br/&gt;Authors: Debarry, J. D. - Liu, R. - Bennetzen, J. L.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Higher eukaryotic genomes are typically large, complex and filled with both genes and multiple classes of repetitive DNA. The repetitive DNAs, primarily transposable elements, are a rapidly evolving genome component that can provide the raw material for novel selected functions and also indicate the mechanisms and history of genome evolution in any ancestral lineage. Despite their abundance, universality and significance, studies of genomic repeat content have been largely limited to analyses of the repeats in fully sequenced genomes. RESULTS: In order to facilitate a broader range of repeat analyses, the Assisted Automated Assembler of Repeat Families algorithm has been developed. This program, written in PERL and with numerous adjustable parameters, identifies sequence overlaps in small shotgun sequence datasets and walks them out to create long pseudomolecules representing the most abundant repeats in any genome. Testing of this program in maize indicated that it found and assembled all of the major repeats in one or more pseudomolecules, including coverage of the major Long Terminal Repeat retrotransposon families. Both Sanger sequence and 454 datasets were appropriate. CONCLUSIONS: These results now indicate that hundreds of higher eukaryotic genomes can be efficiently characterized for the nature, abundance and evolution of their major repetitive DNA components.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18474116&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Comprehensive inventory of protein complexes in the Protein Data Bank from consistent classification of interfaces.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18474114</link>
      <description>Publication Date: 2008 May 12 PMID: 18474114&lt;br/&gt;Authors: Bordner, A. J. - Gorin, A. A.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Protein-protein interactions are ubiquitous and essential for all cellular processes. High-resolution X-ray crystallographic structures of protein complexes can reveal the details of their function and provide a basis for many computational and experimental approaches. Differentiation between biological and non-biological contacts and reconstruction of the intact complex is a challenging computational problem. A successful solution can provide additional insights into the fundamental principles of biological recognition and reduce errors in many algorithms and databases utilizing interaction information extracted from the Protein Data Bank (PDB). RESULTS: We have developed a method for identifying protein complexes in the PDB X-ray structures by a four step procedure: (1) comprehensively collecting all protein-protein interfaces; (2) clustering similar protein-protein interfaces together; (3) estimating the probability that each cluster is relevant based on a diverse set of properties; and (4) combining these scores for each PDB entry in order to predict the complex structure. The resulting clusters of biologically relevant interfaces provide a reliable catalog of evolutionary conserved protein-protein interactions. These interfaces, as well as the predicted protein complexes, are available from the Protein Interface Server (PInS) website at http://pins.ornl.gov/. CONCLUSIONS: Our method demonstrates an almost two-fold reduction of the annotation error rate as evaluated on a large benchmark set of complexes validated from the literature. 
We also estimate relative contributions of each interface property to the accurate discrimination of biologically relevant interfaces and discuss possible directions for further improving the prediction method.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18474114&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Triad pattern algorithm for predicting strong promoter candidates in bacterial genomes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18471287</link>
      <description>Publication Date: 2008 May 9 PMID: 18471287&lt;br/&gt;Authors: Dekhtyar, M. - Morin, A. - Sakanyan, V.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Bacterial promoters, which increase the efficiency of gene expression, differ from other promoters by several characteristics. This difference, not yet widely exploited in bioinformatics, looks promising for the development of relevant computational tools to search for strong promoters in bacterial genomes. RESULTS: We describe a new triad pattern algorithm that predicts strong promoter candidates in annotated bacterial genomes by matching specific patterns for the group I sigma 70 factors of Escherichia coli RNA polymerase. It detects promoter-specific motifs by consecutively matching three patterns, consisting of an UP-element, required for interaction with the alpha subunit, and then optimally-separated patterns of -35 and -10 boxes, required for interaction with the sigma 70 subunit of RNA polymerase. Analysis of 43 bacterial genomes revealed that the frequency of candidate sequences depends on the A+T content of the DNA under examination. The accuracy of in silico prediction was experimentally validated for the genome of a hyperthermophilic bacterium, Thermotoga maritima, by applying a cell-free expression assay using the predicted strong promoters. In this organism, the strong promoters govern genes for translation, energy metabolism, transport, cell movement, and other as-yet unidentified functions. CONCLUSIONS: The triad pattern algorithm developed for predicting strong bacterial promoters is well suited for analyzing bacterial genomes with an A+T content of less than 62%. 
This computational tool opens new prospects for investigating global gene expression, and individual strong promoters in bacteria of medical and/or economic significance.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18471287&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Bioinformatic analyses of mammalian 5'-UTR sequence properties of mRNAs predicts alternative translation initiation sites.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18466625</link>
      <description>Publication Date: 2008 May 8 PMID: 18466625&lt;br/&gt;Authors: Wegrzyn, J. L. - Drudge, T. M. - Valafar, F. - Hook, V.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Utilization of alternative initiation sites for protein translation directed by non-AUG codons in mammalian mRNAs is observed with increasing frequency. Alternative initiation sites are utilized for the synthesis of important regulatory proteins that control distinct biological functions. It is, therefore, of high significance to define the parameters that allow accurate bioinformatic prediction of alternative translation initiation sites (aTIS). This study has investigated 5'-UTR regions of mRNAs to define consensus sequence properties and structural features that allow identification of alternative initiation sites for protein translation. RESULTS: Bioinformatic evaluation of 5'-UTR sequences of mammalian mRNAs was conducted for classification and identification of alternative translation initiation sites for a group of mRNA sequences that have been experimentally demonstrated to utilize alternative non-AUG initiation sites for protein translation. These are represented by the codons CUG, GUG, UUG, AUA, and ACG for aTIS. The first phase of this bioinformatic analysis implements a classification tree that evaluated 5'-UTRs for unique consensus sequence features near the initiation codon, characteristics of 5'-UTR nucleotide sequences, and secondary structural features in a decision tree that categorizes mRNAs into those with potential aTIS, and those without. The second phase addresses identification of the aTIS codon and its location. Critical parameters of 5'-UTRs were assessed by an Artificial Neural Network (ANN) for identification of the aTIS codon and its location. ANNs have previously been used for the purpose of AUG start site prediction and are applicable in complex. 
ANN analyses demonstrated that multiple properties were required for predicting aTIS codons; these properties included unique consensus nucleotide sequences at positions -7 and -6 combined with positions -3 and +4, 5'-UTR length, ORF length, predicted secondary structures, free energy features, upstream AUGs, and G/C ratio. Importantly, combined results of the classification tree and the ANN analyses provided highly accurate bioinformatic predictions of alternative translation initiation sites. CONCLUSIONS: This study has defined the unique properties of 5'-UTR sequences of mRNAs for successful bioinformatic prediction of alternative initiation sites utilized in protein translation. The ability to define aTIS through the described bioinformatic analyses can be of high importance for genomic analyses to provide full predictions of translated mammalian and human gene products required for cellular functions in health and disease.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18466625&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Automating dChip: toward reproducible sharing of microarray data analysis.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18466620</link>
      <description>Publication Date: 2008 May 8 PMID: 18466620&lt;br/&gt;Authors: Li, C.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: During the past decade, many software packages have been developed for analysis and visualization of various types of microarrays. We have developed and maintained the widely used dChip as a microarray analysis software package accessible to both biologists and data analysts. However, challenges arise when dChip users want to analyze a large number of arrays automatically and share data analysis procedures and parameters. Improvement is also needed when the dChip user support team tries to identify the causes of reported analysis errors or bugs from users. RESULTS: We report here implementation and application of the dChip automation module. Through this module, dChip automation files can be created to include menu steps, parameters, and data viewpoints to run automatically. A data-packaging function allows convenient transfer from one user to another of the dChip software, microarray data, and analysis procedures, so that the second user can reproduce the entire analysis session of the first user. An analysis report file can also be generated during an automated run, including analysis logs, user comments, and viewpoint screenshots. CONCLUSIONS: The dChip automation module is a step toward reproducible research, and it can prompt a more convenient and reproducible mechanism for sharing microarray software, data, and analysis procedures and results. Automation data packages can also be used as publication supplements. 
Similar automation mechanisms could be valuable to the research community if implemented in other genomics and bioinformatics software packages.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18466620&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>CPSP-tools - exact and complete algorithms for high-throughput 3D lattice protein studies.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18462492</link>
      <description>Publication Date: 2008 May 7 PMID: 18462492&lt;br/&gt;Authors: Mann, M. - Will, S. - Backofen, R.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: The principles of protein folding and evolution pose problems of very high inherent complexity. Often these problems are tackled using simplified protein models, e.g. lattice proteins. The CPSP-tools package provides programs to solve exactly and completely the problems typical of studies using 3D lattice protein models. Among the tasks addressed are the prediction of (all) globally optimal and/or suboptimal structures as well as sequence design and neutral network exploration. RESULTS: In contrast to stochastic approaches, which are not capable of answering many fundamental questions, our methods are based on fast, non-heuristic techniques. The resulting tools are designed for high-throughput studies of 3D-lattice proteins utilising the Hydrophobic-Polar (HP) model. The source bundle is freely available at http://www.bioinf.uni-freiburg.de/sw/cpsp/ CONCLUSIONS: The CPSP-tools package is the first set of exact and complete methods for extensive, high-throughput studies of non-restricted 3D-lattice protein models. In particular, our package deals with cubic and face centered cubic (FCC) lattices.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18462492&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Facilitating the development of controlled vocabularies for metabolomics technologies with text mining.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460187</link>
      <description>Publication Date: 2008 PMID: 18460187&lt;br/&gt;Authors: Spasic, I. - Schober, D. - Sansone, S. A. - Rebholz-Schuhmann, D. - Kell, D. B. - Paton, N. W.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: Many bioinformatics applications rely on controlled vocabularies or ontologies to consistently interpret and seamlessly integrate information scattered across public resources. Experimental data sets from metabolomics studies need to be integrated with one another, but also with data produced by other types of omics studies in the spirit of systems biology, hence the pressing need for vocabularies and ontologies in metabolomics. However, it is time-consuming and non trivial to construct these resources manually. RESULTS: We describe a methodology for rapid development of controlled vocabularies, a study originally motivated by the needs for vocabularies describing metabolomics technologies. We present case studies involving two controlled vocabularies (for nuclear magnetic resonance spectroscopy and gas chromatography) whose development is currently underway as part of the Metabolomics Standards Initiative. The initial vocabularies were compiled manually, providing a total of 243 and 152 terms. A total of 5,699 and 2,612 new terms were acquired automatically from the literature. The analysis of the results showed that full-text articles (especially the Materials and Methods sections) are the major source of technology-specific terms as opposed to paper abstracts. CONCLUSIONS: We suggest a text mining method for efficient corpus-based term acquisition as a way of rapidly expanding a set of controlled vocabularies with the terms used in the scientific literature. 
We adopted an integrative approach, combining relatively generic software and data resources for time- and cost-effective development of a text mining tool for expansion of controlled vocabularies across various domains, as a practical alternative to both manual term collection and tailor-made named entity recognition methods.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460187&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Metrics for GO based protein semantic similarity: a systematic evaluation.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460186</link>
      <description>Publication Date: 2008 PMID: 18460186&lt;br/&gt;Authors: Pesquita, C. - Faria, D. - Bastos, H. - Ferreira, A. E. - Falcao, A. O. - Couto, F. M.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another issue is whether or not electronic annotations should be used in semantic similarity calculations. RESULTS: We conducted a systematic evaluation of GO-based semantic similarity measures using the relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing the measures in the presence and absence of these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that the majority of the semantic similarity measures capture an identical behaviour, but differ in resolution, we used the latter as the main criterion of evaluation. CONCLUSIONS: This work has provided a basis for the comparison of several semantic similarity measures, and can aid researchers in choosing the most adequate measure for their work. We have found that the hybrid simGIC was the measure with the best overall performance, followed by Resnik's measure using a best-match average combination approach. We have also found that the average and maximum combination approaches are problematic since both are inherently influenced by the number of terms being combined. 
We suspect that there may be a direct influence of data circularity in the behaviour of the results including electronic annotations, as a result of functional inference from sequence similarity.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460186&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Mapping proteins to disease terminologies: from UniProt to MeSH.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460185</link>
      <description>Publication Date: 2008 PMID: 18460185&lt;br/&gt;Authors: Mottaz, A. - Yip, Y. L. - Ruch, P. - Veuthey, A. L.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: Although the UniProt KnowledgeBase is not a medical-oriented database, it contains information on more than 2,000 human proteins involved in pathologies. However, these annotations are not standardized, which impairs the interoperability between biological and clinical resources. In order to make these data easily accessible to clinical researchers, we have developed a procedure to link diseases described in the UniProtKB/Swiss-Prot entries to the MeSH disease terminology. RESULTS: We mapped disease names extracted either from the UniProtKB/Swiss-Prot entry comment lines or from the corresponding OMIM entry to the MeSH. Different methods were assessed on a benchmark set of 200 disease names manually mapped to MeSH terms. The performance of the retained procedure in terms of precision and recall was 86% and 64%, respectively. Using the same procedure, more than 3,000 disease names in Swiss-Prot were mapped to MeSH with comparable efficiency. CONCLUSIONS: This study is a first attempt to link proteins in UniProtKB to the medical resources. The indexing we provided will help clinicians and researchers navigate from diseases to genes and from genes to diseases in an efficient way. The mapping is available at: http://research.isb-sib.ch/unimed.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460185&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Gene Ontology annotations: what they mean and where they come from.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460184</link>
      <description>Publication Date: 2008 PMID: 18460184&lt;br/&gt;Authors: Hill, D. P. - Smith, B. - McAndrews-Hill, M. S. - Blake, J. A.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;To address the challenges of information integration and retrieval, the computational genomics community increasingly has come to rely on the methodology of creating annotations of scientific literature using terms from controlled structured vocabularies such as the Gene Ontology (GO). Here we address the question of what such annotations signify and of how they are created by working biologists. Our goal is to promote a better understanding of how the results of experiments are captured in annotations, in the hope that this will lead both to better representations of biological reality through annotation and ontology development and to more informed use of GO resources by experimental scientists.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460184&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Ontology Design Patterns for bio-ontologies: a case study on the Cell Cycle Ontology.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460183</link>
      <description>Publication Date: 2008 PMID: 18460183&lt;br/&gt;Authors: Aranguren, M. E. - Antezana, E. - Kuiper, M. - Stevens, R.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: Bio-ontologies are key elements of knowledge management in bioinformatics. Rich and rigorous bio-ontologies should represent biological knowledge with high fidelity and robustness. The richness in bio-ontologies is a prior condition for diverse and efficient reasoning, and hence querying and hypothesis validation. Rigour allows a more consistent maintenance. Modelling such bio-ontologies is, however, a difficult task for bio-ontologists, because the necessary richness and rigour is difficult to achieve without extensive training. RESULTS: Analogous to design patterns in software engineering, Ontology Design Patterns are solutions to typical modelling problems that bio-ontologists can use when building bio-ontologies. They offer a means of creating rich and rigorous bio-ontologies with reduced effort. The concept of Ontology Design Patterns is described and documentation and application methodologies for Ontology Design Patterns are presented. Some real-world use cases of Ontology Design Patterns are provided and tested in the Cell Cycle Ontology. Ontology Design Patterns, including those tested in the Cell Cycle Ontology, can be explored in the Ontology Design Patterns public catalogue that has been created based on the documentation system presented (http://odps.sourceforge.net/). CONCLUSIONS: Ontology Design Patterns provide a method for rich and rigorous modelling in bio-ontologies. They also offer advantages at different development levels (such as design, implementation and communication) enabling, if used, a more modular, well-founded and richer representation of the biological knowledge. 
This representation will produce a more efficient knowledge management in the long term.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460183&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Reconstructing networks of pathways via significance analysis of their intersections.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460182</link>
      <description>Publication Date: 2008 PMID: 18460182&lt;br/&gt;Authors: Francesconi, M. - Remondini, D. - Neretti, N. - Sedivy, J. M. - Cooper, L. N. - Verondini, E. - Milanesi, L. - Castellani, G.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: Significance analysis at single gene level may suffer from the limited number of samples and experimental noise that can severely limit the power of the chosen statistical test. This problem is typically approached by applying post hoc corrections to control the false discovery rate, without taking into account prior biological knowledge. Pathway or gene ontology analysis can provide an alternative way to relax the significance threshold applied to single genes and may lead to a better biological interpretation. RESULTS: Here we propose a new analysis method based on the study of networks of pathways. These networks are reconstructed considering both the significance of single pathways (network nodes) and the intersection between them (links). We apply this method for the reconstruction of networks of pathways to two gene expression datasets: the first one obtained from a c-Myc rat fibroblast cell line expressing a conditional Myc-estrogen receptor oncoprotein; the second one obtained from the comparison of Acute Myeloid Leukemia and Acute Lymphoblastic Leukemia derived from bone marrow samples. CONCLUSION: Our method extends statistical models that have been recently adopted for the significance analysis of functional groups of genes to infer links between these groups. 
We show that groups of genes at the interface between different pathways can be considered as relevant even if the pathways they belong to are not significant by themselves.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460182&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>On deducing causality in metabolic networks.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460181</link>
      <description>Publication Date: 2008 PMID: 18460181&lt;br/&gt;Authors: Bodei, C. - Bracciali, A. - Chiarugi, D.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: Metabolic networks present a complex interconnected structure, whose understanding is in general a non-trivial task. Several formal approaches have been developed to support the investigation of such networks. One of the relevant problems in this context is the comprehension of causality dependencies amongst the molecules involved in the metabolic process. RESULTS: We apply techniques from formal methods and computational logic to develop an abstract qualitative model of metabolic networks in order to determine possible causal dependencies. Keeping in mind both expressiveness and ease of use, we aimed at providing: i) a minimal notation to represent causality in biochemical interactions, and ii) an automated tool allowing human experts to easily vary conditions of in silico experiments. We exploit a reading of chemical reactions in terms of logical implications: starting from a description of a metabolic network in terms of reaction rules and initial conditions, chains of reactions, causally depending one from the another, can be automatically deduced. Both the components of the initial state and the clauses ruling reactions can be easily varied and a new trial of the experiment started, according to a what-if investigation strategy. Our approach aims at exploiting computational logic as a formal modeling framework, amongst the several available, that is naturally close to human reasoning. It directly leads to executable implementations and may support, in perspective, various reasoning schemata. Indeed, our abstractions are supported by a computational counterpart, based on a Prolog implementation, which allows for a representation language closely correspondent to the adopted chemical abstract notation. 
The proposed approach has been validated by results regarding gene knock-out and essentiality for a model of the metabolic network of Escherichia coli K12, which show a relevant coherence with available wet-lab experimental data. CONCLUSIONS: Starting from the presented work, our goal is to provide an effective analysis toolkit, supported by an efficient full-fledged computational counterpart, with the aim of fruitfully driving in vitro experiments by effectively pruning non promising directions.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460181&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Stochastic models for the in silico simulation of synaptic processes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460180</link>
      <description>Publication Date: 2008 PMID: 18460180&lt;br/&gt;Authors: Bracciali, A. - Brunelli, M. - Cataldo, E. - Degano, P.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: Research in life sciences is benefiting from a large availability of formal description techniques and analysis methodologies. These allow both the phenomena investigated to be precisely modeled and virtual experiments to be performed in silico. Such experiments may result in easier, faster, and satisfying approximations of their in vitro/vivo counterparts. A promising approach is represented by the study of biological phenomena as a collection of interactive entities through process calculi equipped with stochastic semantics. These exploit formal grounds developed in the theory of concurrency in computer science, account for the not continuous, nor discrete, nature of many phenomena, enjoy nice compositional properties and allow for simulations that have been demonstrated to be coherent with data in literature. RESULTS: Motivated by the need to address some aspects of the functioning of neural synapses, we have developed one such model for synaptic processes in the calyx of Held, which is a glutamatergic synapse in the auditory pathway of the mammalia. We have developed such a stochastic model starting from existing kinetic models based on ODEs of some sub-components of the synapse, integrating other data from literature and making some assumptions about non-fully understood processes. Experiments have confirmed the coherence of our model with known biological data, also validating the assumptions made. Our model overcomes some limitations of the kinetic ones and, to our knowledge, represents the first model of synaptic processes based on process calculi. 
The compositionality of the approach has permitted us to independently focus on tuning the models of the pre- and post-synaptic traits, and then to naturally connect them, by dealing with &quot;interface&quot; issues. Furthermore, we have improved the expressiveness of the model, e.g. by embedding easy control of element concentration time courses. Sensitivity analysis over several parameters of the model has provided results that may help clarify the dynamics of synaptic transmission, while experiments with the model of the complete synapse seem useful for explaining short-term plasticity mechanisms. CONCLUSIONS: Specific presynaptic and postsynaptic mechanisms can be further analysed under various conditions, for instance by studying the presynaptic behaviour under repeated activations. The level of detail of the description can be refined, for instance by further specifying the neurotransmitter generation and release steps. Taking advantage of the compositionality of the approach, an enhanced model could then be composed with other neural models, designed within the same framework, in order to obtain a more detailed and comprehensive model. In the long term, we are interested, in particular, in addressing models of synaptic plasticity, i.e. activity-dependent mechanisms, which are the basis of memory and learning processes. More on the computer science side, we plan to follow some directions to improve the underlying computational model and the linguistic primitives it provides as suggested by the experiments carried out, e.g. by introducing a suitable notion of (spatial) locality.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460180&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>RDFScape: Semantic Web meets systems biology.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460179</link>
      <description>Publication Date: 2008 PMID: 18460179&lt;br/&gt;Authors: Splendiani, A.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: The recent availability of high-throughput data in molecular biology has increased the need for a formal representation of this knowledge domain. New ontologies are being developed to formalize knowledge, e.g. about the functions of proteins. As the Semantic Web is being introduced into the Life Sciences, the basis for a distributed knowledge-base that can foster biological data analysis is laid. However, there still is a dichotomy, in tools and methodologies, between the use of ontologies in biological investigation, that is, in relation to experimental observations, and their use as a knowledge-base. RESULTS: RDFScape is a plugin that has been developed to extend a software oriented to biological analysis with support for reasoning on ontologies in the semantic web framework. We show with this plugin how the use of ontological knowledge in biological analysis can be extended through the use of inference. In particular, we present two examples relative to ontologies representing biological pathways: we demonstrate how these can be abstracted and visualized as interaction networks, and how reasoning on causal dependencies within elements of pathways can be implemented. CONCLUSIONS: The use of ontologies for the interpretation of high-throughput biological data can be improved through the use of inference. This allows the use of ontologies not only as annotations, but as a knowledge-base from which new information relevant for specific analysis can be derived.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460179&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>AMMO-Prot: amine system project 3D-model finder.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460178</link>
      <description>Publication Date: 2008 PMID: 18460178&lt;br/&gt;Authors: Navas-Delgado, I. - Montanez, R. - Pino-Angeles, A. - Moya-Garcia, A. A. - Urdiales, J. L. - Sanchez-Jimenez, F. - Aldana-Montes, J. F.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: Amines are biogenic amino acid derivatives, which play pleiotropic and very important yet complex roles in animal physiology. For many other relevant biomolecules, biochemical and molecular data are being accumulated, which need to be integrated in order to be effective in the advance of biological knowledge in the field. For this purpose, a multidisciplinary group has started an ontology-based system named the Amine System Project (ASP) for which amine-related information is the validation bench. RESULTS: In this paper, we describe the Ontology-Based Mediator developed in the Amine System Project (http://asp.uma.es) using the infrastructure of Semantic Directories, and how this system has been used to solve a case related to amine metabolism-related protein structures. CONCLUSIONS: This infrastructure is used to publish and manage not only ontologies and their relationships, but also metadata relating to the resources committed with the ontologies. The system developed is available at http://asp.uma.es/WebMediator.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460178&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Ontology-based, Tissue MicroArray oriented, image centered tissue bank.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460177</link>
      <description>Publication Date: 2008 PMID: 18460177&lt;br/&gt;Authors: Viti, F. - Merelli, I. - Caprera, A. - Lazzari, B. - Stella, A. - Milanesi, L.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: The Tissue MicroArray technique is becoming increasingly important in pathology for the validation of experimental data from transcriptomic analysis. This approach produces many images which need to be properly managed, if possible with an infrastructure able to support tissue sharing between institutes. Moreover, the available frameworks oriented to Tissue MicroArray provide good storage for clinical patient, sample treatment and block construction information, but their utility is limited by the lack of data integration with biomolecular information. RESULTS: In this work we propose a web-oriented Tissue MicroArray system that supports researchers in managing bio-samples and, through the use of ontologies, enables tissue sharing aimed at the design of Tissue MicroArray experiments and results evaluation. Indeed, our system provides ontological description both for pre-analysis tissue images and for post-process analysis image results, which is crucial for information exchange. Moreover, by working on well-defined terms, it is then possible to query web resources for literature articles to integrate both pathology and bioinformatics data. CONCLUSIONS: Using this system, users associate an ontology-based description to each image uploaded into the database and also integrate results with the ontological description of biosequences identified in every tissue.
Moreover, it is possible to integrate the ontological description provided by the user with a fully compliant gene ontology definition, enabling statistical studies about correlation between the analyzed pathology and the most commonly related biological processes.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460177&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Ontology-guided data preparation for discovering genotype-phenotype relationships.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460176</link>
      <description>Publication Date: 2008 PMID: 18460176&lt;br/&gt;Authors: Coulet, A. - Smail-Tabbone, M. - Benlian, P. - Napoli, A. - Devignes, M. D.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: The complexity and amount of post-genomic data constitute two major factors limiting the application of Knowledge Discovery in Databases (KDD) methods in life sciences. Bio-ontologies may nowadays play key roles in knowledge discovery in life science, providing semantics to data and to extracted units, by taking advantage of the progress of Semantic Web technologies concerning the understanding and availability of tools for knowledge representation, extraction, and reasoning. RESULTS: This paper presents a method that exploits bio-ontologies for guiding data selection within the preparation step of the KDD process. We propose three scenarios in which domain knowledge and ontology elements such as subsumption, properties, and class descriptions are taken into account for data selection, before the data mining step. Each of these scenarios is illustrated within a case study on the search for genotype-phenotype relationships in a familial hypercholesterolemia dataset. The guiding of data selection based on domain knowledge is analysed and shows a direct influence on the volume and significance of the data mining results. CONCLUSIONS: The method proposed in this paper is an efficient alternative to numerical methods for data selection based on domain knowledge. In turn, the results of this study may be reused in ontology modelling and data integration.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460176&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Terminologies for text-mining; an experiment in the lipoprotein metabolism domain.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460175</link>
      <description>Publication Date: 2008 PMID: 18460175&lt;br/&gt;Authors: Alexopoulou, D. - Wachter, T. - Pickersgill, L. - Eyre, C. - Schroeder, M.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: The engineering of ontologies, especially with a view to a text-mining use, is still a new research field. There does not yet exist a well-defined theory and technology for ontology construction. Many of the ontology design steps remain manual and are based on personal experience and intuition. However, there exist a few efforts on automatic construction of ontologies in the form of extracted lists of terms and relations between them. RESULTS: We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) to be used for text-mining. We compare the manually created ontology terms with the automatically derived terminology from four different automatic term recognition (ATR) methods. The top 50 predicted terms contain up to 89% relevant terms. For the top 1000 terms the best method still generates 51% relevant terms. In a corpus of 3066 documents 53% of LMO terms are contained and 38% can be generated with one of the methods. CONCLUSIONS: Given high precision, automatic methods can help decrease development time and provide significant support for the identification of domain-specific vocabulary. The coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed in a semi-automatic way; taking ATR results as input and following the guidelines we described. 
AVAILABILITY: The TFIDF term recognition is available as a Web Service, described at http://gopubmed4.biotec.tu-dresden.de/IdavollWebService/services/CandidateTermGeneratorService?wsdl.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460175&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>GeneFisher-P: variations of GeneFisher as processes in Bio-jETI.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460174</link>
      <description>Publication Date: 2008 PMID: 18460174&lt;br/&gt;Authors: Lamprecht, A. L. - Margaria, T. - Steffen, B. - Sczyrba, A. - Hartmeier, S. - Giegerich, R.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: PCR primer design is an everyday but nontrivial task requiring state-of-the-art software. We describe the popular tool GeneFisher and explain its recent restructuring using workflow techniques. We apply a service-oriented approach to model and implement GeneFisher-P, a process-based version of the GeneFisher web application, as a part of the Bio-jETI platform for service modeling and execution. We show how to introduce a flexible process layer to meet the growing demand for improved user-friendliness and flexibility. RESULTS: Within Bio-jETI, we model the process using the jABC framework, a mature model-driven, service-oriented process definition platform. We encapsulate remote legacy tools and integrate web services using jETI, an extension of the jABC for seamless integration of remote resources as basic services, ready to be used in the process. Some of the basic services used by GeneFisher are in fact already provided as individual web services at BiBiServ and can be directly accessed. Others are legacy programs, and are made available to Bio-jETI via the jETI technology. The full power of service-based process orientation is required when more bioinformatics tools, available as web services or via jETI, lead to easy extensions or variations of the basic process. This concerns for instance variations of data retrieval or alignment tools as provided by the European Bioinformatics Institute (EBI).
CONCLUSIONS: The resulting service- and process-oriented GeneFisher-P demonstrates how basic services from heterogeneous sources can be easily orchestrated in the Bio-jETI platform and lead to a flexible family of specialized processes tailored to specific tasks.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460174&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Bio-jETI: a service integration, design, and provisioning platform for orchestrated bioinformatics processes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460173</link>
      <description>Publication Date: 2008 PMID: 18460173&lt;br/&gt;Authors: Margaria, T. - Kubczak, C. - Steffen, B.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: With Bio-jETI, we introduce a service platform for interdisciplinary work on biological application domains and illustrate its use in a concrete application concerning statistical data processing in R and xcms for an LC/MS analysis of FAAH gene knockout. METHODS: Bio-jETI uses the jABC environment for service-oriented modeling and design as a graphical process modeling tool and the jETI service integration technology for remote tool execution. CONCLUSIONS: As a service definition and provisioning platform, Bio-jETI has the potential to become a core technology in interdisciplinary service orchestration and technology transfer. Domain experts, like biologists not trained in computer science, directly define complex service orchestrations as process models and use efficient and complex bioinformatics tools in a simple and intuitive way.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460173&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>PARPST: a PARallel algorithm to find peptide sequence tags.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460172</link>
      <description>Publication Date: 2008 PMID: 18460172&lt;br/&gt;Authors: Brunetti, S. - Lodi, E. - Mori, E. - Stella, M.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: Protein identification is one of the most challenging problems in proteomics. Tandem mass spectrometry provides an important tool to handle the protein identification problem. RESULTS: We developed a work-efficient parallel algorithm for the peptide sequence tag problem. The algorithm runs on the concurrent-read, exclusive-write PRAM in O(n) time using log n processors, where n is the number of mass peaks in the spectrum. The algorithm is able to find all the sequence tags having a score greater than a parameter or all the sequence tags of maximum length. Our tests on 1507 spectra in the Open Proteomics Database showed that our algorithm is efficient and effective, since it achieves results comparable to those of other methods. CONCLUSIONS: The proposed algorithm can be used to speed up the database searching or to identify post-translational modifications, comparing the homology of the sequence tags found with the sequences in the biological database.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460172&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>GraphFind: enhancing graph searching by low support data mining techniques.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460171</link>
      <description>Publication Date: 2008 PMID: 18460171&lt;br/&gt;Authors: Ferro, A. - Giugno, R. - Mongiovi, M. - Pulvirenti, A. - Skripin, D. - Shasha, D.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: Biomedical and chemical databases are large and rapidly growing in size. Graphs naturally model such kinds of data. To fully exploit the wealth of information in these graph databases, a key role is played by systems that search for all exact or approximate occurrences of a query graph. To deal efficiently with graph searching, advanced methods for indexing, representation and matching of graphs have been proposed. RESULTS: This paper presents GraphFind. The system implements efficient graph searching algorithms together with advanced filtering techniques that allow approximate search. It allows users to select candidate subgraphs rather than entire graphs. It implements an effective data storage based also on low-support data mining. CONCLUSIONS: GraphFind is compared with Frowns, GraphGrep and gIndex. Experiments show that GraphFind outperforms the compared systems on a very large collection of small graphs. The proposed low-support mining technique which applies to any searching system also allows a significant index space reduction.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460171&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A Semantic Web for bioinformatics: goals, tools, systems, applications.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460170</link>
      <description>Publication Date: 2008 PMID: 18460170&lt;br/&gt;Authors: Cannata, N. - Schroder, M. - Marangoni, R. - Romano, P.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460170&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A tree-based conservation scoring method for short linear motifs in multiple alignments of protein sequences.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460207</link>
      <description>Publication Date: 2008 May 6 PMID: 18460207&lt;br/&gt;Authors: Chica, C. - Labarga, A. - Gould, C. M. - Lopez, R. - Gibson, T. J.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: The structure of many eukaryotic cell regulatory proteins is highly modular. They are assembled from globular domains, segments of natively disordered polypeptides and short linear motifs. The latter are involved in protein interactions and formation of regulatory complexes. The function of such proteins, which may be difficult to define, is the aggregate of the subfunctions of the modules. It is therefore desirable to efficiently predict linear motifs with some degree of accuracy, yet sequence database searches return results that are not significant. RESULTS: We have developed a method for scoring the conservation of linear motif instances. It requires only primary sequence-derived information (e.g. multiple alignment and sequence tree) and takes into account the degenerate nature of linear motif patterns. On our benchmarking, the method accurately scores 86% of the known positive instances, while distinguishing them from random matches in 78% of the cases. The conservation score is implemented as a real time application designed to be integrated into other tools. It is currently accessible via a Web Service or through a graphical interface. CONCLUSIONS: The conservation score improves the prediction of linear motifs, by discarding those matches that are unlikely to be functional because they have not been conserved during the evolution of the protein sequences. 
It is especially useful for instances in non-structured regions of the proteins, where a domain masking filtering strategy is not applicable.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460207&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Inferring the role of transcription factors in regulatory networks.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18460200</link>
      <description>Publication Date: 2008 May 6 PMID: 18460200&lt;br/&gt;Authors: Veber, P. - Guziolowski, C. - Le Borgne, M. - Radulescu, O. - Siegel, A.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Expression profiles obtained from multiple perturbation experiments are increasingly used to reconstruct transcriptional regulatory networks, from well studied, simple organisms up to higher eukaryotes. Admittedly, a key ingredient in developing a reconstruction method is its ability to integrate heterogeneous sources of information, as well as to comply with practical observability issues: measurements can be scarce or noisy. In this work, we show how to combine a network of genetic regulations with a set of expression profiles, in order to infer the functional effect of the regulations, as inducer or repressor. Our approach is based on a consistency rule between a network and the signs of variation given by expression arrays. RESULTS: We evaluate our approach in several settings of increasing complexity. First, we generate artificial expression data on a transcriptional network of E. coli extracted from the literature (1529 nodes and 3802 edges), and we estimate that 30% of the regulations can be annotated with about 30 profiles. We additionally prove that at most 40.8% of the network can be inferred using our approach. Second, we use this network in order to validate the predictions obtained with a compendium of real expression profiles. We describe a filtering algorithm that generates particularly reliable predictions. Finally, we apply our inference approach to the S. cerevisiae transcriptional network (2419 nodes and 4344 interactions), by combining ChIP-chip data and 15 expression profiles. We are able to detect and isolate inconsistencies between the expression profiles and a significant portion of the model (15% of all the interactions). In addition, we report predictions for 14.5% of all interactions.
CONCLUSIONS: Our approach requires neither accurate expression levels nor time series. Nevertheless, we show on both real and artificial data that a relatively small number of perturbation experiments is enough to determine a significant portion of regulatory effects. This is a key practical asset compared to statistical methods for network reconstruction. We demonstrate that our approach is able to provide accurate predictions, even when the network is incomplete and the data is noisy.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18460200&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>The pairwise disconnectivity index as a new metric for the topological analysis of regulatory networks.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18454847</link>
      <description>Publication Date: 2008 May 2 PMID: 18454847&lt;br/&gt;Authors: Potapov, A. P. - Goemann, B. - Wingender, E.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Currently, there is a gap between purely theoretical studies of the topology of large bioregulatory networks and the practical traditions and interests of experimentalists. While the theoretical approaches emphasize the global characterization of regulatory systems, the practical approaches focus on the role of distinct molecules and genes in regulation. To bridge the gap between these opposite approaches, one needs to combine 'general' with 'particular' properties and translate abstract topological features of large systems into testable functional characteristics of individual components. Here, we propose a new topological parameter - the pairwise disconnectivity index of a network's element - that is capable of such bridging. RESULTS: The pairwise disconnectivity index quantifies how crucial an individual element is for sustaining the communication ability between connected pairs of vertices in a network that is displayed as a directed graph. Such an element might be a vertex (i.e., molecules, genes), an edge (i.e., reactions, interactions), as well as a group of vertices and/or edges. The index can be viewed as a measure of topological redundancy of regulatory paths which connect different parts of a given network and as a measure of sensitivity (robustness) of this network to the presence (absence) of each individual element. Accordingly, we introduce the notion of a path-degree of a vertex in terms of its corresponding incoming, outgoing and mediated paths, respectively. The pairwise disconnectivity index has been applied to the analysis of several regulatory networks from various organisms. The importance of an individual vertex or edge for the coherence of the network is determined by the particular position of the given element in the whole network. 
CONCLUSIONS: Our approach makes it possible to evaluate the effect of removing each element (i.e., vertex, edge, or their combinations) from a network. The greatest potential value of this approach is its ability to systematically analyze the role of every element, as well as groups of elements, in a regulatory network.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18454847&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18452616</link>
      <description>Publication Date: 2008 May 1 PMID: 18452616&lt;br/&gt;Authors: Kurgan, L. - Cios, K. - Chen, K.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. RESULTS: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors.
CONCLUSIONS: SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18452616&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A copula method for modeling directional dependence of genes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18447957</link>
      <description>Publication Date: 2008 May 1 PMID: 18447957&lt;br/&gt;Authors: Kim, J. M. - Jung, Y. S. - Sungur, E. A. - Han, K. H. - Park, C. - Sohn, I.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Genes interact with each other as basic building blocks of life, forming a complicated network. The relationship between groups of genes with different functions can be represented as gene networks. With the deposition of huge microarray data sets in public domains, study on gene networking is now possible. In recent years, there has been an increasing interest in the reconstruction of gene networks from gene expression data. Recent work includes linear models, Boolean network models, and Bayesian networks. Among them, Bayesian networks seem to be the most effective in constructing gene networks. A major problem with the Bayesian network approach is the excessive computational time. This problem is due to the interactive feature of the method that requires large search space. Since fitting a model by using the copulas does not require iterations, elicitation of the priors, and complicated calculations of posterior distributions, the need for reference to extensive search spaces can be eliminated leading to manageable computational efforts. The Bayesian network approach produces a discrete expression of conditional probabilities. Discreteness of the characteristics is not required in the copula approach which involves use of uniform representation of the continuous random variables. Our method is able to overcome the limitation of Bayesian network method for gene-gene interaction, i.e. information loss due to binary transformation. 
RESULTS: We analyzed the gene interactions for two gene data sets (one group is eight histone genes and the other group is 19 genes which include DNA polymerases, DNA helicase, type B cyclin genes, DNA primases, radiation sensitive genes, repair-related genes, replication protein A encoding gene, DNA replication initiation factor, securin gene, nucleosome assembly factor, and a subunit of the cohesin complex) by adopting a measure of directional dependence based on a copula function. We have compared our results with those from other methods in the literature. Although microarray results show a transcriptional co-regulation pattern and do not imply that the gene products are physically interactive, this tight genetic connection may suggest that each gene product has either direct or indirect connections with the other gene products. Indeed, recent comprehensive analysis of a protein interaction map revealed that those histone genes are physically connected with each other, supporting the results obtained by our method. CONCLUSIONS: The results illustrate that our method can be an alternative to Bayesian networks in modeling gene interactions. One advantage of our approach is that dependence between genes is not assumed to be linear. Another advantage is that our approach can detect directional dependence. We expect that our study may help to design artificial drug candidates, which can block or activate biologically meaningful pathways. Moreover, our copula approach can be extended to investigate the effects of local environments on protein-protein interactions. 
The copula mutual information approach will help to propose the new variant of ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks): an algorithm for the reconstruction of gene regulatory networks.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18447957&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Implementing EM and Viterbi algorithms for Hidden Markov Model in linear memory.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18447951</link>
      <description>Publication Date: 2008 Apr 30 PMID: 18447951&lt;br/&gt;Authors: Churbanov, A. - Winters-Hilt, S.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: The Baum-Welch learning procedure for Hidden Markov Models (HMMs) provides a powerful tool for tailoring HMM topologies to data for use in knowledge discovery and clustering. A linear memory procedure recently proposed by Miklos, I. and Meyer, I.M. describes a memory sparse version of the Baum-Welch algorithm with modifications to the original probabilistic table topologies to make memory use independent of sequence length (and linearly dependent on state number). The original description of the technique has some errors that we amend. We then compare the corrected implementation on a variety of data sets with conventional and checkpointing implementations. RESULTS: We provide a correct recurrence relation for the emission parameter estimate and extend it to parameter estimates of the Normal distribution. To accelerate estimation of the prior state probabilities, and decrease memory use, we reverse the originally proposed forward sweep. We describe different scaling strategies necessary in all real implementations of the algorithm to prevent underflow. In this paper we also describe our approach to a linear memory implementation of the Viterbi decoding algorithm (with linearity in the sequence length, while memory use is approximately independent of state number). We demonstrate the use of the linear memory implementation on an extended Duration Hidden Markov Model (DHMM) and on an HMM with a spike detection topology. Comparing the various implementations of the Baum-Welch procedure we find that the checkpointing algorithm produces the best overall tradeoff between memory use and speed. In cases where sequence length is very large (for Baum-Welch), or state number is very large (for Viterbi), the linear memory methods outlined may offer some utility. 
CONCLUSIONS: Our performance-optimized Java implementations of Baum-Welch algorithm are available at http://logos.cs.uno.edu/~achurban. The described method and implementations will aid sequence alignment, gene structure prediction, HMM profile training, nanopore ionic flow blockades analysis and many other domains that require efficient HMM training with EM.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18447951&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>GENOMEPOP: A program to simulate genomes in populations.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18447924</link>
      <description>Publication Date: 2008 Apr 30 PMID: 18447924&lt;br/&gt;Authors: Carvajal-Rodriguez, A.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: There are several situations in population biology research where simulating DNA sequences is useful. Simulation of biological populations under different evolutionary genetic models can be undertaken using backward or forward strategies. Backward simulations, also called coalescent-based simulations, are computationally efficient. The reason is that they are based on the history of lineages with surviving offspring in the current population. On the contrary, forward simulations are less efficient because the entire population is simulated from past to present. However, the coalescent framework imposes some limitations that forward simulation does not. Hence, there is an increasing interest in forward population genetic simulation and efficient new tools have been developed recently. Software tools that allow efficient simulation of large DNA fragments under complex evolutionary models will be very helpful when trying to better understand the trace left on the DNA by the different interacting evolutionary forces. Here I will introduce GenomePop, a forward simulation program that fulfills the above requirements. The use of the program is demonstrated by studying the impact of intracodon recombination on global and site-specific dN/dS estimation. RESULTS: I have developed algorithms and written software to efficiently simulate, forward in time, different Markovian nucleotide or codon models of DNA mutation. Such models can be combined with recombination, at inter and intra codon levels, fitness-based selection and complex demographic scenarios. CONCLUSIONS: GenomePop has many interesting characteristics for simulating SNPs or DNA sequences under complex evolutionary and demographic models. These features make it unique with respect to other simulation tools. 
Namely, the possibility of forward simulation under General Time Reversible (GTR) mutation or GTRxMG94 codon models with intra-codon recombination, arbitrary, user-defined, migration patterns, diploid or haploid models, constant or variable population sizes, etc. It also allows simulation of fitness-based selection under different distributions of mutational effects. Under the 2-allele model it allows the simulation of recombination hot-spots, the definition of different frequencies in different populations, etc. GenomePop can also manage large DNA fragments. In addition, it has a scaling option to save computation time when simulating large sequences and population sizes under complex demographic and evolutionary situations. These and many other features are detailed in its web page [1].&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18447924&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>An efficient method for the prediction of deleterious multiple-point mutations in the secondary structure of RNAs using suboptimal folding solutions.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18445289</link>
      <description>Publication Date: 2008 Apr 29 PMID: 18445289&lt;br/&gt;Authors: Churkin, A. - Barash, D.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: RNAMute is an interactive Java application which, given an RNA sequence, calculates the secondary structure of all single point mutations and organizes them into categories according to their similarity to the predicted structure of the wild type. The secondary structure predictions are performed using the Vienna RNA package. A more efficient implementation of RNAMute is needed, however, to extend from the case of single point mutations to the general case of multiple point mutations, which may often be desired for computational predictions alongside mutagenesis experiments. But analyzing multiple point mutations, a process that requires traversing all possible mutations, becomes highly expensive since the running time is O(n^m) for a sequence of length n with m-point mutations. Using Vienna's RNAsubopt, we present a method that selects only those mutations, based on stability considerations, which are likely to be conformational rearranging. The approach is best examined using the dot plot representation for RNA secondary structure. RESULTS: Using RNAsubopt, the suboptimal solutions for a given wild-type sequence are calculated once. Then, specific mutations are selected that are most likely to cause a conformational rearrangement. For an RNA sequence of about 100 nts and 3-point mutations (n=100, m=3), for example, the proposed method reduces the running time from several hours or even days to several minutes, thus enabling the practical application of RNAMute to the analysis of multiple-point mutations. CONCLUSIONS: A highly efficient addition to RNAMute that is as user friendly as the original application but that facilitates the practical analysis of multiple-point mutations is presented. 
Such an extension can now be exploited prior to site-directed mutagenesis experiments by virologists, for example, who investigate the change of function in an RNA virus via mutations that disrupt important motifs in its secondary structure. A complete explanation of the application, called MultiRNAMute, is available at http://www.cs.bgu.ac.il/~RNAMute/Efficient .&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18445289&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Highly accurate sigmoidal fitting of real-time PCR data by introducing a parameter for asymmetry.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18445269</link>
      <description>Publication Date: 2008 Apr 29 PMID: 18445269&lt;br/&gt;Authors: Spiess, A. N. - Feig, C. - Ritz, C.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Fitting four-parameter sigmoidal models is one of the methods established in the analysis of quantitative real-time PCR (qPCR) data. We had observed that these models are not optimal in the fitting outcome due to the inherent constraint of symmetry around the point of inflection. Thus, we found it necessary to employ a mathematical algorithm that circumvents this problem and which utilizes an additional parameter for accommodating asymmetrical structures in sigmoidal qPCR data. RESULTS: The four-parameter models were compared to their five-parameter counterparts by means of nested F-tests based on the residual variance, thus acquiring a statistical measure for higher performance. For nearly all qPCR data we examined, five-parameter models resulted in a significantly better fit. Furthermore, accuracy and precision for the estimation of efficiencies and calculation of quantitative ratios were assessed with four independent dilution datasets and compared to the most commonly used quantification methods. It could be shown that the five-parameter model exhibits an accuracy and precision more similar to the non-sigmoidal quantification methods. CONCLUSIONS: The five-parameter sigmoidal models outperform the established four-parameter model with high statistical significance. The estimation of essential PCR parameters such as PCR efficiency, threshold cycles and initial template fluorescence is more robust and has smaller variance. The model is implemented in the qpcR package for the freely available statistical R environment. 
The package can be downloaded from the author's homepage.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18445269&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Studying the functional conservation of cis-regulatory modules and their transcriptional output.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18442418</link>
      <description>Publication Date: 2008 Apr 29 PMID: 18442418&lt;br/&gt;Authors: Bauer, D. C. - Bailey, T. L.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: Cis-regulatory modules (CRMs) are distinct, genomic regions surrounding the target gene that can independently activate the promoter to drive transcription. The activation of a CRM is controlled by the binding of a certain combination of transcription factors (TFs). It would be of great benefit if the transcriptional output mediated by a specific CRM could be predicted. Of equal benefit would be identifying in silico a specific CRM as the driver of the expression in a specific tissue or situation. We extend a recently developed biochemical modeling approach to manage both prediction tasks. Given a set of TFs, their protein concentrations, and the positions and binding strengths of each of the TFs in a putative CRM, the model predicts the transcriptional output of the gene. Our approach predicts the location of the regulating CRM by using predicted TF binding sites in regions near the gene as input to the model and searching for the region that yields a predicted transcription rate most closely matching the known rate. RESULTS: Here we show the ability of the model on the example of one of the CRMs regulating the eve gene, MSE2. A model trained on the MSE2 in D. melanogaster was applied to the surrounding sequence of the eve gene in seven other Drosophila species. The model successfully predicts the correct MSE2 location and output in six out of eight Drosophila species we examine. CONCLUSIONS: The model is able to generalize from D. melanogaster to other Drosophila species and accurately predicts the location and transcriptional output of MSE2 in those species. 
However, we also show that the current model is not specific enough to function as a genome-wide CRM scanner, because it incorrectly predicts other genomic regions to be MSE2s.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18442418&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>ConStruct: improved construction of RNA consensus structures.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18442401</link>
      <description>Publication Date: 2008 Apr 28 PMID: 18442401&lt;br/&gt;Authors: Wilm, A. - Linnenbrink, K. - Steger, G.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Aligning homologous non-coding RNAs (ncRNAs) correctly in terms of sequence and structure is an unresolved problem, due to both mathematical complexity and imperfect scoring functions. High quality alignments, however, are a prerequisite for most consensus structure prediction approaches, homology searches, and tools for phylogeny inference. Automatically created ncRNA alignments often need manual corrections, yet this manual refinement is tedious and error-prone. RESULTS: We present an extended version of ConStruct, a semi-automatic, graphical tool suitable for creating RNA alignments correct in terms of both consensus sequence and consensus structure. To this purpose ConStruct combines sequence alignment, thermodynamic data and various measures of covariation. One important feature is that the user is guided during the alignment correction step by a consensus dotplot, which displays all thermodynamically optimal base pairs and the corresponding covariation. Once the initial alignment is corrected, optimal and suboptimal secondary structures as well as tertiary interaction can be predicted. We demonstrate ConStruct's ability to guide the user in correcting an initial alignment, and show an example for optimal secondary consensus structure prediction on very hard to align SECIS elements. Moreover we use ConStruct to predict tertiary interactions from sequences of the internal ribosome entry site of CrP-like viruses. In addition we show that alignments specifically designed for benchmarking can easily be optimized using ConStruct, although they share very little sequence identity. CONCLUSIONS: ConStruct's graphical interface allows for an easy alignment correction based on and guided by predicted and known structural constraints. 
It combines several algorithms for prediction of secondary consensus structure and even tertiary interactions. The ConStruct package can be downloaded at http://www.biophys.uni-duesseldorf.de/construct3/.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18442401&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>An efficient visualization tool for the analysis of protein mutation matrices.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18442400</link>
      <description>Publication Date: 2008 Apr 28 PMID: 18442400&lt;br/&gt;Authors: David, M. P. - Lapid, C. M. - Daria, V. R.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: It is useful to develop a tool that would effectively describe protein mutation matrices specifically geared towards the identification of mutations that produce either wanted or unwanted effects, such as an increase or decrease in affinity, or a predisposition towards misfolding. Here, we describe a tool where such mutations are efficiently identified, categorized and visualized. To categorize the mutations, amino acids in a mutation matrix are arranged according to one of three sets of physicochemical characteristics, namely hydrophilicity, size and polarizability, and charge and polarity. The magnitude and frequencies of mutations for an alignment are subsequently described using color information and scaling factors. RESULTS: To illustrate the capabilities of our approach, the technique is used to visualize and to compare mutation patterns in evolving sequences with diametrically opposite characteristics. Results show the emergence of distinct patterns not immediately discernible from the raw matrices. CONCLUSION: Our technique enables effective categorization and visualization of mutations by using specifically-arranged mutation matrices. This tool has a number of possible applications in protein engineering, notably in simplifying the identification of mutations and/or mutation trends that are associated with specific engineered protein characteristics and behavior.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18442400&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Gene prediction in metagenomic fragments: a large scale machine learning approach.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18442389</link>
      <description>Publication Date: 2008 Apr 28 PMID: 18442389&lt;br/&gt;Authors: Hoff, K. J. - Tech, M. - Lingner, T. - Daniel, R. - Morgenstern, B. - Meinicke, P.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. Short length, for example, Sanger sequencing yields on average 700 bp fragments, and unknown phylogenetic origin of most fragments require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. RESULTS: We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. 
Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. CONCLUSION: Large scale machine learning methods are well-suited for gene prediction in metagenomic DNA fragments. In particular, the combination of linear discriminants and neural networks is promising and should be considered for integration into metagenomic analysis pipelines. The data sets can be downloaded at http://orphelia.gobics.de/datasets/&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18442389&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Normalization for triple-target microarray experiments.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18442385</link>
      <description>Publication Date: 2008 Apr 28 PMID: 18442385&lt;br/&gt;Authors: Martin-Magniette, M. L. - Aubert, J. - Bar-Hen, A. - Elftieh, S. - Magniette, F. - Renou, J. P. - Daudin, J. J.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Most microarray studies are made using labelling with one or two dyes which allows the hybridization of one or two samples on the same slide. In such experiments, the most frequently used dyes are Cy3 and Cy5. Recent improvements in the technology (dye-labelling, scanner, and image analysis) allow hybridization of up to four samples simultaneously. The two additional dyes are Alexa 488 and Alexa 494. The triple-target or four-target technology is very promising, since it allows us more flexibility in the design of experiments, an increase in the statistical power when comparing gene expressions induced by different conditions and a scaled down number of slides. However, there have been few methods proposed for statistical analysis of such data. Moreover the lowess correction of the global dye effect is available for only two-color experiments, and even if its application can be derived, it does not allow simultaneous correction of the raw data. RESULTS: We propose a two-step normalization procedure for triple-target experiments. First the dye bleeding is evaluated and corrected if necessary. Then the signal in each channel is normalized using a generalized lowess procedure to correct a global dye bias. The normalization procedure is validated using triple-self experiments and by comparing the results of triple-target and two-color experiments. Although the focus is on triple-target microarrays, the proposed method can be used to normalize p differently labelled targets co-hybridized on the same array, for any value of p greater than 2. 
CONCLUSIONS: The proposed normalization procedure is effective: the technical biases are reduced, the number of false positives is under control in the analysis of differentially expressed genes, and the triple-target experiments are more powerful than the corresponding two-color experiments. There is room for improving the microarray experiments by simultaneously hybridizing more than two samples.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18442385&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Binning sequences using very sparse labels within a metagenome.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18442374</link>
      <description>Publication Date: 2008 PMID: 18442374&lt;br/&gt;Authors: Chan, C. K. - Hsu, A. L. - Halgamuge, S. K. - Tang, S. L.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: In metagenomic studies, a process called binning is necessary to assign contigs that belong to multiple species to their respective phylogenetic groups. Most of the current methods of binning, such as BLAST, k-mer and PhyloPythia, involve assigning sequence fragments by comparing sequence similarity or sequence composition with already-sequenced genomes that are still far from comprehensive. We propose a semi-supervised seeding method for binning that does not depend on knowledge of completed genomes. Instead, it extracts the flanking sequences of highly conserved 16S rRNA from the metagenome and uses them as seeds (labels) to assign other reads based on their compositional similarity. RESULTS: The proposed seeding method is implemented on an unsupervised Growing Self-Organising Map (GSOM), and called Seeded GSOM (S-GSOM). We compared it with four well-known semi-supervised learning methods in a preliminary test, separating random-length prokaryotic sequence fragments sampled from the NCBI genome database. We identified the flanking sequences of the highly conserved 16S rRNA as suitable seeds that could be used to group the sequence fragments according to their species. S-GSOM showed superior performance compared to the semi-supervised methods tested. Additionally, S-GSOM may also be used to visually identify some species that do not have seeds. The proposed method was then applied to simulated metagenomic datasets using two different confidence threshold settings and compared with PhyloPythia, k-mer and BLAST. 
At the reference taxonomic level Order, S-GSOM outperformed all k-mer and BLAST results and showed comparable results with PhyloPythia for each of the corresponding confidence settings, where S-GSOM performed better than PhyloPythia in the &gt;= 10 reads datasets and comparable in the &gt;= 8 kb benchmark tests. CONCLUSION: In the task of binning using semi-supervised learning methods, results indicate S-GSOM to be the best of the methods tested. Most importantly, the proposed method does not require knowledge from known genomes and uses only very few labels (one per species is sufficient in most cases), which are extracted from the metagenome itself. These advantages make it a very attractive binning method. S-GSOM outperformed the binning methods that depend on already-sequenced genomes, and compares well to the current most advanced binning method, PhyloPythia.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18442374&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>M-BISON: microarray-based integration of data sources using networks.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18439292</link>
      <description>Publication Date: 2008 Apr 25 PMID: 18439292&lt;br/&gt;Authors: Daigle, B. J. Jr - Altman, R. B.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: The accurate detection of differentially expressed (DE) genes has become a central task in microarray analysis. Unfortunately, the noise level and experimental variability of microarrays can be limiting. While a number of existing methods partially overcome these limitations by incorporating biological knowledge in the form of gene groups, these methods sacrifice gene-level resolution. This loss of precision can be inappropriate, especially if the desired output is a ranked list of individual genes. To address this shortcoming, we developed M-BISON (Microarray-Based Integration of data SOurces using Networks), a formal probabilistic model that integrates background biological knowledge with microarray data to predict individual DE genes. RESULTS: M-BISON improves signal detection on a range of simulated data, particularly when using very noisy microarray data. We also applied the method to the task of predicting heat shock-related differentially expressed genes in S. cerevisiae, using an hsf1 mutant microarray dataset and conserved yeast DNA sequence motifs. Our results demonstrate that M-BISON improves the analysis quality and makes predictions that are easy to interpret in concert with incorporated knowledge. Specifically, M-BISON increases the AUC of DE gene prediction from .541 to .623 when compared to a method using only microarray data, and M-BISON outperforms a related method, GeneRank. Furthermore, by analyzing M-BISON predictions in the context of the background knowledge, we identified YHR124W as a potentially novel player in the yeast heat shock response. 
CONCLUSIONS: This work provides a solid foundation for the principled integration of imperfect biological knowledge with gene expression data and other high-throughput data sources.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18439292&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A new protein linear motif benchmark for multiple sequence alignment software.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18439277</link>
      <description>Publication Date: 2008 PMID: 18439277&lt;br/&gt;Authors: Perrodou, E. - Chica, C. - Poch, O. - Gibson, T. J. - Thompson, J. D.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking sites for regulatory complex assembly and protein processing and cleavage. Methods for LM detection are now being developed that are strongly dependent on scores for motif conservation in homologous proteins. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align using classical multiple sequence alignment programs that are specifically optimised to align the globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs. RESULTS: We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. 
The ranking of the alignment programs based on LM alignment quality is similar to that observed when considering full-length protein alignments, however little correlation was observed between LM and overall alignment quality for individual alignment test cases. CONCLUSION: We have shown that none of the programs currently available is capable of reliably aligning LMs in distantly related sequences and we have highlighted a number of specific problems. The results of the tests suggest possible ways to improve program accuracy for difficult, divergent sequences.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18439277&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18439255</link>
      <description>Publication Date: 2008 Apr 25 PMID: 18439255&lt;br/&gt;Authors: Katoh, K. - Toh, H.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Structural alignment of RNAs is becoming important, since the discovery of functional non-coding RNAs (ncRNAs). Recent studies, mainly based on various approximations of the Sankoff algorithm, have resulted in considerable improvement in the accuracy of pairwise structural alignment. In contrast, for the cases with more than two sequences, the practical merit of structural alignment remains unclear as compared to traditional sequence-based methods, although the importance of multiple structural alignment is widely recognized. RESULTS: We took a different approach from a straightforward extension of the Sankoff algorithm to the multiple alignments from the viewpoints of accuracy and time complexity. As a new option of the MAFFT alignment program, we developed a multiple RNA alignment framework, X-INS-i, which builds a multiple alignment with an iterative method incorporating structural information through two components: (1) pairwise structural alignments by an external pairwise alignment method such as SCARNA or LaRA and (2) a new objective function, Four-way Consistency, derived from the base-pairing probability of every sub-aligned group at every multiple alignment stage. CONCLUSION: The BRAliBASE benchmark showed that X-INS-i outperforms other methods currently available in the sum-of-pairs score (SPS) criterion. As a basis for predicting common secondary structure, the accuracy of the present method is comparable to or rather higher than those of the current leading methods such as RNA Sampler. The X-INS-i framework can be used for building a multiple RNA alignment from any combination of algorithms for pairwise RNA alignment and base-pairing probability. 
The source code is available at http://align.bmr.kyushu-u.ac.jp/mafft/software/.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18439255&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Correction: Extracting unrecognized gene relationships from the biomedical literature via matrix factorizations.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18435841</link>
      <description>Publication Date: 2008 Apr 24 PMID: 18435841&lt;br/&gt;Authors: Kim, H. - Park, H. - Drake, B. L.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: The original article we published [Kim et al: BMC Bioinformatics 2007, 8(Suppl 9):S6] lacked adequate attribution and credit to a previously published work [Homayouni et al: Bioinformatics 2005, 1:104-115]. Thus, we would like to make the following corrections and we would like to apologize to the author group and the readers.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18435841&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Identification of coherent patterns in gene expression data using an efficient biclustering algorithm and parallel coordinate visualization.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18433478</link>
      <description>Publication Date: 2008 Apr 23 PMID: 18433478&lt;br/&gt;Authors: Cheng, K. O. - Law, N. F. - Siu, W. C. - Liew, A. W.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: The DNA microarray technology allows the measurement of expression levels of thousands of genes under tens/hundreds of different conditions. In microarray data, genes with similar functions usually co-express under certain conditions only [1]. Thus, biclustering which clusters genes and conditions simultaneously is preferred over the traditional clustering technique in discovering these coherent genes. Various biclustering algorithms have been developed using different bicluster formulations. Unfortunately, many useful formulations result in NP-complete problems. In this article, we investigate an efficient method for identifying a popular type of biclusters called additive model. Furthermore, parallel coordinate (PC) plots are used for bicluster visualization and analysis. RESULTS: We develop a novel and efficient biclustering algorithm which can be regarded as a greedy version of an existing algorithm known as pCluster algorithm. By relaxing the constraint in homogeneity, the proposed algorithm has polynomial-time complexity in the worst case instead of exponential-time complexity as in the pCluster algorithm. Experiments on artificial datasets verify that our algorithm can identify both additive-related and multiplicative-related biclusters in the presence of overlap and noise. Biologically significant biclusters have been validated on the yeast cell-cycle expression dataset using Gene Ontology annotations. Comparative study shows that the proposed approach outperforms several existing biclustering algorithms. We also provide an interactive exploratory tool based on PC plot visualization for determining the parameters of our biclustering algorithm. 
CONCLUSIONS: We have proposed a novel biclustering algorithm which works with PC plots for an interactive exploratory analysis of gene expression data. Experiments show that the biclustering algorithm is efficient and is capable of detecting co-regulated genes. The interactive analysis enables an optimum parameter determination in the biclustering algorithm so as to achieve the best result. In future, we will modify the proposed algorithm for other bicluster models such as the coherent evolution model.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18433478&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Discovering biclusters in gene expression data based on high-dimensional linear geometries.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18433477</link>
      <description>Publication Date: 2008 Apr 23 PMID: 18433477&lt;br/&gt;Authors: Gan, X. - Liew, A. W. - Yan, H.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. However, in many situations a subset of genes only exhibits consistent pattern over a subset of conditions. Conventional clustering algorithms that deal with the entire row or column in an expression matrix would therefore fail to detect these useful patterns in the data. Recently, biclustering has been proposed to detect a subset of genes exhibiting consistent pattern over a subset of conditions. However, most existing biclustering algorithms are based on searching for sub-matrices within a data matrix by optimizing certain heuristically defined merit functions. Moreover, most of these algorithms can only detect a restricted set of bicluster patterns. RESULTS: In this paper, we present a novel geometric perspective for the biclustering problem. The biclustering process is interpreted as the detection of linear geometries in a high dimensional data space. Such a new perspective views biclusters with different patterns as hyperplanes in a high dimensional space, and allows us to handle different types of linear patterns simultaneously by matching a specific set of linear geometries. This geometric viewpoint also inspires us to propose a generic bicluster pattern, i.e. the linear coherent model that unifies the seemingly incompatible additive and multiplicative bicluster models. As a particular realization of our framework, we have implemented a Hough transform-based hyperplane detection algorithm. The experimental results on human lymphoma gene expression dataset show that our algorithm can find biologically significant subsets of genes. 
CONCLUSION: We have proposed a novel geometric interpretation of the biclustering problem. We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in a high dimensional data space. An implementation of the geometric framework using the Fast Hough transform for hyperplane detection can be used to discover biologically significant subsets of genes under subsets of conditions for microarray data analysis.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18433477&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Conserved co-expression for candidate disease gene prioritization.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18433471</link>
      <description>Publication Date: 2008 PMID: 18433471&lt;br/&gt;Authors: Oti, M. - van Reeuwijk, J. - Huynen, M. A. - Brunner, H. G.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: Genes that are co-expressed tend to be involved in the same biological process. However, co-expression is not a very reliable predictor of functional links between genes. The evolutionary conservation of co-expression between species can be used to predict protein function more reliably than co-expression in a single species. Here we examine whether co-expression across multiple species is also a better prioritizer of disease genes than is co-expression between human genes alone. RESULTS: We use co-expression data from yeast (S. cerevisiae), nematode worm (C. elegans), fruit fly (D. melanogaster), mouse and human and find that the use of evolutionary conservation can indeed improve the predictive value of co-expression. The effect that genes causing the same disease have higher co-expression than do other genes from their associated disease loci, is significantly enhanced when co-expression data are combined across evolutionarily distant species. We also find that performance can vary significantly depending on the co-expression datasets used, and just using more data does not necessarily lead to better prioritization. Instead, we find that dataset quality is more important than quantity, and using a consistent microarray platform per species leads to better performance than using more inclusive datasets pooled from various platforms. 
CONCLUSION: We find that evolutionarily conserved gene co-expression prioritizes disease candidate genes better than human gene co-expression alone, and provide the integrated data as a new resource for disease gene prioritization tools.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18433471&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Extraction of semantic biomedical relations from text using conditional random fields.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18433469</link>
      <description>Publication Date: 2008 PMID: 18433469&lt;br/&gt;Authors: Bundschus, M. - Dejori, M. - Stetter, M. - Tresp, V. - Kriegel, H. P.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: The increasing amount of published literature in biomedicine represents an immense source of knowledge, which can only efficiently be accessed by a new generation of automated information extraction tools. Named entity recognition of well-defined objects, such as genes or proteins, has achieved a sufficient level of maturity such that it can form the basis for the next step: the extraction of relations that exist between the recognized entities. Whereas most early work focused on the mere detection of relations, the classification of the type of relation is also of great importance and this is the focus of this work. In this paper we describe an approach that extracts both the existence of a relation and its type. Our work is based on Conditional Random Fields, which have been applied with much success to the task of named entity recognition. RESULTS: We benchmark our approach on two different tasks. The first task is the identification of semantic relations between diseases and treatments. The available data set consists of manually annotated PubMed abstracts. The second task is the identification of relations between genes and diseases from a set of concise phrases, so-called GeneRIF (Gene Reference Into Function) phrases. In our experimental setting, we do not assume that the entities are given, as is often the case in previous relation extraction work. Rather the extraction of the entities is solved as a subproblem. Compared with other state-of-the-art approaches, we achieve very competitive results on both data sets. To demonstrate the scalability of our solution, we apply our approach to the complete human GeneRIF database. The resulting gene-disease network contains 34758 semantic associations between 4939 genes and 1745 diseases. 
The gene-disease network is publicly available as a machine-readable RDF graph. CONCLUSION: We extend the framework of Conditional Random Fields towards the annotation of semantic relations from text and apply it to the biomedical domain. Our approach is based on a rich set of textual features and achieves a performance that is competitive to leading approaches. The model is quite general and can be extended to handle arbitrary biological entities and relation types. The resulting gene-disease network shows that the GeneRIF database provides a rich knowledge source for text mining. Current work is focused on improving the accuracy of detection of entities as well as entity boundaries, which will also greatly improve the relation extraction performance.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18433469&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>VariVis: a visualisation toolkit for variation databases.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18430251</link>
      <description>Publication Date: 2008 PMID: 18430251&lt;br/&gt;Authors: Smith, T. D. - Cotton, R. G.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: With the completion of the Human Genome Project and recent advancements in mutation detection technologies, the volume of data available on genetic variations has risen considerably. These data are stored in online variation databases and provide important clues to the cause of diseases and potential side effects or resistance to drugs. However, the data presentation techniques employed by most of these databases make them difficult to use and understand. RESULTS: Here we present a visualisation toolkit that can be employed by online variation databases to generate graphical models of gene sequence with corresponding variations and their consequences. The VariVis software package can run on any web server capable of executing Perl CGI scripts and can interface with numerous Database Management Systems and &quot;flat-file&quot; data files. VariVis produces two easily understandable graphical depictions of any gene sequence and matches these with variant data. While developed with the goal of improving the utility of human variation databases, the VariVis package can be used in any variation database to enhance utilisation of, and access to, critical information.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18430251&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>GAPscreener: An Automatic Tool for Screening Human Genetic Association Literature in PubMed Using the Support Vector Machine Technique.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18430222</link>
      <description>Publication Date: 2008 Apr 22 PMID: 18430222&lt;br/&gt;Authors: Yu, W. - Clyne, M. - Dolan, S. M. - Yesupriya, A. - Wulf, A. - Liu, T. - Khoury, M. J. - Gwinn, M.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Synthesis of data from published human genetic association studies is a critical step in the translation of human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of a labor-intensive and time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. The GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies. RESULTS: The data source for this research was the HuGE Navigator (www.hugenavigator.net), formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced by about 90% the number of abstracts requiring individual review by the database curator. The tool also ascertained 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, the GAPscreener both reduced effort and improved accuracy. 
CONCLUSION: GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. The user-friendly graphical user interface makes this a practical, stand-alone application. The software can be downloaded at no charge from http://www.hugenavigator.net/HuGENavigator/HNDescription/opensource_GAP.htm.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18430222&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Major copy proportion analysis of tumor samples using SNP arrays.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18426588</link>
      <description>Publication Date: 2008 PMID: 18426588&lt;br/&gt;Authors: Li, C. - Beroukhim, R. - Weir, B. A. - Winckler, W. - Garraway, L. A. - Sellers, W. R. - Meyerson, M.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;BACKGROUND: Single nucleotide polymorphisms (SNPs) are the most common genetic variations in the human genome and are useful as genomic markers. Oligonucleotide SNP microarrays have been developed for high-throughput genotyping of up to 900,000 human SNPs and have been used widely in linkage and cancer genomics studies. We have previously used Hidden Markov Models (HMM) to analyze SNP array data for inferring copy numbers and loss-of-heterozygosity (LOH) from paired normal and tumor samples and unpaired tumor samples. RESULTS: We proposed and implemented major copy proportion (MCP) analysis of oligonucleotide SNP array data. A HMM was constructed to infer unobserved MCP states from observed allele-specific signals through emission and transition distributions. We used 10 K, 100 K and 250 K SNP array datasets to compare MCP analysis with LOH and copy number analysis, and showed that MCP performs better than LOH analysis for allelic-imbalanced chromosome regions and normal contaminated samples. The major and minor copy alleles can also be inferred from allelic-imbalanced regions by MCP analysis. CONCLUSION: MCP extends tumor LOH analysis to allelic imbalance analysis and supplies complementary information to total copy numbers. MCP analysis of mixing normal and tumor samples suggests the utility of MCP analysis of normal-contaminated tumor samples. The described analysis and visualization methods are readily available in the user-friendly dChip software.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18426588&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18426580</link>
      <description>Publication Date: 2008 Apr 21 PMID: 18426580&lt;br/&gt;Authors: Zhang, Y. - Xuan, J. - de Los Reyes, B. G. - Clarke, R. - Ressom, H. W.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Integrating data from multiple global assays and curated databases is essential to understand the spatio-temporal interactions within cells. Different experiments measure cellular processes at various widths and depths, while databases contain biological information based on established facts or published data. Integrating these complementary datasets helps infer a mutually consistent transcriptional regulatory network (TRN) with strong similarity to the structure of the underlying genetic regulatory modules. Decomposing the TRN into a small set of recurring regulatory patterns, called network motifs (NM), facilitates the inference. Identifying NMs defined by specific transcription factors (TF) establishes the framework structure of a TRN and allows the inference of TF-target gene relationship. This paper introduces a computational framework for utilizing data from multiple sources to infer TF-target gene relationships on the basis of NMs. The data include time course gene expression profiles, genome-wide location analysis data, binding sequence data, and gene ontology (GO) information. RESULTS: The proposed computational framework was tested using gene expression data associated with cell cycle progression in yeast. Among 800 cell cycle related genes, 85 were identified as candidate TFs and classified into four previously defined NMs. The NMs for a subset of TFs are obtained from literature. Support vector machine (SVM) classifiers were used to estimate NMs for the remaining TFs. The potential downstream target genes for the TFs were clustered into 34 biologically significant groups. 
The relationships between TFs and potential target gene clusters were examined by training recurrent neural networks whose topologies mimic the NMs to which the TFs are classified. The identified relationships between TFs and gene clusters were evaluated using the following biological validation and statistical analyses: (1) Gene set enrichment analysis (GSEA) to evaluate the clustering results; (2) Leave-one-out cross-validation (LOOCV) to ensure that the SVM classifiers assign TFs to NM categories with high confidence; (3) Binding site enrichment analysis (BSEA) to determine enrichment of the gene clusters for the cognate binding sites of their predicted TFs; (4) Comparison with previously reported results in the literature to confirm the inferred regulations. CONCLUSIONS: The major contribution of this study is the development of a computational framework to assist the inference of TRN by integrating heterogeneous data from multiple sources and by decomposing a TRN into NM-based modules. The inference capability of the proposed framework is verified statistically (e.g., LOOCV) and biologically (e.g., GSEA, BSEA, and literature validation). The proposed framework is useful for inferring small NM-based modules of TF-target gene relationships that can serve as a basis for generating new testable hypotheses.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D18426580&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Missing value imputation improves clustering and interpretation of gene expression microarray data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=18423022</link>
      <description>Publication Date: 2008 Apr 18 PMID: 18423022&lt;br/&gt;Authors: Tuikkala, J. - Elo, L. L. - Nevalainen, O. S. - Aittokallio, T.&lt;br/&gt;Journal: BMC Bioinformatics&lt;br/&gt;&lt;br/&gt;ABSTRACT: BACKGROUND: Missing values frequently pose problems in gene expression microarray experiments as they can hinder downstream analysis of the datasets. While several missing value imputation approaches are available to the microarray users and new ones are constantly being developed, there is no general consensus on how to choose between the different methods since their performance seems to vary drastically depending on the dataset being used. RESULTS: We show that this discrepancy can mostly be attributed to the way in which imputation methods have traditionally been developed and evaluated. By comparing a number of advanced imputation methods on recent microarray datasets, we show that even when there are marked differences in the measurement-level imputation accuracies across the datasets, these differences become negligible when the methods are evaluated in terms of how well they can reproduce the original gene clusters or their biological interpretations. Regardless of the evaluation approach, however, imputation always gave better results than ignoring missing data points or replacing them with zeros or average values, emphasizing the continued importance of using more advanced imputation methods. CONCLUSIONS: The results demonstrate that, while missing values are still severely complicating