<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
  xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Bioinformatics</title>
    <link>http://barf.jcowboy.org</link>
    <description>Bioinformatics recent publications</description>
    <language>en-us</language>
    <image>
      <url>http://barf.jcowboy.org/pubmed.gif</url>
      <title>the data for this feed is provided by PubMed</title>
      <link>http://barf.jcowboy.org</link>
    </image>
    <item>
      <title>Predicting Kinase Substrates using Conservation of Local Motif Density.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22302575</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22302575&lt;br/&gt;Authors: Lai, A. C. - Nguyen Ba, A. N. - Moses, A. M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Protein kinases represent critical links in cell signaling. A central problem in computational biology is to systematically identify their substrates. RESULTS: This study introduces a new method to predict kinase substrates by extracting evolutionary information from multiple sequence alignments in a manner that is tolerant to degenerate motif positioning. Given a known consensus, the new method (ConDens) compares the observed density of matches to a null model of evolution and does not require labeled training data. We confirmed that ConDens has improved performance compared to several existing methods in the field. Further, we show that it is generalizable and can predict interesting substrates for several important eukaryotic kinases where training data is not available. Availability and Implementation: ConDens can be found at http://www.moseslab.csb.utoronto.ca/andyl/. CONTACT: alan.moses@utoronto.ca SUPPLEMENTARY INFORMATION: Supplementary data is available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22302575&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Breakpointer: using local mapping artifacts to support sequence breakpoint discovery from single-end reads.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22302574</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22302574&lt;br/&gt;Authors: Sun, R. - Love, M. I. - Zemojtel, T. - Emde, A. K. - Chung, H. R. - Vingron, M. - Haas, S. A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: We developed Breakpointer, a fast algorithm to locate breakpoints of structural variants (SVs) from single-end reads produced by next-generation sequencing (NGS). By taking advantage of local non-uniform read distribution and misalignments created by SVs, Breakpointer scans the alignment of single-end reads to identify regions containing potential breakpoints. The detection of such breakpoints can indicate insertions longer than the read length and SVs located in repetitive regions which might be missed by other methods. Thus, Breakpointer complements existing methods to locate SVs from single-end reads. AVAILABILITY: https://github.com/ruping/Breakpointer CONTACT: ruping@molgen.mpg.de.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22302574&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Gencrypt: One-way cryptographic hashes to detect overlapping individuals across samples.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22302573</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22302573&lt;br/&gt;Authors: Turchin, M. C. - Hirschhorn, J. N.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Meta-analysis across genome-wide association studies is a common approach for discovering genetic associations. However, in some meta-analysis efforts, individual-level data cannot be broadly shared by study investigators due to privacy and IRB concerns. In such cases, researchers cannot confirm that each study represents a unique group of people, leading to potentially inflated test statistics and false positives. To resolve this problem, we created a software tool, Gencrypt, which utilizes a security protocol known as one-way cryptographic hashes to allow overlapping participants to be identified without sharing individual-level data. AVAILABILITY: Gencrypt is freely available under the GNU general public license v3 at http://www.broadinstitute.org/software/gencrypt/ CONTACT: joelh@broadinstitute.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22302573&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>pymzML - Python module for high throughput bioinformatics on mass spectrometry data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22302572</link>
      <description>Publication Date: 2012 Feb 2 PMID: 22302572&lt;br/&gt;Authors: Bald, T. - Barth, J. - Niehues, A. - Specht, M. - Hippler, M. - Fufezan, C.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: pymzML is an extension to Python that offers a) easy access to mass spectrometry (MS) data, which allows the rapid development of tools, b) a very fast parser for mzML data, the standard data format in mass spectrometry, and c) a set of functions to compare or handle spectra. Availability and Implementation: pymzML requires Python 2.6.5+ and is fully compatible with Python 3. The module is freely available on http://pymzml.github.com or pypi, is published under LGPL license and requires no additional modules to be installed. CONTACT: christian@fufezan.net.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22302572&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Evaluating the Drosophila Bicoid morphogen gradient system through dissecting the noise in transcriptional bursts.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22302571</link>
      <description>Publication Date: 2012 Feb 2 PMID: 22302571&lt;br/&gt;Authors: He, F. - Ren, J. - Wang, W. - Ma, J.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: We describe a statistical model to dissect the noise in transcriptional bursts in a developmental system. RESULTS: We assume that, at any given moment of time, each copy of a native gene inside a cell can exist in either a bursting (active) or non-bursting (inactive) state. The experimentally measured total noise in the transcriptional states of a gene in a population of cells can be mathematically dissected into two contributing components, internal and external. While internal noise quantifies the stochastic nature of transcriptional bursts, external noise is caused by cell-to-cell differences including fluctuations in activator concentration. We use our developed methods to analyze the Drosophila Bicoid (Bcd) morphogen gradient system. For its target gene hunchback (hb), the noise properties can be recapitulated by a simplified gene regulatory model in which Bcd acts as the only input, suggesting that the external noise in hb transcription is primarily derived from fluctuations in the Bcd activator input. However, such a simplified gene regulatory model is insufficient to predict the noise properties of another Bcd target gene, orthodenticle (otd), suggesting that otd transcription is sensitive to additional external fluctuations beyond those of Bcd. Our results show that analysis of the relationship between input and output noise can reveal important insights into how a morphogen gradient system works. Our study also advances the knowledge about transcription at a fundamental level.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22302571&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Reconstructing Disease Phenome-genome Association by Bi-Random Walk.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22302570</link>
      <description>Publication Date: 2012 Feb 2 PMID: 22302570&lt;br/&gt;Authors: Xie, M. - Hwang, T. - Kuang, R.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Promising results were recently reported in utilizing network information in phenotype-similarity network and gene-interaction network with graph-based learning to derive new disease phenotype-gene associations. However, a more fundamental understanding of how the network information is relevant to disease phenotype-gene associations is lacking. In this paper, we analyze the circular bigraphs (CBGs) in OMIM phenotype-gene association networks, and introduce a bi-random walk (BiRW) algorithm to capture the CBG patterns in the networks for unveiling the associations between the complete collection of disease phenotypes (phenome) and genes. BiRW performs separate random walk simultaneously on gene interaction network and phenotype similarity network to explore gene paths and phenotype paths in CBGs of different sizes. RESULTS: In the analysis of OMIM associations, we discovered that 81% of the associations are covered by CBG patterns of path-length up to 3 with variability by 21 disease classes, and there is a clear correlation between the CBG coverage and the predictability of the phenotype-gene associations. Some prominent examples are cancers, nutritional diseases, dermatological diseases, bone diseases, cardiovascular diseases and respiratory diseases. Experiments on recovering known associations in cross-validation and predicting new associations in a test set validated that BiRW effectively improved prediction performance over existing methods by ranking more known associations in the top 100 out of more than 12,000 candidate genes. The investigation of the global disease phenome-genome association map also revealed interesting new predictions and phenotype-gene modules by disease classes. 
AVAILABILITY: http://compbio.cs.umn.edu/BiRW CONTACT: kuang@cs.umn.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22302570&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>ArchTEx: accurate extraction and visualization of Next-Generation Sequence data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22302569</link>
      <description>Publication Date: 2012 Feb 2 PMID: 22302569&lt;br/&gt;Authors: Lai, W. K. - Bard, J. E. - Buck, M. J.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The extension of mapped sequence tags is a common step in the analysis of single-end Next-Generation Sequencing (NGS) data from protein localization and chromatin studies. The optimal extension can vary depending on experimental and technical conditions. Improper extension of sequence tags can obscure or mislead the interpretation of NGS results. We present an algorithm, ArchTEx (Architectural Tag Extender), which identifies the optimal extension of sequence tags based on the maximum correlation between forward and reverse tags and extracts and visualizes sites of interest using the predicted extension. Availability and Implementation: ArchTEx requires Java 1.6 or newer. Source code and the compiled program are freely available at http://sourceforge.net/projects/archtex/ CONTACT: mjbuck@buffalo.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22302569&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Hadoop-BAM: Directly manipulating next generation sequencing data in the cloud.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22302568</link>
      <description>Publication Date: 2012 Feb 2 PMID: 22302568&lt;br/&gt;Authors: Niemenmaa, M. - Kallio, A. - Schumacher, A. - Klemela, P. - Korpelainen, E. - Heljanko, K.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Hadoop-BAM is a novel library for the scalable manipulation of aligned next generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large scale distributed processing. In this paper we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps. AVAILABILITY: Available under the open-source MIT license at http://sourceforge.net/projects/hadoop-bam/ CONTACT: matti.niemenmaa@aalto.fi.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22302568&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A gene-based test of association using canonical correlation analysis.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22296789</link>
      <description>Publication Date: 2012 Jan 31 PMID: 22296789&lt;br/&gt;Authors: Tang, C. S. - Ferreira, M. A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Canonical correlation analysis measures the association between two sets of multidimensional variables. We reasoned that CCA could provide an efficient and powerful approach for both univariate and multivariate gene-based tests of association without the need for permutation testing. RESULTS: Compared to a commonly used permutation-based approach, CCA (1) is faster; (2) has appropriate type-I error rate for normally distributed quantitative traits; (3) provides comparable power for small to medium-sized genes (&lt;100 kb); (4) provides greater power when the causal variants are uncommon; (5) provides considerably less power for larger genes (100 kb or greater) when the causal variants have a broad MAF spectrum. Application to a GWAS of leukocyte levels identified SAFB and a histone gene cluster as novel putative loci harboring multiple independent variants regulating lymphocyte and neutrophil counts. AVAILABILITY: http://genepi.qimr.edu.au/staff/manuelF/gene/main.html CONTACT: manuel.ferreira@qimr.edu.au SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22296789&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>WegoLoc: accurate prediction of protein subcellular localization using weighted Gene Ontology terms.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22296788</link>
      <description>Publication Date: 2012 Jan 31 PMID: 22296788&lt;br/&gt;Authors: Chi, S. M. - Nam, D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: We present an accurate and fast web server, WegoLoc for predicting subcellular localization of proteins based on sequence similarity and weighted Gene Ontology (GO) information. A term weighting method in the text categorization process is applied to GO terms for a support vector machine classifier. As a result, WegoLoc surpasses the state-of-the-art methods for previously used test datasets. WegoLoc supports three eukaryotic kingdoms (animals, fungi and plants) and provides human specific analysis, and covers several sets of cellular locations. In addition, WegoLoc provides (1) multiple possible localizations of input protein(s) as well as their corresponding probability scores, (2) weights of GO terms representing the contribution of each GO term in the prediction, and (3) a BLAST E-value for the best hit with GO terms. If the similarity score does not meet a given threshold, an amino acid composition-based prediction is applied as a backup method. AVAILABILITY: WegoLoc is freely available at the web site http://www.btool.org/WegoLoc. CONTACT: smchiks@ks.ac.kr, dougnam@unist.ac.kr SUPPLEMENTARY INFORMATION: Supplementary Material and User's guide are also available at http://www.btool.org/WegoLoc.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22296788&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Predicting relapse prior to transplantation in chronic myeloid leukemia by integrating expert knowledge and expression data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22296787</link>
      <description>Publication Date: 2012 Jan 31 PMID: 22296787&lt;br/&gt;Authors: Yeung, K. Y. - Gooley, T. A. - Zhang, A. - Raftery, A. E. - Radich, J. P. - Oehler, V. G.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Selecting a small number of signature genes for accurate classification of samples is essential for the development of diagnostic tests. However, many genes are highly correlated in gene expression data, and hence, many possible sets of genes are potential classifiers. Because treatment outcomes are poor in advanced chronic myeloid leukemia (CML), we hypothesized that expression of classifiers of advanced phase CML when detected in early CML (chronic phase (CP) CML), correlates with subsequent poorer therapeutic outcome. RESULTS: We developed a method that integrates gene expression data with expert knowledge and predicted functional relationships using iterative Bayesian model averaging. Applying our integrated method to CML, we identified small sets of signature genes that are highly predictive of disease phases and that are more robust and stable than using expression data alone. The accuracy of our algorithm was evaluated using cross validation on the gene expression data. We then tested the hypothesis that gene sets associated with advanced phase CML would predict relapse after allogeneic transplantation in 176 independent CP CML cases. Our gene signatures of advanced phase CML are predictive of relapse even after adjustment for known risk factors associated with transplant outcomes. AVAILABILITY: The source codes and datasets used are available from our supplementary web site. 
CONTACT: kayee@u.washington.edu SUPPLEMENTARY INFORMATION: http://expression.washington.edu/publications/kayee/integratedBMA.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22296787&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Inhibition of HIV-1 protease: the rigidity perspective.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22291339</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22291339&lt;br/&gt;Authors: Heal, J. W. - Jimenez-Roldan, J. E. - Wells, S. A. - Freedman, R. B. - Romer, R. A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: HIV-1 protease is a key drug target due to its role in the life cycle of the HIV-1 virus. Rigidity analysis using the software First is a computationally inexpensive method for inferring functional information from protein crystal structures. We evaluate the rigidity of 206 high-resolution (2 Å or better) X-ray crystal structures of HIV-1 protease and compare the effects of different inhibitors binding to the enzyme. RESULTS: Inhibitor binding has little effect on the overall rigidity of the protein homodimer, including the rigidity of the active site. The principal effect of inhibitor binding on rigidity is to constrain the flexibility of the beta-hairpin flaps, which move to allow access to the active site of the enzyme. We show that commercially available antiviral drugs which target HIV-1 protease can be divided into two classes, those which significantly affect flap rigidity and those which do not. The non-peptidic inhibitor tipranavir is distinctive in its consistently strong effect on flap rigidity. CONTACT: jack.heal@warwick.ac.uk; r.roemer@warwick.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22291339&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>SOAP3: Ultra-fast GPU-based parallel alignment tool for short reads.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285832</link>
      <description>Publication Date: 2012 Jan 28 PMID: 22285832&lt;br/&gt;Authors: Liu, C. M. - Wong, T. - Wu, E. - Luo, R. - Yiu, S. M. - Li, Y. - Wang, B. - Yu, C. - Chu, X. - Zhao, K. - Li, R. - Lam, T. W.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: SOAP3 is the first short read alignment tool that leverages the multi-processors in a graphic processing unit (GPU) to achieve a drastic improvement in speed. We adapted the compressed full-text index (BWT) used by SOAP2 in view of the advantages and disadvantages of GPU. When tested with millions of Illumina Hiseq 2000 length-100bp reads, SOAP3 takes less than 30 seconds to align a million read pairs onto the human reference genome and is at least 7.5 and 20 times faster than BWA and Bowtie, respectively. For aligning reads with up to 4 mismatches, SOAP3 aligns slightly more reads than BWA and Bowtie; this is because SOAP3, unlike BWA and Bowtie, is not heuristic-based and always reports all answers. AVAILABILITY: SOAP3 is available at: http://www.cs.hku.hk/2bwt-tools/soap3; and http://soap.genomics.org.cn/soap3.html. CONTACT: liruiqiang@gmail.com, twlam@cs.hku.hk.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285832&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A new approach to bias correction in RNA-Seq.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285831</link>
      <description>Publication Date: 2012 Jan 28 PMID: 22285831&lt;br/&gt;Authors: Jones, D. C. - Ruzzo, W. L. - Peng, X. - Katze, M. G.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Quantification of sequence abundance in RNA-Seq experiments is often conflated by protocol-specific sequence bias. The exact sources of the bias are unknown, but may be influenced by PCR amplification, or differing primer affinities and mixtures, for example. The result is decreased accuracy in many applications, such as de novo gene annotation and transcript quantification. RESULTS: We present a new method to measure and correct for these influences using a simple graphical model. Our model does not rely on existing gene annotations, and model selection is performed automatically making it applicable with few assumptions. We evaluate our method on several data sets, and by multiple criteria, demonstrating that it effectively decreases bias and increases uniformity. Additionally, we provide theoretical and empirical results showing that the method is unlikely to have any effect on unbiased data, suggesting it can be applied with little risk of spurious adjustment. AVAILABILITY: The method is implemented in the seqbias R/Bioconductor package, available freely under the LGPL license from http://bioconductor.org. CONTACT: dcjones@cs.washington.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285831&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A co-module approach for elucidating drug-disease associations and revealing their molecular basis.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285830</link>
      <description>Publication Date: 2012 Jan 28 PMID: 22285830&lt;br/&gt;Authors: Zhao, S. - Li, S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Understanding how drugs and diseases are associated at the molecular level is of critical importance to unveil disease mechanisms and treatments. Until recently, few studies have attempted to discover important gene modules shared by both drugs and diseases. RESULTS: Here, we propose a novel presentation of drug-gene-disease relationship, a 'co-module', which is characterized by closely related drugs, diseases and genes. We first define a network-based gene closeness profile to relate drug to disease. Then, we develop a Bayesian partition method to identify drug-gene-disease co-modules underlying the gene closeness data. Genes share similar notable patterns with respect not only to the drugs but also the diseases within a co-module. Simulations show that our method, comCIPHER, achieves a better performance compared with a popular co-module detection method, PPA. We apply comCIPHER to a set consisting of 723 drugs, 275 diseases and 1442 genes and demonstrate our co-module approach is able to identify new drug-disease associations and highlight their molecular basis. Disease co-morbidity emerges as well. Three co-modules are further illustrated in which new drug applications, including the anti-metastasis activity of Pranlukast and Arbutamine for obesity, as well as potential side-effects, e.g. hypotension for Triamterene, are computationally identified. AVAILABILITY: The compiled version of comCIPHER can be found at http://bioinfo.au.tsinghua.edu.cn/comCIPHER/ CONTACT: shaoli@mail.tsinghua.edu.cn.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285830&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Measurement of genome-wide RNA synthesis and decay rates with Dynamic Transcriptome Analysis (DTA).</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285829</link>
      <description>Publication Date: 2012 Jan 28 PMID: 22285829&lt;br/&gt;Authors: Schwalb, B. - Schulz, D. - Sun, M. - Zacher, B. - Dumcke, S. - Martin, D. - Cramer, P. - Tresch, A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Standard transcriptomics measures total cellular RNA levels. Our understanding of gene regulation would be greatly improved if we could measure RNA synthesis- and decay rates on a genome-wide level. To that end, the Dynamic Transcriptome Analysis (DTA) method has been developed. DTA combines metabolic RNA labeling with standard transcriptomics to measure these rates in a precise and non-perturbing manner. Here we present the open source R/Bioconductor software package DTA. It implements all required bioinformatics steps that allow the accurate absolute quantification and comparison of RNA turnover. AVAILABILITY: DTA is part of R/Bioconductor. To download and install DTA refer to http://bioconductor.org/packages/2.10/bioc/html/DTA.html. CONTACT: schwalb@lmb.uni-muenchen.de.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285829&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>HandAlign: Bayesian multiple sequence alignment, phylogeny, and ancestral reconstruction.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285828</link>
      <description>Publication Date: 2012 Jan 28 PMID: 22285828&lt;br/&gt;Authors: Westesson, O. - Barquist, L. - Holmes, I.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: We describe handalign, a software package for Bayesian reconstruction of phylogenetic history. The underlying model of sequence evolution describes indels and substitutions. Alignments, trees, and model parameters are all treated as jointly-dependent random variables and sampled via Metropolis-Hastings Markov chain Monte Carlo (MCMC), enabling systematic statistical parameter inference and hypothesis testing. handalign implements several different MCMC proposal kernels, allows sampling from arbitrary target distributions via Hastings ratios, and uses standard file formats for trees, alignments and models. Availability and Implementation: Installation and usage instructions are at http://biowiki.org/HandAlign CONTACT: ihh@berkeley.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285828&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Chado Controller: advanced annotation management with a community annotation system.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285827</link>
      <description>Publication Date: 2012 Jan 28 PMID: 22285827&lt;br/&gt;Authors: Guignon, V. - Droc, G. - Alaux, M. - Baurens, F. C. - Garsmeur, O. - Poiron, C. - Carver, T. - Rouard, M. - Bocs, S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: We developed a controller that is compliant with the Chado database schema, GBrowse and genome annotation-editing tools such as Artemis and Apollo. It enables the management of public and private data, monitors manual annotation (with controlled vocabularies, structural and functional annotation controls) and stores versions of annotation for all modified features. The Chado controller uses PostgreSQL and Perl. AVAILABILITY: The Chado Controller package is available for download at http://www.gnpannot.org/content/chado-controller and runs on any Unix-like operating system, and documentation is available at http://www.gnpannot.org/content/chado-controller-doc The system can be tested using the GNPAnnot Sandbox at http://www.gnpannot.org/content/gnpannot-sandbox-form CONTACT: valentin.guignon@cirad.fr, stephanie.sidibe-bocs@cirad.fr.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285827&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>The Transcriptome Analysis and Comparison Explorer - T-ACE: a platform-independent, graphical tool to process large RNAseq data sets of non-model organisms.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285826</link>
      <description>Publication Date: 2012 Jan 27 PMID: 22285826&lt;br/&gt;Authors: Philipp, E. E. - Kraemer, L. - Mountfort, D. - Schilhabel, M. - Schreiber, S. - Rosenstiel, P.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence data sets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. In particular, database interfaces with transcriptome analysis modules going beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a php-script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword- or BLAST-search. Additionally, T-ACE provides within and between transcriptome analysis modules on the level of expression, GO-terms, KEGG-pathways and protein-domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments, which have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS).
For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open source code provides a framework which can be customized according to the different needs of the user and transcriptome project.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285826&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>TurboKnot: Rapid Prediction of Conserved RNA Secondary Structures Including Pseudoknots.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285566</link>
      <description>Publication Date: 2012 Jan 27 PMID: 22285566&lt;br/&gt;Authors: Seetin, M. G. - Mathews, D. H.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Many RNA molecules function without being translated into proteins, and function depends on structure. Pseudoknots are motifs in RNA secondary structures that are difficult to predict but are also often functionally important. RESULTS: TurboKnot is a new algorithm for predicting the secondary structure, including pseudoknotted pairs, conserved across multiple sequences. TurboKnot finds 81.6% of all known base pairs in the systems tested, and 75.6% of predicted pairs were found in the known structures. Pseudoknots are found with half or better of the false-positive rate of previous methods. AVAILABILITY: The program is available for download under an open source license as part of the RNAstructure package at: http://rna.urmc.rochester.edu. CONTACT: david_mathews@urmc.rochester.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285566&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Genotype Calling from Next Generation Sequencing Data using Haplotype Information of Reads.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285565</link>
      <description>Publication Date: 2012 Jan 27 PMID: 22285565&lt;br/&gt;Authors: Zhi, D. - Wu, J. - Liu, N. - Zhang, K.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Low coverage sequencing provides an economic strategy for whole genome sequencing. When sequencing a set of individuals, genotype calling can be challenging due to low sequencing coverage. Linkage disequilibrium (LD) based refinement of genotyping calling is essential to improve the accuracy. Current LD-based methods use read counts or genotype likelihoods at individual potential polymorphic sites. Reads that span multiple potential polymorphic sites (jumping reads) can provide additional haplotype information overlooked by current methods. RESULTS: In this paper we introduce a new Hidden Markov Model (HMM)-based method that can take into account jumping reads information across adjacent potential polymorphic sites and implement it in the HapSeq program. Our method extends the HMM in Thunder (Li, et al., 2010) and explicitly models jumping reads information as emission probabilities conditional on the states of adjacent potential polymorphic sites. Our simulation results show that, compared to Thunder, HapSeq reduces the genotyping error rate by 30%, from 0.86% to 0.60%. The results from the 1000 Genomes Project show that HapSeq reduces the genotyping error rate by 12% and 9%, from 2.24% and 2.76% to 1.97% and 2.50% for individuals with European and African ancestry, respectively. We expect our program can improve genotyping qualities of the large number of ongoing and planned whole genome sequencing projects. CONTACT: dzhi@ms.soph.uab.edu and kzhang@ms.soph.uab.edu. 
AVAILABILITY: The software package HapSeq and its manual can be found and downloaded at www.ssg.uab.edu/hapseq/.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285565&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>SynaptomeDB: an ontology-based knowledgebase for synaptic genes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285564</link>
      <description>Publication Date: 2012 Jan 27 PMID: 22285564&lt;br/&gt;Authors: Pirooznia, M. - Wang, T. - Avramopoulos, D. - Valle, D. - Thomas, G. - Huganir, R. L. - Goes, F. S. - Potash, J. B. - Zandi, P. P.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The synapse is integral to the function of the brain and may be an important source of dysfunction underlying many neuro-psychiatric disorders. Consequently, it is an excellent candidate for large-scale genomic and proteomic study. However, while the tools and databases available for the annotation of high-throughput DNA and protein are generally robust, a comprehensive resource dedicated to the integration of information about the synapse is lacking. RESULTS: We present an integrated database, called SynaptomeDB, to retrieve and annotate genes comprising the synaptome. These genes encode components of the synapse including neurotransmitters and their receptors, adhesion/cytoskeletal proteins, scaffold proteins, membrane transporters. SynaptomeDB integrates various and complex data sources for synaptic genes and proteins. AVAILABILITY: http://psychiatry.igm.jhmi.edu/SynaptomeDB/&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285564&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Functional microRNA targets in protein coding sequences.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285563</link>
      <description>Publication Date: 2012 Jan 27 PMID: 22285563&lt;br/&gt;Authors: Reczko, M. - Maragkakis, M. - Alexiou, P. - Grosse, I. - Hatzigeorgiou, A. G.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Experimental evidence has accumulated showing that microRNA (miRNA) binding sites within protein coding sequences (CDSs) are functional in controlling gene expression. RESULTS: Here we report a computational analysis of such miRNA target sites, based on features extracted from existing mammalian high throughput immunoprecipitation and sequencing data. The analysis is performed independently for the CDS and the 3' Untranslated Regions (3'UTRs) and reveals different sets of features and models for the two regions. The two models are combined into a novel computational model for miRNA target genes, DIANA-microT-CDS, which achieves higher sensitivity compared to other popular programs and the model that uses only the 3'UTR target sites. Further analysis indicates that genes with shorter 3'UTRs are preferentially targeted in the CDS, suggesting that evolutionary selection might favor additional sites on the CDS in cases where there is restricted space on the 3'UTR. AVAILABILITY: The results of DIANA-microT-CDS are available at www.microrna.gr/microT-CDS. CONTACT: hatzigeorgiou@fleming.gr SUPPLEMENTARY INFORMATION: The source of DIANA-microT-CDS is available at 195.251.21.4/martin/microT-CDS.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285563&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>JointSNVMix : A Probabilistic Model For Accurate Detection Of Somatic Mutations In Normal/Tumour Paired Next Generation Sequencing Data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285562</link>
      <description>Publication Date: 2012 Jan 27 PMID: 22285562&lt;br/&gt;Authors: Roth, A. - Morin, R. - Ding, J. - Crisan, A. - Ha, G. - Giuliany, R. - Bashashati, A. - Hirst, M. - Turashvili, G. - Oloumi, A. - Marra, M. A. - Aparicio, S. - Shah, S. P.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Identification of somatic single nucleotide variants (SNVs) in tumour genomes is a necessary step in defining the mutational landscapes of cancers. Experimental designs for genome-wide ascertainment of somatic mutations now routinely include next generation sequencing (NGS) of tumour DNA and matched constitutional DNA from the same individual. This allows investigators to control for germline polymorphisms and distinguish somatic mutations that are unique to the tumour, thus reducing the burden of labour-intensive and expensive downstream experiments needed to verify initial predictions. In order to make full use of such paired datasets, computational tools for simultaneous analysis of tumour-normal paired sequence data are required, but are currently under-developed and under-represented in the bioinformatics literature. RESULTS: In this contribution, we introduce two novel probabilistic graphical models called JointSNVMix1 and JointSNVMix2 for jointly analysing paired tumour-normal digital allelic count data from NGS experiments. In contrast to independent analysis of the tumour and normal data, our method allows statistical strength to be borrowed across the samples and therefore amplifies the statistical power to identify and distinguish both germline and somatic events in a unified probabilistic framework. AVAILABILITY: The JointSNVMix models and four other models discussed in the paper are part of the JointSNVMix software package available for download at http://compbio.bccrc.ca. 
CONTACT: sshah@bccrc.ca.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285562&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Literature Mining of Host-Pathogen Interactions: Comparing Feature-based Supervised Learning and Language-based Approaches.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285561</link>
      <description>Publication Date: 2012 Jan 30 PMID: 22285561&lt;br/&gt;Authors: Thieu, T. - Joshi, S. - Warren, S. - Korkin, D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: In an infectious disease, the pathogen's strategy to enter the host organism and breach its immune defenses often involves interactions between the host and pathogen proteins. Currently, the experimental data on host-pathogen interactions (HPIs) are scattered across multiple databases, which are often specialized to target a specific disease or host organism. An accurate and efficient method for the automated extraction of HPIs from biomedical literature is crucial for creating a unified repository of HPI data. RESULTS: Here, we introduce and compare two new approaches to automatically detect whether the title or abstract of a PubMed publication contains HPI data, and extract the information about organisms and proteins involved in the interaction. The first approach is a feature-based supervised learning method using support vector machines (SVMs). The SVM models are trained on the features derived from the individual sentences. These features include names of the host/pathogen organisms and corresponding proteins or genes, keywords describing HPI-specific information, more general protein-protein interaction information, experimental methods, and other statistical information. The language-based method employed a link grammar parser combined with semantic patterns derived from the training examples. The approaches have been trained and tested on manually curated HPI data. When compared to a naive approach based on the existing protein-protein interaction literature mining method, our approaches demonstrated higher accuracy and recall in the classification task. The most accurate, feature-based, approach achieved 73% to 76% accuracy, depending on the test protocol. 
AVAILABILITY: Both approaches are available through PHILM web-server: http://korkinlab.org/philm.html CONTACT: korkin@korkinlab.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285561&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>vHOG, a multi-species vertebrate ontology of homologous organs groups.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285560</link>
      <description>Publication Date: 2012 Jan 27 PMID: 22285560&lt;br/&gt;Authors: Niknejad, A. - Comte, A. - Parmentier, G. - Roux, J. - Bastian, F. B. - Robinson-Rechavi, M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Most anatomical ontologies are species-specific, whereas a framework for comparative studies is needed. We describe the vertebrate Homologous Organs Groups ontology, vHOG, used to compare expression patterns between species. RESULTS: vHOG is a multi-species anatomical ontology for the vertebrate lineage. It is based on the HOGs used in the Bgee database of gene expression evolution. vHOG version 1.4 includes 1,184 terms, follows OBO principles, and is based on the Common Anatomy Reference Ontology (CARO). vHOG only describes structures with historical homology relations between model vertebrate species. The mapping to species-specific anatomical ontologies is provided as a separate file, so that no homology hypothesis is stated within the ontology itself. Each mapping has been manually reviewed, and we provide support codes and references when available. Availability and Implementation: vHOG is available from the Bgee download site (http://bgee.unil.ch/), as well as from the OBO Foundry and the NCBO Bioportal websites. CONTACT: bgee@isb-sib.ch.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285560&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Assumption weighting for incorporating heterogeneity into meta-analysis of genomic data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22285559</link>
      <description>Publication Date: 2012 Jan 27 PMID: 22285559&lt;br/&gt;Authors: Li, Y. - Ghosh, D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: There is now a large literature on statistical methods for the meta-analysis of genomic data from multiple studies. However, a crucial assumption for performing many of these analyses is that the data exhibit small between-study variation or that this heterogeneity can be sufficiently modelled probabilistically. RESULTS: In this article, we propose &quot;assumption weighting,&quot; which exploits a weighted hypothesis testing framework proposed by Genovese et al. (2006, Biometrika 93, 506 - 524) to incorporate tests of between-study variation into the meta-analysis context. This methodology is fast and computationally simple to implement. Several weighting schemes are considered and compared using simulation studies. In addition, we illustrate application of the proposed methodology using data from several high-profile stem cell gene expression datasets. AVAILABILITY: http://works.bepress.com/debashis_ghosh/50/ CONTACT: ghoshd@psu.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22285559&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Using biologically interrelated experiments to identify pathway genes in Arabidopsis.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22271267</link>
      <description>Publication Date: 2012 Jan 23 PMID: 22271267&lt;br/&gt;Authors: Kim, K. - Jiang, K. - Melinda Teng, S. L. - Feldman, L. J. - Huang, H.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Pathway genes are considered as a group of genes that work cooperatively in the same pathway constituting a fundamental functional grouping in a biological process. Identifying pathway genes has been one of the major tasks in understanding biological processes. However, due to the difficulty in characterizing/inferring different types of biological gene relationships, as well as several computational issues arising from dealing with high-dimensional biological data, deducing genes in pathways remains challenging. RESULTS: In this work we elucidate higher-level gene-gene interactions by evaluating conditional dependencies between genes, i.e., the relationships between genes after removing the influences of a set of previously known pathway genes. These previously known pathway genes serve as seed genes in our model and will guide the detection of other genes involved in the same pathway. The detailed statistical techniques involve the estimation of a precision matrix whose elements are known to be proportional to partial correlations (i.e., conditional dependencies) between genes under appropriate normality assumptions. Likelihood ratio tests on two forms of precision matrices are further performed to see if a candidate pathway gene is conditionally independent of all the previously known pathway genes. When used effectively, this is a promising approach to recover gene relationships that would have otherwise been missed by standard methods. The advantage of the proposed method is demonstrated using both simulation studies and real datasets. We also demonstrated the importance of taking into account experimental dependencies in the simulation and real data studies. 
CONTACT: hhuang@stat.berkeley.edu, ljfeldman@berkeley.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22271267&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Mining and integration of pathway diagrams from imaging data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22267504</link>
      <description>Publication Date: 2012 Jan 20 PMID: 22267504&lt;br/&gt;Authors: Kozhenkov, S. - Baitaluk, M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Pathway diagrams from PubMed and World Wide Web (WWW) contain valuable highly curated information difficult to reach without tools specifically designed and customized for the biological semantics and high-content density of the images. There is currently no search engine or tool that can analyze pathway images, extract their pathway components (molecules, genes, proteins, organelles, cells, organs, etc.), and indicate their relationships. RESULTS: Here we describe a resource of pathway diagrams retrieved from article and web-page images through optical character recognition (OCR), in conjunction with data-mining and data integration methods. The recognized pathways are integrated into the BiologicalNetworks research environment linking them to a wealth of data available in the BiologicalNetworks' knowledgebase, which integrates data from &gt;100 public data sources and the biomedical literature. Multiple search and analytical tools are available that allow the recognized cellular pathways, molecular networks and cell/tissue/organ diagrams to be studied in the context of integrated knowledge, experimental data and the literature. AVAILABILITY: BiologicalNetworks software and the pathway repository are freely available at www.biologicalnetworks.org.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22267504&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A Boolean network model of the FA/BRCA pathway.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22267503</link>
      <description>Publication Date: 2012 Jan 20 PMID: 22267503&lt;br/&gt;Authors: Rodriguez, A. - Sosa, D. - Torres, L. - Molina, B. - Frias, S. - Mendoza, L.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Fanconi Anemia is a chromosomal instability syndrome originated by inherited mutations that impair the FA/BRCA pathway, which is committed to the repair of DNA interstrand cross-links. The disease displays increased spontaneous chromosomal aberrations and hypersensitivity to agents that create DNA interstrand cross-links. In spite of DNA damage, FA/BRCA deficient cells are able to progress throughout the cell cycle, probably due to the activity of alternative DNA repair pathways, or due to defects in the checkpoints that monitor DNA integrity. RESULTS: We propose a Boolean network model of the FA/BRCA pathway, checkpoint proteins, and some alternative DNA repair pathways. To our knowledge this is the largest network model incorporating a DNA repair pathway. Our model is able to simulate the interstrand cross-link repair process mediated by the FA/BRCA pathway, the activation of checkpoint proteins observed by recurrent DNA damage, as well as the repair of DNA double strand breaks and DNA adducts. We generated a series of simulations for mutants, some of which have never been reported and thus constitute predictions about the function of the FA/BRCA pathway. Finally, our model suggests alternative DNA repair pathways that become active whenever the FA/BRCA pathway is defective. SUPPLEMENTARY INFORMATION: Supplementary data is available at Bioinformatics online. CONTACT: sarafrias@yahoo.com; lmendoza@biomedicas.unam.mx.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22267503&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>An effective statistical evaluation of ChIPseq dataset similarity.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22262674</link>
      <description>Publication Date: 2012 Jan 19 PMID: 22262674&lt;br/&gt;Authors: Chikina, M. - Troyanskaya, O.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: ChIPseq is rapidly becoming a common technique for investigating protein-DNA interactions. However, results from individual experiments provide a limited understanding of chromatin structure, as various chromatin factors cooperate in complex ways to orchestrate transcription. In order to quantify chromatin interactions it is thus necessary to devise a robust similarity metric applicable to ChIPseq data. Unfortunately, moving past simple overlap calculations to give statistically rigorous comparisons of ChIPseq datasets often involves arbitrary choices of distance metrics, with significance being estimated by computationally intensive permutation tests whose statistical power may be sensitive to non-biological experimental and post-processing variation. RESULTS: We show that it is in fact possible to compare ChIPseq datasets through the efficient computation of exact p-values for proximity. Our method is insensitive to non-biological variation in datasets such as peak width, and can rigorously model peak location biases by evaluating similarity conditioned on a restricted set of genomic regions (such as mappable genome or promoter regions). Applying our method to the well-studied dataset of Chen et al. (2008), we elucidate novel interactions which conform well with our biological understanding. By comparing ChIPseq data in an asymmetric way, we are able to observe clear interaction differences between cofactors such as p300 and factors that bind DNA directly.
AVAILABILITY: Source code is available for download at http://sonorus.princeton.edu/IntervalStats/IntervalStats.tar.gz CONTACT: ogt@princeton.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22262674&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>medpie: An information extraction package for medical message board posts.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22262673</link>
      <description>Publication Date: 2012 Jan 19 PMID: 22262673&lt;br/&gt;Authors: Benton, A. - Holmes, J. H. - Hill, S. - Chung, A. - Ungar, L.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: We have developed medpie, a software package for preparing medical message board corpora and extracting patient mentions and statistics for drugs, herbs, and adverse effects experienced from them. The package is divided into web-crawling, HTML-cleaning, de-identification, and information extraction modules. It also includes a sample controlled vocabulary of drugs, herbs, and adverse effect terms. AVAILABILITY: http://www.cis.upenn.edu/~ungar/medpie.zip Dependencies: Python 2.6 or 2.7 CONTACT: ungar@cis.upenn.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22262673&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Construction and completion of flux balance models from pathway databases.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22262672</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22262672&lt;br/&gt;Authors: Latendresse, M. - Krummenacker, M. - Trupp, M. - Karp, P. D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Flux balance analysis (FBA) is a well-known technique for genome-scale modeling of metabolic flux. Typically, an FBA formulation requires the accurate specification of four sets: biochemical reactions, biomass metabolites, nutrients and secreted metabolites. The development of FBA models can be time consuming and tedious because of the difficulty in assembling completely accurate descriptions of these sets, and in identifying errors in the composition of these sets. For example, the presence of a single non-producible metabolite in the biomass will make the entire model infeasible. Other difficulties in FBA modeling are that model distributions, and predicted fluxes, can be cryptic and difficult to understand. RESULTS: We present a multiple gap-filling method to accelerate the development of FBA models using a new tool, called MetaFlux, based on mixed integer linear programming (MILP). The method suggests corrections to the sets of reactions, biomass metabolites, nutrients and secretions. The method generates FBA models directly from Pathway/Genome Databases. Thus, FBA models developed in this framework are easily queried and visualized using the Pathway Tools software. Predicted fluxes are more easily comprehended by visualizing them on diagrams of individual metabolic pathways or of metabolic maps. MetaFlux can also remove redundant high-flux loops, solve FBA models once they are generated and model the effects of gene knockouts. MetaFlux has been validated through construction of FBA models for Escherichia coli and Homo sapiens. AVAILABILITY: Pathway Tools with MetaFlux is freely available to academic users, and for a fee to commercial users. Download from: biocyc.org/download.shtml. 
CONTACT: mario.latendresse@sri.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22262672&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>AnnTools: A Comprehensive and Versatile Annotation Toolkit for Genomic Variants.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22257670</link>
      <description>Publication Date: 2012 Jan 18 PMID: 22257670&lt;br/&gt;Authors: Makarov, V. - O'Grady, T. - Cai, G. - Lihm, J. - Buxbaum, J. D. - Yoon, S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: AnnTools is a versatile bioinformatics application designed for comprehensive annotation of a full spectrum of human genome variation: novel and known single nucleotide substitutions (SNP/SNV), short insertions/deletions (INDEL) and structural variants/copy number variation (SV/CNV). The variants are interpreted by interrogating data compiled from 15 constantly updated sources. In addition to detailed functional characterization of the coding variants, AnnTools searches for overlaps with regulatory elements, disease/trait associated loci, known segmental duplications and artifact prone regions, thereby offering an integrated and comprehensive analysis of genomic data. The tool conveniently accepts user-provided tracks for custom annotation and offers flexibility in input data formats. The output is generated in the universal Variant Call Format (VCF). High annotation speed makes AnnTools suitable for high-throughput sequencing facilities, while a low-memory footprint and modest CPU requirements allow it to operate on a personal computer. The application is freely available for public use; the package includes installation scripts and a set of helper tools. AVAILABILITY: http://anntools.sourceforge.net/ CONTACT: vladimir.makarov@mssm.edu, chris.yoon@mssm.edu SUPPLEMENTARY INFORMATION: http://anntools.sourceforge.net/&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22257670&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>The sva package for removing batch effects and other unwanted variation in high-throughput experiments.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22257669</link>
      <description>Publication Date: 2012 Jan 17 PMID: 22257669&lt;br/&gt;Authors: Leek, J. T. - Johnson, W. E. - Parker, H. S. - Jaffe, A. E. - Storey, J. D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Heterogeneity and latent variables are now widely recognized as major sources of bias and variability in high-throughput experiments. The most well-known sources of latent variation in genomic experiments are batch effects - when samples are processed on different days, in different groups, or by different people. However, there are also a large number of other variables that may have a major impact on high-throughput measurements. Here we describe the sva package for identifying, estimating, and removing unwanted sources of variation in high-throughput experiments. The sva package supports surrogate variable estimation with the sva function, direct adjustment for known batch effects with the ComBat function, and adjustment for batch and latent variables in prediction problems with the fsva function. AVAILABILITY: The R package sva is freely available from http://www.bioconductor.org. CONTACT: jleek@jhsph.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22257669&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>COPICAT: A software system for predicting interactions between proteins and chemical compounds.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22257668</link>
      <description>Publication Date: 2012 Jan 17 PMID: 22257668&lt;br/&gt;Authors: Sakakibara, Y. - Hachiya, T. - Uchida, M. - Nagamine, N. - Sugawara, Y. - Yokota, M. - Nakamura, M. - Popendorf, K. - Komori, T. - Sato, K.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Since tens of millions of chemical compounds have been accumulated in public chemical databases, fast comprehensive computational methods to predict interactions between chemical compounds and proteins are needed for virtual screening of lead compounds. Previously, we proposed a novel method for predicting protein-chemical interactions using two-layer Support Vector Machine classifiers that require only readily available biochemical data, i.e., amino acid sequences of proteins and structure formulas of chemical compounds. In this paper, the method has been implemented as the COPICAT web service, with an easy-to-use front-end interface. Users can simply submit a protein-chemical interaction prediction job using a pre-trained classifier, or can even train their own classification model by uploading training data. COPICAT's fast and accurate computational prediction has enhanced lead compound discovery against a database of tens of millions of chemical compounds, implying that the search space for drug discovery is extended by more than 1,000 times compared with currently well-used high-throughput screening methodologies. AVAILABILITY: The COPICAT server is available at http://copicat.dna.bio.keio.ac.jp. All functions, including the prediction function are freely available via anonymous login without registration. Registered users, however, can use the system more intensively. CONTACT: yasu@bio.keio.ac.jp.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22257668&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>FX: an RNA-Seq analysis tool on the cloud.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22257667</link>
      <description>Publication Date: 2012 Jan 17 PMID: 22257667&lt;br/&gt;Authors: Hong, D. - Rhie, A. - Park, S. S. - Lee, J. - Ju, Y. S. - Kim, S. - Yu, S. B. - Bleazard, T. - Park, H. S. - Rhee, H. - Chong, H. - Yang, K. S. - Lee, Y. S. - Kim, I. H. - Lee, J. S. - Kim, J. I. - Seo, J. S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: FX is an RNA-Seq analysis tool, which runs in parallel on cloud computing infrastructure, for the estimation of gene expression levels and genomic variant calling. In the mapping of short RNA-Seq reads, FX uses a transcriptome-based reference primarily, generated from approximately 160,000 mRNA sequences from RefSeq, UCSC and Ensembl databases. This approach reduces the misalignment of reads originating from splicing junctions. Unmapped reads not aligned on known transcripts are then mapped on the human genome reference. FX allows analysis of RNA-Seq data on cloud computing infrastructures, supporting access through a user-friendly web interface. AVAILABILITY: FX is freely available on the web at (http://fx.gmi.ac.kr), and can be installed on local Hadoop clusters. Guidance for the installation and operation of FX can be found under the 'Documentation' menu on the website. CONTACT: jeongsun@snu.ac.kr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online and on TIARA (http://tiara.gmi.ac.kr).&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22257667&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Exact coalescent simulation of new haplotype data from existing reference haplotypes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22257666</link>
      <description>Publication Date: 2012 Jan 17 PMID: 22257666&lt;br/&gt;Authors: Kang, C. J. - Marjoram, P.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: We introduce a coalescent-based method (RECOAL) for the simulation of new haplotype data from a reference population of haplotypes. A coalescent genealogy for the reference haplotype data is sampled from the appropriate posterior probability distribution, then a coalescent genealogy is simulated which extends the sampled genealogy to include new haplotype data. The new haplotype data will therefore contain both some of the existing polymorphic sites and new polymorphisms added based on the structure of the simulated coalescent genealogy. This allows exact coalescent simulation of new haplotype data, compared to other methods which are more approximate in nature. RESULTS: We demonstrate performance of our method using a variety of data simulated under a coalescent model, before applying it to data from the 1000 Genomes project. AVAILABILITY: The source code is freely available for download at ftp://popgen.usc.edu CONTACT: chulkang@usc.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22257666&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>SNP calling using genotype model selection on high-throughput sequencing data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22253293</link>
      <description>Publication Date: 2012 Jan 16 PMID: 22253293&lt;br/&gt;Authors: You, N. - Murillo, G. - Su, X. - Zeng, X. - Xu, J. - Ning, K. - Zhang, S. - Zhu, J. - Cui, X.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: A review of the available single nucleotide polymorphism (SNP) calling procedures for Illumina high-throughput sequencing (HTS) platform data reveals that most rely mainly on base-calling and mapping qualities as sources of error when calling SNPs. Thus errors not involved in base-calling or alignment, such as those in genomic sample preparation, are not accounted for. RESULTS: A novel method of consensus and SNP calling, Genotype Model Selection (GeMS), is given which accounts for the errors that occur during the preparation of the genomic sample. Simulations and real data analyses indicate that GeMS has the best performance balance of sensitivity and positive predictive value among the tested SNP callers. AVAILABILITY: The GeMS package can be downloaded from https://sites.google.com/a/bioinformatics.ucr.edu/xinping-cui/home/software or http://computationalbioenergy.org/software.html. CONTACT: xinping.cui@ucr.edu SUPPLEMENTARY INFORMATION: Supplementary figures and tables are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22253293&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>CODEX: Exploration of semantic changes between ontology versions.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22253292</link>
      <description>Publication Date: 2012 Jan 16 PMID: 22253292&lt;br/&gt;Authors: Hartung, M. - Gross, A. - Rahm, E.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Life science ontologies substantially change over time to meet the requirements of their users and to include the newest domain knowledge. Thus, an important task is to know what has been modified between two versions of an ontology (diff). This diff should contain all performed changes as compactly and understandably as possible. We present CODEX (Complex Ontology Diff Explorer), a tool that allows determining semantic changes between two versions of an ontology which users can interactively analyze in multiple ways. Availability and Implementation: CODEX is available under http://www.izbi.de/codex and is supported by all major browsers. It is implemented in Java based on Google Web Toolkit technology. Additionally, users can access a web service interface to use the diff functionality in their applications and analyses. CONTACT: hartung@izbi.uni-leipzig.de.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22253292&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>ReLA, a local alignment search tool for the identification of distal and proximal gene regulatory regions and their conserved transcription factor binding sites.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22253291</link>
      <description>Publication Date: 2012 Jan 16 PMID: 22253291&lt;br/&gt;Authors: Gonzalez, S. - Montserrat-Sentis, B. - Sanchez, F. - Puiggros, M. - Blanco, E. - Ramirez, A. - Torrents, D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The prediction and annotation of the genomic regions involved in gene expression has been largely explored. Most of the energy has been devoted to the development of approaches that detect transcription start sites (TSS), leaving the identification of regulatory regions and their functional transcription factor binding sites (TFBSs) largely unexplored and with important quantitative and qualitative methodological gaps. RESULTS: We have developed ReLA (for REgulatory region Local Alignment tool), a unique tool optimized with the Smith-Waterman algorithm that allows local searches of conserved TFBS clusters and the detection of regulatory regions proximal to genes and enhancer regions. ReLA's performance shows specificities of 81 and 50% when tested on experimentally validated proximal regulatory regions and enhancers, respectively. AVAILABILITY: The source code of ReLA is freely available and can be remotely used through our web-server under http://www.bsc.es/cg/rela. CONTACT: David Torrents (david.torrents@bsc.es).&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22253291&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>IMA: An R package for high-throughput analysis of Illumina's 450K Infinium methylation data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22253290</link>
      <description>Publication Date: 2012 Jan 16 PMID: 22253290&lt;br/&gt;Authors: Wang, D. - Yan, L. - Hu, Q. - Sucheston, L. E. - Higgins, M. J. - Ambrosone, C. B. - Johnson, C. S. - Smiraglia, D. J. - Liu, S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: The Illumina Infinium HumanMethylation450 BeadChip is a newly designed high-density microarray for quantifying the methylation level of over 450,000 CpG sites within human genome. IMA (Illumina Methylation Analyzer) is a computational package designed to automate the pipeline for exploratory analysis and summarization of site-level and region-level methylation changes in epigenetic studies utilizing the 450K DNA methylation microarray. The pipeline loads the data from Illumina platform and provides user-customized functions commonly required to perform exploratory methylation analysis for individual sites as well as annotated regions. AVAILABILITY: IMA is implemented in the R language and is freely available from http://www.rforge.net/IMA. CONTACT: song.liu@roswellpark.org.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22253290&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Application of Canonical Correlation Analysis for Identifying Viral Integration Preferences.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22247281</link>
      <description>Publication Date: 2012 Jan 12 PMID: 22247281&lt;br/&gt;Authors: Gumus, E. - Kursun, O. - Sertbas, A. - Ustek, D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Gene therapy aims at using viral vectors for attaching helpful genetic code to target genes. Therefore, it is of great importance to develop methods that can discover significant patterns around viral integration sites. Canonical correlation analysis (CCA) is an unsupervised statistical tool that is used to describe the relations between two related views of the same semantic object, which fits well for identifying such salient patterns. RESULTS: Proposed method is demonstrated on a sequence dataset obtained from a study on HIV-1 preferred integration regions. The subsequences on the left and right sides of the integration points are given to the method as the two views, and statistically significant relations are found between sequence-driven features derived from these two views, which suggests that the viral preference must be the factor responsible for this correlation. We found that there are significant correlations at x = 5 indicating a palindromic behavior surrounding the viral integration site, which complies with the previously reported results. AVAILABILITY: Developed software tool is available at http://ce.istanbul.edu.tr/bioinformatics/hiv1/ CONTACT: egumus@istanbul.edu.tr.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22247281&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Estimation of Pairwise Sequence Similarity of Mammalian Enhancers with Word Neighbourhood Counts.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22247280</link>
      <description>Publication Date: 2012 Jan 12 PMID: 22247280&lt;br/&gt;Authors: Goke, J. - Schulz, M. H. - Lasserre, J. - Vingron, M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The identity of cells and tissues is to a large degree governed by transcriptional regulation. A major part is accomplished by the combinatorial binding of transcription factors at regulatory sequences, such as enhancers. Even though binding of transcription factors is sequence-specific, estimating the sequence similarity of two functionally similar enhancers is very difficult. However, a similarity measure for regulatory sequences is crucial to detect and understand functional similarities between two enhancers and will facilitate large-scale analyses like clustering, prediction and classification of genome-wide data sets. RESULTS: We present the standardised alignment-free sequence similarity measure N2, a flexible framework that is defined for word neighbourhoods. We explore the usefulness of adding reverse complement words as well as words including mismatches into the neighbourhood. On simulated enhancer sequences as well as functional enhancers in mouse development, N2 is shown to outperform previous alignment-free measures. N2 is flexible, faster than competing methods and less susceptible to single sequence noise and the occurrence of repetitive sequences. Experiments on the mouse enhancers reveal that enhancers active in different tissues can be separated by pairwise comparison using N2. Conclusion: N2 represents an improvement over previous alignment-free similarity measures without compromising speed which makes it a good candidate for large-scale sequence comparison of regulatory sequences.
AVAILABILITY: The software is part of the open source C++ library SeqAn (www.seqan.de) and a compiled version can be downloaded at: http://www.seqan.de/projects/alf.html CONTACT: JG: goeke@molgen.mpg.de; MV: vingron@molgen.mpg.de.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22247280&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Robust Rank Aggregation for gene list integration and meta-analysis.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22247279</link>
      <description>Publication Date: 2012 Jan 12 PMID: 22247279&lt;br/&gt;Authors: Kolde, R. - Laur, S. - Adler, P. - Vilo, J.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The continued progress in developing technological platforms, availability of many published experimental data sets, as well as different statistical methods to analyze those data have allowed approaching the same research question using various methods simultaneously. To get the best out of all these alternatives we need to integrate their results in an unbiased manner. Prioritized gene lists are a common result presentation method in genomic data analysis applications. Thus the rank aggregation methods can become a useful and general solution for the integration task. RESULTS: Standard rank aggregation methods are often ill-suited for biological settings where the gene lists are inherently noisy. As a remedy we propose a novel robust rank aggregation (RRA) method. Our method detects genes that are ranked consistently better than expected under null hypothesis of uncorrelated inputs and assigns a significance score for each gene. The underlying probabilistic model makes the algorithm parameter free and robust to outliers, noise and errors. Significance scores also provide a rigorous way to keep only the statistically relevant genes in the final list. These properties make our approach robust and compelling for many settings. AVAILABILITY: All the methods are implemented as a GNU R package ROBUSTRANKAGGREG, freely available at the Comprehensive R Archive Network http://cran.r-project.org/. CONTACT: vilo@ut.ee.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22247279&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>nEASE: a Method for Gene Ontology Sub-Classification of High Throughput Gene Expression Data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22247278</link>
      <description>Publication Date: 2012 Jan 13 PMID: 22247278&lt;br/&gt;Authors: Chittenden, T. W. - Howe, E. A. - Taylor, J. M. - Mar, J. C. - Aryee, M. J. - Gomez, H. - Sultana, R. - Braisted, J. - Nair, S. J. - Quackenbush, J. - Holmes, C.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: High-throughput technologies can identify genes whose expression profiles correlate with specific phenotypes; however, placing these genes into a biological context remains challenging. To help address this issue, we developed nested Expression Analysis Systematic Explorer (nEASE). nEASE complements traditional gene ontology enrichment approaches by determining statistically enriched gene ontology sub-terms within a list of genes based on co-annotation. Here, we overview an open-source software version of the nEASE algorithm. nEASE can be used either stand-alone or as part of a pathway discovery pipeline. AVAILABILITY: nEASE is implemented within the Multiple Experiment Viewer (MeV) software package available at http://www.tm4.org/mev. CONTACT: cholmes@stats.ox.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22247278&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Mining and Evaluation of Molecular Relationships in Literature.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22247277</link>
      <description>Publication Date: 2012 Jan 13 PMID: 22247277&lt;br/&gt;Authors: Senger, C. - Gruning, B. A. - Erxleben, A. - Doring, K. - Patel, H. - Flemming, S. - Merfort, I. - Gunther, S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Specific information on newly-discovered proteins is often difficult to find in literature. Particularly if only sequences and no common names of proteins or genes are available, preceding sequence similarity searches can be crucial for the process of information collection. In drug research it is important to know whether a small molecule targets only one specific protein or whether similar or homologous proteins are also influenced which may account for possible side effects. RESULTS: prolific (protein-literature investigation for interacting compounds) provides a one-step solution to investigate available information on given protein names, sequences, similar proteins or sequences on the gene level. Co-occurrences of UniProtKB/Swiss-Prot proteins and PubChem compounds in all PubMed abstracts are retrievable. Concise &quot;heat-maps&quot; and tables display frequencies of co-occurrences. They provide links to processed literature with highlighted found protein and compound synonyms. Evaluation with manually curated drug-protein relationships showed that up to 69% could be discovered by automatic text-processing. Examples are presented to demonstrate the capabilities of prolific. AVAILABILITY: The web-application is available at http://prolific.pharmaceutical-bioinformatics.de and a web-service at http://www.pharmaceuticalbioinformatics.de/prolific/soap/prolific.wsdl.
CONTACT: stefan.guenther@pharmazie.uni-freiburg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22247277&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>BOCTOPUS: Improved topology prediction of transmembrane beta barrel proteins.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22247276</link>
      <description>Publication Date: 2012 Jan 13 PMID: 22247276&lt;br/&gt;Authors: Hayat, S. - Elofsson, A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Transmembrane beta barrel proteins (TMBs) are found in the outer membrane of gram-negative bacteria, chloroplast and mitochondria. They play a major role in the translocation machinery, pore formation, membrane anchoring and ion exchange. TMBs are also promising targets for antimicrobial drugs and vaccines. Given the difficulty in membrane protein structure determination, computational methods to identify TMBs and predict the topology of TMBs are important. RESULTS: Here, we present BOCTOPUS; an improved method for the topology prediction of TMBs by employing a combination of Support Vector Machines (SVMs) and Hidden Markov Models (HMM). The SVMs and HMMs account for local and global residue preferences, respectively. Based on a 10-fold cross-validation test, BOCTOPUS performs better than all existing methods, reaching a Q3 accuracy of 87%. Further, BOCTOPUS predicted the correct number of strands for 83% proteins in the data set. BOCTOPUS might also help in reliable identification of TMBs by using it as an additional filter to methods specialized in this task. AVAILABILITY: BOCTOPUS is freely available as a web-server at: http://boctopus.cbr.su.se/ The data sets used for training and evaluations are also available from this site. CONTACT: arne@bioinfo.se.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22247276&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Roundup 2.0: Enabling comparative genomics for over 1800 genomes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22247275</link>
      <description>Publication Date: 2012 Jan 13 PMID: 22247275&lt;br/&gt;Authors: Deluca, T. F. - Cui, J. - Jung, J. Y. - St Gabriel, K. C. - Wall, D. P.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Roundup is an online database of gene orthologs for over 1800 genomes, including 226 Eukaryota, 1447 Bacteria, 113 Archaea, and 21 Viruses. Orthologs are inferred using the Reciprocal Smallest Distance algorithm. Users may query Roundup for single-linkage clusters of orthologous genes based on any group of genomes. Annotated query results may be viewed in a variety of ways including as clusters of orthologs and as phylogenetic profiles. Genomic results may be downloaded in formats suitable for functional as well as phylogenetic analysis, including the recent OrthoXML standard. In addition, gene IDs can be retrieved using FASTA sequence search. All orthology results and source code are freely available. AVAILABILITY: http://roundup.hms.harvard.edu CONTACT: Dr. Dennis P. Wall: dpwall@hms.harvard.edu; Todd F. DeLuca: todd_deluca@hms.harvard.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22247275&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A Scalable and Portable Framework For Massively Parallel Variable Selection in Genetic Association Studies.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238272</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238272&lt;br/&gt;Authors: Chen, G. K.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: The deluge of data emerging from high throughput sequencing technologies poses large analytical challenges when testing for association to disease. We introduce a scalable framework for variable selection, implemented in C++ and OpenCL, that fits regularized regression across multiple Graphics Processing Units (GPUs). Open source code and documentation can be found at a Google Code repository under the URL http://code.google.com/p/parallellasso/. CONTACT: gary.k.chen@usc.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238272&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Surrogate Variable Analysis Using Partial Least Squares (SVA-PLS) in Gene Expression Studies.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238271</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238271&lt;br/&gt;Authors: Chakraborty, S. - Datta, S. - Datta, S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: In a typical gene expression profiling study, our prime objective is to identify the genes that are differentially expressed between the samples from two different tissue types. Commonly, standard ANOVA/regression is implemented to identify the relative effects of these genes over the two types of samples from their respective arrays of expression levels. But, this technique becomes fundamentally flawed when there are unaccounted sources of variability in these arrays (latent variables attributable to different biological, environmental or other factors relevant in the context). These factors distort the true picture of differential gene expression between the two tissue types and introduce spurious signals of expression heterogeneity. As a result many genes which are actually differentially expressed are not detected, whereas many others are falsely identified as positives. Moreover, these distortions can be different for different genes. Thus, it is also not possible to get rid of these variations by simple array normalizations. This both-way error can lead to a serious loss in sensitivity and specificity, thereby causing a severe inefficiency in the underlying multiple testing problem. In this work, we attempt to identify the hidden effects of the underlying latent factors in a gene-expression profiling study by Partial Least Squares (PLS) and apply ANCOVA technique with the PLS-identified signatures of these hidden effects as covariates, in order to identify the genes that are truly differentially expressed between the two concerned tissue types. 
RESULTS: We compare the performance of our method SVA-PLS with standard ANOVA and a relatively recent technique of surrogate variable analysis (SVA), on a wide variety of simulation settings (incorporating different effects of the hidden variable, under situations with varying signal intensities and gene groupings). In all settings, our method yields the highest sensitivity while maintaining relatively reasonable values for the specificity, False Discovery Rate (FDR) and False Non-Discovery Rate (FNR). Application of our method to gene-expression profiling for Acute Megakaryoblastic Leukemia (AMKL) shows that our method detects an additional six genes, that are missed by both the standard ANOVA method as well as SVA, but may be relevant to this disease, as can be seen from mining the existing literature. AVAILABILITY: The R code for our method, SVA-PLS, is freely available on the supplementary website http://www.somnathdatta.org/Supp/SVPLS/ CONTACT: s0chak10@louisville.edu; susmita.datta@louisville.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238271&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>BESC knowledgebase public portal.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238270</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238270&lt;br/&gt;Authors: Syed, M. H. - Karpinets, T. V. - Parang, M. - Leuze, M. R. - Park, B. H. - Hyatt, D. - Brown, S. D. - Moulton, S. - Galloway, M. D. - Uberbacher, E. C.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;The BioEnergy Science Center (BESC) is undertaking large experimental campaigns to understand the biosynthesis and biodegradation of biomass and to develop biofuel solutions. BESC is generating large volumes of diverse data, including genome sequences, omics data, and assay results. The purpose of the BESC Knowledgebase is to serve as a centralized repository for experimentally generated data and to provide an integrated, interactive, and user-friendly analysis framework. The Portal makes available tools for visualization, integration and analysis of data either produced by BESC or obtained from external resources. AVAILABILITY: http://besckb.ornl.gov CONTACT: syedmh@ornl.gov.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238270&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>GROMACS Molecule &amp; Liquid Database.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238269</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238269&lt;br/&gt;Authors: van der Spoel, D. - van Maaren, P. J. - Caleman, C.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The molecular dynamics simulation package GROMACS (Hess et al., 2008) is a widely used tool used in a broad range of different applications within physics, chemistry and biology. It is freely available, user friendly and extremely efficient. The GROMACS software is force field agnostic, and compatible with many molecular dynamics force fields; coarse-grained, unified atom, all atom as well as polarizable models based on the charge on a spring concept. To validate simulations it is necessary to compare results from the simulations to experimental data. To ease the process of setting up topologies and structures for simulations, as well as providing pre-calculated physical properties along with experimental values for the same we provide a web based database, containing 145 organic molecules at present. RESULTS: Liquid properties of 145 organic molecules have been simulated using two different force fields, OPLS all atom (Jorgensen &amp; Tirado-Rives, 2005) and Generalized Amber Force Field (GAFF) (Wang et al., 2004). So far, eight properties have been calculated (the density, enthalpy of vaporization, surface tension, heat capacity at constant volume and pressure, isothermal compressibility, volumetric expansion coefficient and the static dielectric constant.) The results, together with experimental values are available through the database, along with liquid structures and topologies for the 145 molecules, in the two force fields. AVAILABILITY: The database is freely available under http://virtualchemistry.org. 
CONTACT: spoel@xray.bmc.uu.se, carl.caleman@cfel.de.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238269&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Predicting folding free energy changes upon single point mutations.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238268</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238268&lt;br/&gt;Authors: Zhang, Z. - Wang, L. - Gao, D. - Zhang, J. - Zhenirovskyy, M. - Alexov, E.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The folding free energy is an important characteristic of proteins stability and is directly related to protein's wild type function. The changes of protein's stability due to naturally occurring mutations, missense mutations, are typically causing diseases. Single point mutations made in vitro are frequently used to assess the contribution of given amino acid to the stability of the protein. In both cases, it is desirable to predict the change of the folding free energy upon single point mutations in order to either provide insights of the molecular mechanism of the change or to design new experimental studies. RESULTS: We report an approach which predicts the free energy change upon single point mutation by utilizing the 3D structure of the wild type protein. It is based on variation of the molecular mechanics Generalized Born (MMGB) method, scaled with optimized parameters (sMMGB) and utilizing specific model of unfolded state. The corresponding mutations are built in silico and the predictions are tested against large dataset of 1109 mutations with experimentally measured changes of the folding free energy. Benchmarking resulted in RMSD = 1.78 kcal/mol and slope of the linear regression fit between the experimental data and the calculations was 1.04. The sMMGB is compared with other leading methods of predicting folding free energy changes upon single mutations and results discussed with respect to various parameters. AVAILABILITY: All the pdb files we used in this paper can be downloaded from http://compbio.clemson.edu/downloadDir/mentaldisorders/sMMGB_pdb.rar SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. 
CONTACT: ealexov@clemson.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238268&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>De novo motif discovery facilitates identification of interactions between transcription factors in Saccharomyces cerevisiae.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238267</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238267&lt;br/&gt;Authors: Chen, M. J. - Chou, L. C. - Hsieh, T. T. - Lee, D. D. - Liu, K. W. - Yu, C. Y. - Oyang, Y. J. - Tsai, H. K. - Chen, C. Y.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Gene regulation involves complicated mechanisms such as cooperativity between a set of transcription factors (TFs). Previous studies have used target genes shared by two TFs as a clue to infer TF-TF interactions. However, this task remains challenging because the target genes with low binding affinity are frequently omitted by experimental data, especially when a single strict threshold is employed. This paper aims at improving the accuracy of inferring TF-TF interactions by incorporating motif discovery as a fundamental step when detecting overlapping targets of TFs based on ChIP-chip data. RESULTS: The proposed method, simTFBS, outperforms three naive methods that adopt fixed thresholds when inferring TF-TF interactions based on ChIP-chip data. In addition, simTFBS is compared with two advanced methods and demonstrates its advantages in predicting TF-TF interactions. By comparing simTFBS with predictions based on the set of available annotated yeast TF binding motifs, we demonstrate that the good performance of simTFBS is indeed coming from the additional motifs found by the proposed procedures. CONTACT: hktsai@iis.sinica.edu.tw; chienyuchen@ntu.edu.tw SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238267&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing data using SplazerS.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238266</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238266&lt;br/&gt;Authors: Emde, A. K. - Schulz, M. H. - Weese, D. - Sun, R. - Vingron, M. - Kalscheuer, V. M. - Haas, S. A. - Reinert, K.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The reliable detection of genomic variation in resequencing data is still a major challenge, especially for variants larger than a few base pairs. Sequencing reads crossing boundaries of structural variation carry the potential for their identification, but are difficult to map. RESULTS: Here we present a method for &quot;split&quot; read mapping, where prefix and suffix match of a read may be interrupted by a longer gap in the read-to-reference alignment. We use this method to accurately detect medium-sized insertions and long deletions with precise breakpoints in genomic resequencing data. Compared to alternative split mapping methods, SplazerS significantly improves sensitivity for detecting large indel events, especially in variant-rich regions. Our method is robust in the presence of sequencing errors as well as alignment errors due to genomic mutations/divergence, and can be used on reads of variable lengths. Our analysis shows that SplazerS is a versatile tool applicable to unanchored or single-end as well as anchored paired-end reads. In addition, application of SplazerS to targeted resequencing data led to the interesting discovery of a complete, possibly functional gene retrocopy variant. AVAILABILITY: SplazerS is available from http://www.seqan.de/projects/splazers. CONTACT: emde@inf.f-berlin.de.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238266&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Estimating Abundances of Retroviral Insertion Sites from DNA Fragment Length Data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238265</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238265&lt;br/&gt;Authors: Berry, C. C. - Gillet, N. A. - Melamed, A. - Gormley, N. - Bangham, C. R. - Bushman, F.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The relative abundance of retroviral insertions in a host genome is important in understanding the persistence and pathogenesis of both natural retroviral infections and retroviral gene therapy vectors. It could be estimated from a sample of cells if only the host genomic sites of retroviral insertions could be directly counted. When host genomic DNA is randomly broken via sonication and then amplified, amplicons of varying lengths are produced. The number of unique lengths of amplicons of an insertion site tends to increase according to its abundance, providing a basis for estimating relative abundance. However, as abundance increases amplicons of the same length arise by chance leading to a non-linear relation between the number of unique lengths and relative abundance. The difficulty in calibrating this relation is compounded by sample specific variations in the relative frequencies of clones of each length. RESULTS: A likelihood function is proposed for the discrete lengths observed in each of a collection of insertion sites and is maximized with a hybrid Expectation-Maximization algorithm. Patient data illustrate the method and simulations show that relative abundance can be estimated with little bias, but that variation in highly abundant sites can be large. In replicated patient samples, variation exceeds what the model implies - requiring adjustment as in Efron (2004) or using jackknife standard errors. Consequently, it is advantageous to collect replicate samples to strengthen inferences about relative abundance. AVAILABILITY: An R package implements the algorithm described here. 
It is available at http://soniclength.r-forge.r-project.org/ CONTACT: ccberry@ucsd.edu SUPPLEMENTARY INFORMATION: An on-line Supplement is available at BioInformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238265&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Interactome-Transcriptome integration for predicting distant metastasis in breast cancer.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238264</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238264&lt;br/&gt;Authors: Garcia, M. - Millat-Carus, R. - Bertucci, F. - Finetti, P. - Birnbaum, D. - Bidaut, G.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: High-throughput gene-expression profiling yields genomic signatures that allow the prediction of clinical conditions including patient outcome. However, these signatures have limitations, such as dependency on the training set, and worse, lack of generalization. RESULTS: We propose a novel algorithm called ITI (Interactome-Transcriptome Integration), to extract a genomic signature predicting distant metastasis in breast cancer by superimposition of large-scale protein-protein interaction data over a compendium of several gene-expression data sets. Training on two different compendia showed that the estrogen receptor-specific signatures obtained are more stable (11-35% stability), can be generalized on independent data, and perform better than previously published methods (53-74% accuracy). Availability and SUPPLEMENTARY INFORMATION: The ITI algorithm source code and supplementary material from analysis are available under CeCILL from the ITI companion web site: http://bioinformatique.marseille.inserm.fr/iti. CONTACT: maxime.garcia@inserm.fr, ghislain.bidaut@inserm.fr.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238264&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>MetExtract: A new software tool for the automated comprehensive extraction of metabolite-derived LC/MS signals in metabolomics research.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238263</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238263&lt;br/&gt;Authors: Bueschl, C. - Kluger, B. - Berthiller, F. - Lirk, G. - Winkler, S. - Krska, R. - Schuhmacher, R.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Liquid chromatography - mass spectrometry (LC/MS) is a key technique in metabolomics. Since the efficient assignment of MS signals to true biological metabolites becomes feasible in combination with in vivo stable isotopic labelling, our aim was to provide a new software tool for this purpose. RESULTS: An algorithm and a program (MetExtract) have been developed to search for metabolites in in vivo labelled biological samples. The algorithm makes use of the chromatographic characteristics of the LC/MS data and detects MS peaks fulfilling the criteria of stable isotopic labelling. As a result of all calculations, the algorithm specifies a list of m/z values, the corresponding number of atoms of the labelling element (e.g. carbon) together with retention time and extracted adduct-, fragment-, and polymer ions. Its function was evaluated using native (12)C- and uniformly (13)C-labelled standard substances. AVAILABILITY: MetExtract is available free of charge and warranty at http://code.google.com/p/metextract/. Precompiled executables are available for Windows operating systems. CONTACT: rainer.schuhmacher@boku.ac.at.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238263&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>The evolution of nitrogen fixation in cyanobacteria.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238262</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238262&lt;br/&gt;Authors: Latysheva, N. - Junker, V. L. - Palmer, W. J. - Codd, G. A. - Barker, D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Fixed nitrogen is an essential requirement for the biosynthesis of cellular nitrogenous compounds. Some cyanobacteria can fix nitrogen, contributing significantly to the nitrogen cycle, agriculture, and biogeochemical history of Earth. The rate and position on the species phylogeny of gains and losses of this ability, as well as of the underlying nif genes, are controversial. RESULTS: We use probabilistic models of trait evolution to investigate the presence and absence of cyanobacterial nitrogen-fixing ability. We estimate rates of change on the species phylogeny, pinpoint probable changes and reconstruct the state and nif gene complement of the ancestor. Our results are consistent with a nitrogen-fixing cyanobacterial ancestor, repeated loss of nitrogen fixation and vertical descent, with little horizontal transfer of the genes involved. CONTACT: db60@st-andrews.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238262&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>SSuMMo: Rapid analysis, comparison and visualisation of microbial communities.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238261</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238261&lt;br/&gt;Authors: Leach, A. L. - Chong, J. P. - Redeker, K. R.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Next generation sequencing methods are generating increasingly massive datasets, yet still do not fully capture genetic diversity in the richest environments. To understand such complicated and elusive systems, effective tools are needed to assist with delineating the differences found in and between community datasets. RESULTS: The Small Subunit Markov Modeler (SSuMMo) was developed to probabilistically assign SSU rRNA gene fragments from any sequence dataset to recognised taxonomic clades, producing consistent, comparable cladograms. Accuracy tests predicted over 90% of genera correctly for sequences downloaded from public reference databases. Sequences from a next generation sequence dataset, sampled from lean, overweight and obese individuals, were analyzed to demonstrate parallel visualisation of comparable datasets. SSuMMo shows potential as a valuable curatorial tool, as numerous incorrect and outdated taxonomic entries and annotations were identified in public databases. Availability and Implementation: SSuMMo is GPLv3 open source Python software, available at http://code.google.com/p/ssummo/. Taxonomy and HMM databases can be downloaded from http://bioltfws1.york.ac.uk/ssummo/. CONTACT: albl500@york.ac.uk SUPPLEMENTARY INFORMATION: Supplemental materials are available at Bioinformatics Online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238261&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>PHACTS, a computational approach to classifying the lifestyle of phages.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238260</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238260&lt;br/&gt;Authors: McNair, K. - Bailey, B. A. - Edwards, R. A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Bacteriophages have two distinct lifestyles: virulent and temperate. The virulent lifestyle has many implications for phage therapy, genomics, and microbiology. Determining which lifestyle a newly sequenced phage falls into is currently determined using standard culturing techniques. Such laboratory work is not only costly and time consuming, but also cannot be used on phage genomes constructed from environmental sequencing. Therefore a computational method that utilizes the sequence data of phage genomes is needed. RESULTS: PHACTS utilizes a novel similarity algorithm and a supervised Random Forest classifier to make a prediction whether the lifestyle of a phage, described by its proteome, is virulent or temperate. The similarity algorithm creates a training set from phages with known lifestyles and along with the lifestyle annotation, trains a Random Forest to classify the lifestyle of a phage. PHACTS predictions are shown to have a 99% precision rate. Availability and Implementation: PHACTS was implemented in the PERL programming language and utilizes the FASTA program [29] and the R programming language library &quot;Random Forest&quot; [30]. The PHACTS software is open source and is available as downloadable stand-alone version or can be accessed online as a user-friendly web interface. The source code, help files and online version are available at http://www.phantome.org/PHACTS/. CONTACT: katelyn@rohan.sdsu.edu*, redwards@sciences.sdsu.edu*&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238260&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>RIBER/DIBER: a software suite for crystal content analysis in the studies of protein-nucleic acid complexes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238259</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238259&lt;br/&gt;Authors: Chojnowski, G. - Bujnicki, J. M. - Bochtler, M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Co-crystallization experiments of proteins with nucleic acids do not guarantee that both components are present in the crystal. We have previously developed DIBER to predict crystal content when protein and DNA are present in the crystallization mix. Here, we present RIBER, which should be used when protein and RNA are in the crystallization drop. The combined RIBER/DIBER suite builds on machine learning techniques to make reliable, quantitative predictions of crystal content for non-expert users and high throughput crystallography. Availability and Implementation: The program source code, Linux binaries and a webserver are available at http://diber.iimcb.gov.pl/. RIBER/DIBER requires diffraction data to at least 3.0 A resolution in MTZ or CIF (webserver only) format. The program is written in C/C++ and relies on the CCP4 (Winn, et al., 2011) and Clipper (Cowtan, 2003) libraries for handling diffraction data. The LIBSVM (Chang and Lin, 2011) library is used for decision making. The RIBER/DIBER code is subject to the GNU Public License. CONTACT: gchojnowski@genesilico.pl SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238259&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>IMID: Integrated molecular interaction database.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238258</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238258&lt;br/&gt;Authors: Balaji, S. - McClendon, C. - Chowdhary, R. - Liu, J. S. - Zhang, J.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Molecular interaction information, such as protein-protein interactions and protein-small molecule interactions, is indispensable for understanding the mechanism of biological processes and discovering treatments for diseases. Many databases have been built by manual annotation of literature to organize such information into structured form. However, most databases focus on only one type of interactions, which are often not well annotated and integrated with related functional information. RESULTS: In this study, we integrate molecular interaction information from literature by automatic information extraction and from manually annotated databases. We further integrate the relationships between protein/gene and other bio-entity terms including gene ontology (GO) terms, pathways, species and diseases to build an integrated molecular interaction database (IMID). Interactions can be selected by their associated probabilities. IMID allows complex and versatile queries for context-specific molecular interactions, which are not available currently in other molecular interaction databases. AVAILABILITY: The database is located at www.integrativebiology.org. CONTACT: jinfeng@stat.fsu.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238258&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>CHROMATRA: a Galaxy tool for visualizing genome-wide chromatin signatures.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22238257</link>
      <description>Publication Date: 2012 Jan 11 PMID: 22238257&lt;br/&gt;Authors: Hentrich, T. - Schulze, J. M. - Emberly, E. - Kobor, M. S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: CHROMATRA (CHROmatin Mapping Across TRAnscripts) is a visualization tool available as plug-in for the Galaxy platform. It allows detailed yet concise presentations of data derived from ChIP-chip or ChIP-seq experiments by visualizing enrichment scores across genes or other genomic features while accounting for their length and additional characteristics such as gene expression. It integrates into typical analysis workflows and enables rapid graphical assessment and comparison of genome-wide data at a glance. AVAILABILITY: https://github.com/cmmt/chromatra CONTACT: MSK (msk@cmmt.ubc.ca), EE (eemberly@sfu.ca).&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22238257&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A method to infer positive selection from marker dynamics in an asexual population.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22223745</link>
      <description>Publication Date: 2012 Jan 5 PMID: 22223745&lt;br/&gt;Authors: Illingworth, C. J. - Mustonen, V.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The observation of positive selection acting on a mutant indicates that the mutation being selected for has some form of functional relevance. Determining the fitness effects of mutations thus has relevance to many interesting biological questions. One means of identifying beneficial mutations in an asexual population is to observe changes in the frequency of marked subsets of the population. We here describe a method to estimate the establishment times and fitnesses of beneficial mutations from neutral marker frequency data. RESULTS: The method accurately reproduces complex marker frequency trajectories. In simulations for which positive selection is close to 5% per generation, we obtain correlations upwards of 0.91 between correct and inferred haplotype establishment times. Where mutation selection coefficients are exponentially distributed, the inferred distribution of haplotype fitnesses is close to being correct. Applied to data from a bacterial evolution experiment, our method reproduces an observed correlation between evolvability, and initial fitness defect. AVAILABILITY: A C++ implementation of the inference tool is available under GNU GPL license (ftp://ftp.sanger.ac.uk/pub/team153). CONTACT: vm5@sanger.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22223745&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Minimal cut sets in a metabolic network are elementary modes in a dual network.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22190691</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22190691&lt;br/&gt;Authors: Ballerstein, K. - von Kamp, A. - Klamt, S. - Haus, U. U.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Elementary modes (EMs) and minimal cut sets (MCSs) provide important techniques for metabolic network modeling. Whereas EMs describe minimal subnetworks that can function in steady state, MCSs are sets of reactions whose removal will disable certain network functions. Effective algorithms were developed for EM computation while calculation of MCSs is typically addressed by indirect methods requiring the computation of EMs as initial step. RESULTS: In this contribution, we provide a method that determines MCSs directly without calculating the EMs. We introduce a duality framework for metabolic networks where the enumeration of MCSs in the original network is reduced to identifying the EMs in a dual network. As a further extension, we propose a generalization of MCSs in metabolic networks by allowing the combination of inhomogeneous constraints on reaction rates. This framework provides a promising tool to open the concept of EMs and MCSs to a wider class of applications. CONTACT: utz-uwe.haus@math.ethz.ch; klamt@mpi-magdeburg.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22190691&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>The Virtual Fly Brain browser and query interface.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22180411</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22180411&lt;br/&gt;Authors: Milyaev, N. - Osumi-Sutherland, D. - Reeve, S. - Burton, N. - Baldock, R. A. - Armstrong, J. D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Sources of neuroscience data in Drosophila are diverse and disparate making integrated search and retrieval difficult. A major obstacle to this is the lack of a comprehensive and logically structured anatomical framework and an intuitive interface. RESULTS: We present an online resource that provides a convenient way to study and query fly brain anatomy, expression and genetic data. We extended the newly developed BrainName nomenclature for the adult fly brain into a logically structured ontology that relates a comprehensive set of published neuron classes to the brain regions they innervate. The Virtual Fly Brain interface allows users to explore the structure of the Drosophila brain by browsing 3D images of a brain with subregions displayed as coloured overlays. An integrated query mechanism allows complex searches of underlying anatomy, cells, expression and other data from community databases. AVAILABILITY: Virtual Fly Brain is freely available online at www.virtualflybrain.org CONTACT: jda@inf.ed.ac.uk.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22180411&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Novel search method for the discovery of functional relationships.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22180409</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22180409&lt;br/&gt;Authors: Ramirez, F. - Lawyer, G. - Albrecht, M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Numerous annotations are available that functionally characterize genes and proteins with regard to molecular process, cellular localization, tissue expression, protein domain composition, protein interaction, disease association and other properties. Searching this steadily growing amount of information can lead to the discovery of new biological relationships between genes and proteins. To facilitate the searches, methods are required that measure the annotation similarity of genes and proteins. However, most current similarity methods are focused only on annotations from the Gene Ontology (GO) and do not take other annotation sources into account. RESULTS: We introduce the new method BioSim that incorporates multiple sources of annotations to quantify the functional similarity of genes and proteins. We compared the performance of our method with four other well-known methods adapted to use multiple annotation sources. We evaluated the methods by searching for known functional relationships using annotations based only on GO or on our large data warehouse BioMyn. This warehouse integrates many diverse annotation sources of human genes and proteins. We observed that the search performance improved substantially for almost all methods when multiple annotation sources were included. In particular, our method outperformed the other methods in terms of recall and average precision. CONTACT: mario.albrecht@mpi-inf.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22180409&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>MTBindingSim: simulate protein binding to microtubules.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22171336</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22171336&lt;br/&gt;Authors: Philip, J. T. - Pence, C. H. - Goodson, H. V.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Many protein-protein interactions are more complex than can be accounted for by 1:1 binding models. However, biochemists have few tools available to help them recognize and predict the behaviors of these more complicated systems, making it difficult to design experiments that distinguish between possible binding models. MTBindingSim provides researchers with an environment in which they can rapidly compare different models of binding for a given scenario. It is written specifically with microtubule polymers in mind, but many of its models apply equally well to any polymer or any protein-protein interaction. MTBindingSim can thus both help in training intuition about binding models and with experimental design. Availability and implementation: MTBindingSim is implemented in MATLAB and runs either within MATLAB (on Windows, Mac or Linux) or as a binary without MATLAB (on Windows or Mac). The source code (licensed under the GNU General Public License) and binaries are freely available at http://mtbindingsim.googlecode.com. CONTACT: jphilip@nd.edu; cpence@nd.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22171336&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Detection of microRNAs in color space.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22171334</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22171334&lt;br/&gt;Authors: Marco, A. - Griffiths-Jones, S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. RESULTS: Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3' end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. 
Availability and implementation: A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/ Contact: antonio.marco@manchester.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22171334&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>HydroPaCe: understanding and predicting cross-inhibition in serine proteases through hydrophobic patch centroids.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22171332</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22171332&lt;br/&gt;Authors: Goncalves-Almeida, V. M. - Pires, D. E. - de Melo-Minardi, R. C. - da Silveira, C. H. - Meira, W. - Santoro, M. M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Protein-protein interfaces contain important information about molecular recognition. The discovery of conserved patterns is essential for understanding how substrates and inhibitors are bound and for predicting molecular binding. When an inhibitor binds to different enzymes (e.g. with dissimilar sequences, structures or mechanisms), which we call cross-inhibition, identification of invariants is a difficult task for which traditional methods may fail. RESULTS: To clarify how cross-inhibition happens, we model the problem, and propose and evaluate a methodology called HydroPaCe to detect conserved patterns. Interfaces are modeled as graphs of atomic apolar interactions; hydrophobic patches are computed and summarized by centroids (HP-centroids), and their conservation is detected. Despite sequence and structure dissimilarity, our method achieves an appropriate level of abstraction to obtain invariant properties in cross-inhibition. We show examples in which HP-centroids successfully predicted enzymes that could be inhibited by the studied inhibitors according to the BRENDA database. AVAILABILITY: www.dcc.ufmg.br/~raquelcm/hydropace CONTACT: valdetemg@ufmg.br; raquelcm@dcc.ufmg.br; santoro@icb.ufmg.br SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22171332&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>SitePainter: a tool for exploring biogeographical patterns.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22171330</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22171330&lt;br/&gt;Authors: Gonzalez, A. - Stombaugh, J. - Lauber, C. L. - Fierer, N. - Knight, R.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;As microbial ecologists take advantage of high-throughput analytical techniques to describe microbial communities across ever-increasing numbers of samples, the need for new analysis tools that reveal the intrinsic spatial patterns and structures of these populations is crucial. Here we present SitePainter, an interactive graphical tool that allows investigators to create or upload pictures of their study site, load diversity analysis data and display both diversity and taxonomy results in a spatial context. Features of SitePainter include: visualizing alpha-diversity, using taxonomic summaries; visualizing beta-diversity, using results from multidimensional scaling methods; and animating relationships among microbial taxa or pathways over time. SitePainter thus increases the visual power and ability to explore spatially explicit studies. AVAILABILITY: https://sourceforge.net/projects/sitepainter SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: antoniog@colorado.edu, Rob.Knight@colorado.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22171330&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>seeQTL: a searchable database for human eQTLs.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22171328</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22171328&lt;br/&gt;Authors: Xia, K. - Shabalin, A. A. - Huang, S. - Madar, V. - Zhou, Y. H. - Wang, W. - Zou, F. - Sun, W. - Sullivan, P. F. - Wright, F. A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: seeQTL is a comprehensive and versatile eQTL database, including various eQTL studies and a meta-analysis of HapMap eQTL information. The database presents eQTL association results in a convenient browser, using both segmented local-association plots and genome-wide Manhattan plots. Availability and implementation: seeQTL is freely available for non-commercial use at http://www.bios.unc.edu/research/genomic_software/seeQTL/. CONTACT: fred_wright@unc.edu; kxia@bios.unc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22171328&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>M3: an improved SNP calling algorithm for Illumina BeadArray data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22155947</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22155947&lt;br/&gt;Authors: Li, G. - Gelernter, J. - Kranzler, H. R. - Zhao, H.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Genotype calling from high-throughput platforms such as Illumina and Affymetrix is a critical step in data processing, so that accurate information on genetic variants can be obtained for phenotype-genotype association studies. A number of algorithms have been developed to infer genotypes from data generated through the Illumina BeadStation platform, including GenCall, GenoSNP, Illuminus and CRLMM. Most of these algorithms are built on population-based statistical models to genotype every SNP in turn, such as GenCall with the GenTrain clustering algorithm, and require a large reference population to perform well. These approaches may not work well for rare variants where only a small proportion of the individuals carry the variant. A fundamentally different approach, implemented in GenoSNP, adopts a single nucleotide polymorphism (SNP)-based model to infer genotypes of all the SNPs in one individual, making it an appealing alternative to call rare variants. However, compared to the population-based strategies, more SNPs in GenoSNP may fail the Hardy-Weinberg Equilibrium test. To take advantage of both strategies, we propose a two-stage SNP calling procedure, named the modified mixture model (M3), to improve call accuracy for both common and rare variants. The effectiveness of our approach is demonstrated through applications to genotype calling on a set of HapMap samples used for quality control purposes in a large case-control study of cocaine dependence. The increase in power with M3 is greater for rare variants than for common variants depending on the model. AVAILABILITY: M3 algorithm: http://bioinformatics.med.yale.edu/group. 
CONTACT: name@bio.com; hongyu.zhao@yale.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22155947&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>GWAtoolbox: an R package for fast quality control and handling of genome-wide association studies meta-analysis data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22155946</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22155946&lt;br/&gt;Authors: Fuchsberger, C. - Taliun, D. - Pramstaller, P. P. - Pattaro, C.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: The GWAtoolbox is an R package that standardizes and accelerates the handling of data from genome-wide association studies (GWAS), particularly in the context of large-scale GWAS meta-analyses. A key feature of GWAtoolbox is its ability to perform quality control (QC) of any number of files in a matter of minutes. The implemented workflow has been structured to check three particular data quality aspects: (i) data formatting, (ii) quality of the GWAS results and (iii) data consistency across studies. Output consists of an extensive list of quality statistics and plots which allow inspection of individual files and between-study comparison to identify systematic bias. AVAILABILITY: http://www.eurac.edu/GWAtoolbox CONTACT: cfuchsb@umich.edu; daniel.taliun@eurac.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22155946&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>SomaticSniper: identification of somatic point mutations in whole genome sequencing data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22155872</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22155872&lt;br/&gt;Authors: Larson, D. E. - Harris, C. C. - Chen, K. - Koboldt, D. C. - Abbott, T. E. - Dooling, D. J. - Ley, T. J. - Mardis, E. R. - Wilson, R. K. - Ding, L.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The sequencing of tumors and their matched normals is frequently used to study the genetic composition of cancer. Despite this fact, there remains a dearth of available software tools designed to compare sequences in pairs of samples and identify sites that are likely to be unique to one sample. RESULTS: In this article, we describe the mathematical basis of our SomaticSniper software for comparing tumor and normal pairs. We estimate its sensitivity and precision, and present several common sources of error resulting in miscalls. Availability and implementation: Binaries are freely available for download at http://gmt.genome.wustl.edu/somatic-sniper/current/, implemented in C and supported on Linux and Mac OS X. CONTACT: delarson@wustl.edu; lding@wustl.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22155872&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>RRBSMAP: a fast, accurate and user-friendly alignment tool for reduced representation bisulfite sequencing.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22155871</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22155871&lt;br/&gt;Authors: Xi, Y. - Bock, C. - Muller, F. - Sun, D. - Meissner, A. - Li, W.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Reduced representation bisulfite sequencing (RRBS) is a powerful yet cost-efficient method for studying DNA methylation on a genomic scale. RRBS involves restriction-enzyme digestion, bisulfite conversion and size selection, resulting in DNA sequencing data that require special bioinformatic handling. Here, we describe RRBSMAP, a short-read alignment tool that is designed for handling RRBS data in a user-friendly and scalable way. RRBSMAP uses wildcard alignment, and avoids the need for any preprocessing or post-processing steps. We benchmarked RRBSMAP against a well-validated MAQ-based pipeline for RRBS read alignment and observed similar accuracy but much improved runtime performance, easier handling and better scaling to large sample sets. In summary, RRBSMAP removes bioinformatic hurdles and reduces the computational burden of large-scale epigenome association studies performed with RRBS. AVAILABILITY: http://rrbsmap.computational-epigenetics.org/ http://code.google.com/p/bsmap/ CONTACT: wl1@bcm.tmc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22155871&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22155870</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22155870&lt;br/&gt;Authors: Boeva, V. - Popova, T. - Bleakley, K. - Chiche, P. - Cappo, J. - Schleiermacher, G. - Janoueix-Lerosey, I. - Delattre, O. - Barillot, E.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: More and more cancer studies use next-generation sequencing (NGS) data to detect various types of genomic variation. However, even when researchers have such data at hand, single-nucleotide polymorphism arrays have been considered necessary to assess copy number alterations and especially loss of heterozygosity (LOH). Here, we present the tool Control-FREEC that enables automatic calculation of copy number and allelic content profiles from NGS data, and consequently predicts regions of genomic alteration such as gains, losses and LOH. Taking as input aligned reads, Control-FREEC constructs copy number and B-allele frequency profiles. The profiles are then normalized, segmented and analyzed in order to assign genotype status (copy number and allelic content) to each genomic region. When a matched normal sample is provided, Control-FREEC discriminates somatic from germline events. Control-FREEC is able to analyze overdiploid tumor samples and samples contaminated by normal cells. Low mappability regions can be excluded from the analysis using provided mappability tracks. AVAILABILITY: C++ source code is available at: http://bioinfo.curie.fr/projects/freec/ CONTACT: freec@curie.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22155870&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Identification and removal of ribosomal RNA sequences from metatranscriptomes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22155869</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22155869&lt;br/&gt;Authors: Schmieder, R. - Lim, Y. W. - Edwards, R.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Here, we present riboPicker, a robust framework for the rapid, automated identification and removal of ribosomal RNA sequences from metatranscriptomic datasets. The results can be exported for subsequent analysis, and the databases used for the web-based version are updated on a regular basis. riboPicker categorizes rRNA-like sequences and provides graphical visualizations and tabular outputs of ribosomal coverage, alignment results and taxonomic classifications. Availability and implementation: This open-source application was implemented in Perl and can be used as a stand-alone version or accessed online through a user-friendly web interface. The source code, user help and additional information are available at http://ribopicker.sourceforge.net/. CONTACT: rschmied@sciences.sdsu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22155869&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>GenomeRunner: automating genome exploration.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22155868</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22155868&lt;br/&gt;Authors: Dozmorov, M. G. - Cara, L. R. - Giles, C. B. - Wren, J. D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: One of the challenges in interpreting high-throughput genomic studies such as genome-wide association, microarray or ChIP-seq studies is their open-ended nature: once a set of experimentally identified regions is found to be statistically significant, at least two questions arise: (i) besides P-value, do any of these significant regions stand out in terms of biological implications? (ii) Does the set of significant regions, as a whole, have anything in common genome wide? These issues are difficult to address because of the growing number of annotated genomic features (e.g. single nucleotide polymorphisms, transcription factor binding sites, methylation peaks, etc.), and it is difficult to know a priori which features would be most fruitful to analyze. Our goal is to provide partial automation of this process to begin examining associations between experimental features and annotated genomic regions in a hypothesis-free, data-driven manner. RESULTS: We created GenomeRunner, a tool for automating annotation and enrichment of genomic features of interest (FOI) with annotated genomic features (GFs), in different organisms. Besides simple association of FOIs with known GFs, GenomeRunner tests whether the enriched FOIs, as a group, are statistically associated with a large and growing set of genomic features. AVAILABILITY: GenomeRunner setup files and source code are freely available at http://sourceforge.net/projects/genomerunner. 
CONTACT: mikhail-dozmorov@omrf.org; Jonathan-Wren@omrf.org; jdwren@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22155868&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Identifying quantitative trait loci via group-sparse multitask regression and feature selection: an imaging genetics study of the ADNI cohort.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22155867</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22155867&lt;br/&gt;Authors: Wang, H. - Nie, F. - Huang, H. - Kim, S. - Nho, K. - Risacher, S. L. - Saykin, A. J. - Shen, L.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Recent advances in high-throughput genotyping and brain imaging techniques enable new approaches to study the influence of genetic variation on brain structures and functions. Traditional association studies typically employ independent and pairwise univariate analysis, which treats single nucleotide polymorphisms (SNPs) and quantitative traits (QTs) as isolated units and ignores important underlying interacting relationships between the units. New methods are proposed here to overcome this limitation. RESULTS: Taking into account the interlinked structure within and between SNPs and imaging QTs, we propose a novel Group-Sparse Multi-task Regression and Feature Selection (G-SMuRFS) method to identify quantitative trait loci for multiple disease-relevant QTs and apply it to a study in mild cognitive impairment and Alzheimer's disease. Built upon regression analysis, our model uses a new form of regularization, group l(2,1)-norm (G(2,1)-norm), to incorporate the biological group structures among SNPs induced from their genetic arrangement. The new G(2,1)-norm considers the regression coefficients of all the SNPs in each group with respect to all the QTs together and enforces sparsity at the group level. In addition, an l(2,1)-norm regularization is utilized to couple feature selection across multiple tasks to make use of the shared underlying mechanism among different brain regions. The effectiveness of the proposed method is demonstrated by both clearly improved prediction performance in empirical evaluations and a compact set of selected SNP predictors relevant to the imaging QTs. 
AVAILABILITY: Software is publicly available at: http://ranger.uta.edu/%7eheng/imaging-genetics/ CONTACT: heng@uta.edu; shenli@iupui.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22155867&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>InFiRe -- a novel computational method for the identification of insertion sites in transposon mutagenized bacterial genomes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22155866</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22155866&lt;br/&gt;Authors: Shevchuk, O. - Roselius, L. - Gunther, G. - Klein, J. - Jahn, D. - Steinert, M. - Munch, R.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: InFiRe, Insertion Finder via Restriction digest, is a novel software tool that allows for the computational identification of transposon insertion sites in known bacterial genome sequences after transposon mutagenesis experiments. The approach is based on the fact that restriction endonuclease digestions of bacterial DNA yield a unique pattern of DNA fragments with defined sizes. Transposon insertion changes the size of the hosting DNA fragment by a known number of base pairs. The exact size of this fragment can be determined by Southern blot hybridization. Subsequently, the position of insertion can be identified with computational analysis. The outlined method provides a solid basis for the establishment of a new high-throughput technology. Availability and implementation: The software is freely available on our web server at www.infire.tu-bs.de. The algorithm was implemented in the statistical programming language R. For the most flexible use, InFiRe is provided in two different versions. A web interface offers the convenient use in a web browser. In addition, the software and source code is freely available for download as R-packages on our website. CONTACT: m.steinert@tu-bs.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22155866&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>B-SOLANA: an approach for the analysis of two-base encoding bisulfite sequencing data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22155865</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22155865&lt;br/&gt;Authors: Kreck, B. - Marnellos, G. - Richter, J. - Krueger, F. - Siebert, R. - Franke, A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Bisulfite sequencing, a combination of bisulfite treatment and high-throughput sequencing, has proved to be a valuable method for measuring DNA methylation at single base resolution. Here, we present B-SOLANA, an approach for the analysis of two-base encoding (colorspace) bisulfite sequencing data on the SOLiD platform of Life Technologies. It includes the alignment of bisulfite sequences and the determination of methylation levels in CpG as well as non-CpG sequence contexts. B-SOLANA enables a fast and accurate analysis of large raw sequence datasets. Availability and implementation: The source code, released under the GNU GPLv3 licence, is freely available at http://code.google.com/p/bsolana/. CONTACT: b.kreck@ikmb.uni-kiel.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22155865&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A simple statistical test to infer the causality of target/phenotype correlation from small molecule phenotypic screens.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22155864</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22155864&lt;br/&gt;Authors: Wei, X. - Hoffman, A. F. - Hamilton, S. M. - Xiang, Q. - He, Y. - So, W. V. - So, S. S. - Mark, D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Cell-based phenotypic screens using small molecule inhibitors are an important technology for early drug discovery if the relationship between the disease-related cellular phenotype and inhibitors' biological targets can be determined. However, chemical inhibitors are rightfully believed to be less specific than perturbation by biological agents, such as antibodies and small interfering RNA. Therefore, it is often a challenge in small molecule phenotypic screening to infer the causality between a particular cellular phenotype and the inactivation of the responsible protein due to the off-target effect of the inhibitors. RESULTS: In this article, we present a Roche in-house effort of screening 746 structurally diverse compounds for their cytotoxicity in HeLa cells measured by high content imaging technology. These compounds were also systematically profiled for the targeted and off-target binding affinity to a panel of 25 pre-selected protein kinases in a cell-free system. In an effort to search for the kinases whose activities are crucial for cell survival, we found that a simple association method such as the chi-square test yields a large number of false positives because the observed cytotoxic phenotype is likely to be the result of promiscuous action of less specific inhibitors rather than a true consequence of inactivation of a single relevant target. We demonstrated that a stratified categorical data analysis technique such as the Cochran-Mantel-Haenszel test is an effective approach to extract the meaningful biological connection from the spurious correlation resulting from confounding covariates. 
This study indicates that, empowered by appropriate statistical adjustment, small molecule inhibitor perturbation remains a powerful tool to pin down the relevant biomarker for drug safety and efficacy research. CONTACT: xin.wei@roche.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22155864&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Computing graphlet signatures of network nodes and motifs in Cytoscape with GraphletCounter.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22155862</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22155862&lt;br/&gt;Authors: Whelan, C. - Sonmez, K.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Biological network analysis can be enhanced by examining the connections between nodes and the rest of the network. For this purpose we have developed GraphletCounter, an open-source software tool for computing graphlet degree signatures that can operate on its own or as a plug-in to the network analysis environment Cytoscape. A unique characteristic of GraphletCounter is its ability to compute the graphlet signatures of network motifs, which can be specified by files generated by the motif-finding tool mfinder. GraphletCounter displays graphlet signatures for visual inspection within Cytoscape, and can output graphlet data for integration with larger workflows. Availability and implementation: GraphletCounter is implemented in Java. It can be downloaded from the Cytoscape plugin repository, and is also available at http://sonmezsysbio.org/software/graphletcounter. CONTACT: whelanch@ohsu.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22155862&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Integrated annotation and analysis of genetic variants from next-generation sequencing studies with variant tools.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22138362</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22138362&lt;br/&gt;Authors: San Lucas, F. A. - Wang, G. - Scheet, P. - Peng, B.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Storing, annotating and analyzing variants from next-generation sequencing projects can be difficult due to the availability of a wide array of data formats, tools and annotation sources, as well as the sheer size of the data files. Useful tools, including the GATK, ANNOVAR and BEDTools can be integrated into custom pipelines for annotating and analyzing sequence variants. However, building flexible pipelines that support the tracking of variants alongside their samples, while enabling updated annotation and reanalyses, is not a simple task. RESULTS: We have developed variant tools, a flexible annotation and analysis toolset that greatly simplifies the storage, annotation and filtering of variants and the analysis of the underlying samples. variant tools can be used to manage and analyze genetic variants obtained from sequence alignments, and the command-line driven toolset could be used as a foundation for building more sophisticated analytical methods. Availability and implementation: variant tools consists of two command-line driven programs vtools and vtools_report. It is freely available at http://varianttools.sourceforge.net, distributed under a GPL license. CONTACT: bpeng@mdanderson.org.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22138362&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>DMAN: a Java tool for analysis of multi-well differential scanning fluorimetry experiments.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22135419</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22135419&lt;br/&gt;Authors: Wang, C. K. - Weeratunga, S. K. - Pacheco, C. M. - Hofmann, A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Differential scanning fluorimetry (DSF) is a rapid technique that can be used in structural biology to study protein-ligand interactions. We have developed DMAN, a novel tool to analyse multi-well plate data obtained in DSF experiments. DMAN is easy to install and provides a user-friendly interface. Multi-well plate layouts can be designed by the user and experimental data can be annotated and analysed by DMAN according to the specified plate layout. Statistical tests for significance are performed automatically, and graphical tools are also provided to assist in data analysis. The modular concept of this software will allow easy development of other multi-well plate analysis applications in the future. Availability and implementation: DMAN is implemented in Java to provide a cross-platform compatibility. It is freely available to academic users at http://www.structuralchemistry.org/pcsb/. To download DMAN, users will be asked for their name, institution and email address. A manual can also be downloaded from this site. CONTACT: conan.wang@griffith.edu.au; a.hofmann@griffith.edu.au.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22135419&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Metscape 2 bioinformatics tool for the analysis and visualization of metabolomics and gene expression data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22135418</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22135418&lt;br/&gt;Authors: Karnovsky, A. - Weymouth, T. - Hull, T. - Tarcea, V. G. - Scardoni, G. - Laudanna, C. - Sartor, M. A. - Stringer, K. A. - Jagadish, H. V. - Burant, C. - Athey, B. - Omenn, G. S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Metabolomics is a rapidly evolving field that holds promise to provide insights into genotype-phenotype relationships in cancers, diabetes and other complex diseases. One of the major informatics challenges is providing tools that link metabolite data with other types of high-throughput molecular data (e.g. transcriptomics, proteomics), and incorporate prior knowledge of pathways and molecular interactions. RESULTS: We describe a new, substantially redesigned version of our tool Metscape that allows users to enter experimental data for metabolites, genes and pathways and display them in the context of relevant metabolic networks. Metscape 2 uses an internal relational database that integrates data from KEGG and EHMN databases. The new version of the tool allows users to identify enriched pathways from expression profiling data, build and analyze the networks of genes and metabolites, and visualize changes in the gene/metabolite data. We demonstrate the applications of Metscape to annotate molecular pathways for human and mouse metabolites implicated in the pathogenesis of sepsis-induced acute lung injury, for the analysis of gene expression and metabolite data from pancreatic ductal adenocarcinoma, and for identification of the candidate metabolites involved in cancer and inflammation. AVAILABILITY: Metscape is part of the National Institutes of Health-supported National Center for Integrative Biomedical Informatics (NCIBI) suite of tools, freely available at http://metscape.ncibi.org. It can be downloaded from http://cytoscape.org or installed via Cytoscape plugin manager. 
CONTACT: metscape-help@umich.edu; akarnovs@umich.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22135418&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>SNPxGE2: a database for human SNP-coexpression associations.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22135417</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22135417&lt;br/&gt;Authors: Wang, Y. - Joseph, S. J. - Liu, X. - Kelley, M. - Rekaya, R.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Recently, gene-coexpression relationships have been found to be often conditional and dynamic. Many studies have suggested that single nucleotide polymorphisms (SNPs) have impacts on gene expression variations in human populations. RESULTS: The SNPxGE(2) database contains the computationally predicted human SNP-coexpression associations, i.e. the differential coexpression between two genes is associated with the genotypes of an SNP. These data were generated from a large-scale association study that was based on the HapMap phase I data, which covered 269 individuals from 4 human populations, 556 873 SNPs and 15 000 gene expression profiles. In order to reduce the computational cost, the SNP-coexpression associations were assessed using gap/substitution models, proven to have a comparable power to logistic regression models. The results, at a false discovery rate (FDR) cutoff of 0.1, consisted of 44 769 and 50 792 SNP-coexpression associations based on single and pooled populations, respectively, and can be queried in the SNPxGE(2) database via either gene symbol or reference SNP ID. For each reported association, a detailed information page is provided. AVAILABILITY: http://lambchop.ads.uga.edu/snpxge2/index.php CONTACT: wyp1125@uga.edu, rrekaya@uga.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22135417&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Which species is it? Species-driven gene name disambiguation using random walks over a mixture of adjacency matrices.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22135416</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22135416&lt;br/&gt;Authors: Harmston, N. - Filsell, W. - Stumpf, M. P.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The scientific literature contains a wealth of information about biological systems. Manual curation lacks the scalability to extract this information due to the ever-increasing numbers of papers being published. The development and application of text mining technologies has been proposed as a way of dealing with this problem. However, the inter-species ambiguity of the genomic nomenclature makes mapping of gene mentions identified in text to their corresponding Entrez gene identifiers an extremely difficult task. We propose a novel method, which transforms a MEDLINE record into a mixture of adjacency matrices; by performing a random walk over the resulting graph, we can perform multi-class supervised classification allowing the assignment of taxonomy identifiers to individual gene mentions. The ability to achieve good performance at this task has a direct impact on the performance of normalizing gene mentions to Entrez gene identifiers. Such graph mixtures add flexibility and allow us to generate probabilistic classification schemes that naturally reflect the uncertainties inherent, even in literature-derived data. RESULTS: Our method performs well in terms of both micro- and macro-averaged performance, achieving micro-F(1) of 0.76 and macro-F(1) of 0.36 on the publicly available DECA corpus. Re-curation of the DECA corpus was performed, with our method achieving 0.88 micro-F(1) and 0.51 macro-F(1). Our method improves over standard classification techniques [such as support vector machines (SVMs)] in a number of ways: flexibility, interpretability and its resistance to the effects of class bias in the training data. Good performance is achieved without the need for computationally expensive parse tree generation or 'bag of words classification'. 
CONTACT: m.stumpf@imperial.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22135416&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Prediction and analysis of nucleotide-binding residues using sequence and sequence-derived structural descriptors.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22130595</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22130595&lt;br/&gt;Authors: Chen, K. - Mizianty, M. J. - Kurgan, L.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Nucleotides are multifunctional molecules that are essential for numerous biological processes. They serve as sources for chemical energy, participate in the cellular signaling and they are involved in the enzymatic reactions. The knowledge of the nucleotide-protein interactions helps with annotation of protein functions and finds applications in drug design. RESULTS: We propose a novel ensemble of accurate high-throughput predictors of binding residues from the protein sequence for ATP, ADP, AMP, GTP and GDP. Empirical tests show that our NsitePred method significantly outperforms existing predictors and approaches based on sequence alignment and residue conservation scoring. The NsitePred accurately finds more binding residues and binding sites and it performs particularly well for the sites with residues that are clustered close together in the sequence. The high predictive quality stems from the usage of novel, comprehensive and custom-designed inputs that utilize information extracted from the sequence, evolutionary profiles, several sequence-predicted structural descriptors and sequence alignment. Analysis of the predictive model reveals several sequence-derived hallmarks of nucleotide-binding residues; they are usually conserved and flanked by less conserved residues, and they are associated with certain arrangements of secondary structures and amino acid pairs in the specific neighboring positions in the sequence. 
AVAILABILITY: http://biomine.ece.ualberta.ca/nSITEpred/ CONTACT: lkurgan@ece.ualberta.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22130595&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>PGAP: pan-genomes analysis pipeline.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22130594</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22130594&lt;br/&gt;Authors: Zhao, Y. - Wu, J. - Yang, J. - Sun, S. - Xiao, J. - Yu, J.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: With the rapid development of DNA sequencing technology, the growing volume of bacterial genome data enables biologists to mine the evolutionary and genetic information of prokaryotic species from a pan-genome perspective. High-efficiency pipelines for pan-genome analysis are therefore much needed. We have developed a new pan-genome analysis pipeline (PGAP), which can perform five analytic functions with only one command, including cluster analysis of functional genes, pan-genome profile analysis, genetic variation analysis of functional genes, species evolution analysis and function enrichment analysis of gene clusters. PGAP's performance has been evaluated on 11 Streptococcus pyogenes strains. AVAILABILITY: PGAP is implemented in Perl on the Linux platform and the package is freely available from http://pgap.sf.net. CONTACT: junyu@big.ac.cn; xiaojingfa@big.ac.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22130594&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>NRPSsp: non-ribosomal peptide synthase substrate predictor.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22130593</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22130593&lt;br/&gt;Authors: Prieto, C. - Garcia-Estrada, C. - Lorenzana, D. - Martin, J. F.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Non-ribosomal peptide synthetases (NRPSs) are multi-modular enzymes, which biosynthesize many important peptide compounds produced by bacteria and fungi. Some studies have revealed that an individual domain within the NRPSs shows significant substrate selectivity. The discovery and characterization of non-ribosomal peptides are of great interest for the biotechnological industries. We have applied computational mining methods in order to build a database of NRPSs modules that bind to specific substrates. We have used this database to build a hidden Markov model predictor of substrates that bind to a given NRPS. AVAILABILITY: The database and the predictor are freely available on an easy-to-use website at www.nrpssp.com. CONTACT: carlos.prieto@unileon.es SUPPLEMENTARY INFORMATION: Supplementary data is available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22130593&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Gaussian process modelling for bicoid mRNA regulation in spatio-temporal Bicoid profile.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22130592</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22130592&lt;br/&gt;Authors: Liu, W. - Niranjan, M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Bicoid protein molecules, translated from maternally provided bicoid mRNA, establish a concentration gradient in Drosophila early embryonic development. There is experimental evidence that the synthesis and subsequent destruction of this protein is regulated at source by precise control of the stability of the maternal mRNA. Can we infer the driving function at the source from noisy observations of the spatio-temporal protein profile? We use non-parametric Gaussian process regression for modelling the propagation of Bicoid in the embryo and infer aspects of source regulation as a posterior function. RESULTS: With synthetic data from a 1D diffusion model with a source simulated to model mRNA stability regulation, our results establish that the Gaussian process method can accurately infer the driving function and capture the spatio-temporal dynamics of embryonic Bicoid propagation. On real data from the FlyEx database, too, the reconstructed source function is indicative of stability regulation, but is temporally smoother than what we expected, partly due to the fact that the dataset is only partially observed. To be in line with recent thinking on the subject, we also analyse this model with a spatial gradient of maternal mRNA, rather than being fixed at only the anterior pole. CONTACT: m.niranjan@southampton.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22130592&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>SCPC: a method to structurally compare protein complexes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22130591</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22130591&lt;br/&gt;Authors: Koike, R. - Ota, M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Protein-protein interactions play vital functional roles in various biological phenomena. Physical contacts between proteins have been revealed using experimental approaches that have solved the structures of protein complexes at atomic resolution. To examine the huge number of protein complexes available in the Protein Data Bank, an efficient automated method that compares protein complexes is required. RESULTS: We have developed Structural Comparison of Protein Complexes (SCPC), a novel method to structurally compare protein complexes. SCPC compares the spatial arrangements of subunits in a complex with those in another complex using secondary structure elements. Similar substructures are detected in two protein complexes and the similarity is scored. SCPC was applied to dimers, homo-oligomers and haemoglobins. SCPC properly estimated structural similarities between the dimers examined as well as an existing method, MM-align. Conserved substructures were detected in a homo-tetramer and a homo-hexamer composed of homologous proteins. Classification of quaternary structures of haemoglobins using SCPC was consistent with the conventional classification. The results demonstrate that SCPC is a valuable tool to investigate the structures of protein complexes. AVAILABILITY: SCPC is available at http://idp1.force.cs.is.nagoya-u.ac.jp/scpc/. CONTACT: rkoike@is.nagoya-u.ac.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22130591&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>An infrastructure for ontology-based information systems in biomedicine: RICORDO case study.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22130590</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22130590&lt;br/&gt;Authors: Wimalaratne, S. M. - Grenon, P. - Hoehndorf, R. - Gkoutos, G. V. - de Bono, B.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: The article presents an infrastructure for supporting the semantic interoperability of biomedical resources based on the management (storing and inference-based querying) of their ontology-based annotations. This infrastructure consists of: (i) a repository to store and query ontology-based annotations; (ii) a knowledge base server with an inference engine to support the storage of and reasoning over ontologies used in the annotation of resources; (iii) a set of applications and services allowing interaction with the integrated repository and knowledge base. The infrastructure is being prototyped and developed and evaluated by the RICORDO project in support of the knowledge management of biomedical resources, including physiology and pharmacology models and associated clinical data. Availability and implementation: The RICORDO toolkit and its source code are freely available from http://ricordo.eu/relevant-resources. CONTACT: sarala@ebi.ac.uk.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22130590&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>PathVar: analysis of gene and protein expression variance in cellular pathways using microarray data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22123829</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22123829&lt;br/&gt;Authors: Glaab, E. - Schneider, R.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Finding significant differences between the expression levels of genes or proteins across diverse biological conditions is one of the primary goals in the analysis of functional genomics data. However, existing methods for identifying differentially expressed genes or sets of genes by comparing measures of the average expression across predefined sample groups do not detect differential variance in the expression levels across genes in cellular pathways. Since corresponding pathway deregulations occur frequently in microarray gene or protein expression data, we present a new dedicated web application, PathVar, to analyze these data sources. The software ranks pathway-representing gene/protein sets in terms of the differences of the variance in the within-pathway expression levels across different biological conditions. Apart from identifying new pathway deregulation patterns, the tool exploits these patterns by combining different machine learning methods to find clusters of similar samples and build sample classification models. AVAILABILITY: freely available at http://pathvar.embl.de CONTACT: enrico.glaab@uni.lu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22123829&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A novel specific edge effect correction method for RNA interference screenings.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22121160</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22121160&lt;br/&gt;Authors: Carralot, J. P. - Ogier, A. - Boese, A. - Genovesio, A. - Brodin, P. - Sommer, P. - Dorval, T.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: High-throughput screening (HTS) is an important method in drug discovery in which the activities of a large number of candidate chemicals or genetic materials are rapidly evaluated. Data are usually obtained by measurements on samples in microwell plates and are often subject to artefacts that can bias the result selection. We report here a novel edge effect correction algorithm suitable for RNA interference (RNAi) screening, because its normalization does not rely on the entire dataset and takes into account the specificities of such a screening process. The proposed method is able to estimate the edge effects for each assay plate individually using the data from a single control column based on a diffusion model, thus targeting a specific but recurrent well-known HTS artefact. This method was first developed and validated using control plates and was then applied to the correction of experimental data generated during a genome-wide siRNA screen aimed at studying HIV-host interactions. The proposed algorithm was able to correct the edge effect biasing the control data and thus improve assay quality and, consequently, the hit-selection step. CONTACT: dorvalt@ip-korea.org; jean-philippe.carralot@roche.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22121160&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Modelling time course gene expression data with finite mixtures of linear additive models.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22121159</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22121159&lt;br/&gt;Authors: Grun, B. - Scharl, T. - Leisch, F.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: A model class of finite mixtures of linear additive models is presented. The component-specific parameters in the regression models are estimated using regularized likelihood methods. The advantages of the regularization are that (i) the pre-specified maximum degrees of freedom for the splines is less crucial than for unregularized estimation and that (ii) for each component individually a suitable degree of freedom is selected in an automatic way. The performance is evaluated in a simulation study with artificial data as well as on a yeast cell cycle dataset of gene expression levels over time. AVAILABILITY: The latest release version of the R package flexmix is available from CRAN (http://cran.r-project.org/). CONTACT: Bettina.Gruen@jku.at.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22121159&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Inferring sequence regions under functional divergence in duplicate genes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22121158</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22121158&lt;br/&gt;Authors: Huang, Y. F. - Golding, G. B.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: A number of statistical phylogenetic methods have been proposed to identify type-I functional divergence in duplicate genes by detecting heterogeneous substitution rates in phylogenetic trees. A common disadvantage of the existing methods is that autocorrelation of substitution rates along sequences is not modeled. This reduces the power of existing methods to identify regions under functional divergence. RESULTS: We design a phylogenetic hidden Markov model to identify protein regions relevant to type-I functional divergence. A C++ program, HMMDiverge, has been developed to estimate model parameters and to identify regions under type-I functional divergence. Simulations demonstrate that HMMDiverge can successfully identify protein regions under type-I functional divergence unless the discrepancy of substitution rates between subfamilies is very limited or the regions under functional divergence are very short. Applying HMMDiverge to G protein alpha subunits in animals, we identify a candidate region longer than 20 amino acids, which overlaps with the alpha-4 helix and the alpha4-beta6 loop in the GTPase domain with divergent rates of substitutions. These sites are different from those reported by an existing program, DIVERGE2. Interestingly, previous biochemical studies suggest the alpha-4 helix and the alpha4-beta6 loop are important to the specificity of the receptor-G protein interaction. Therefore, the candidate region reported by HMMDiverge highlights that the type-I functional divergence in G protein alpha subunits may be relevant to the change of receptor-G protein specificity after gene duplication. From these results, we conclude that HMMDiverge is a useful tool to identify regions under type-I functional divergence after gene duplication. 
AVAILABILITY: C++ source codes of HMMDiverge and simulation programs used in this study, as well as example datasets, are available at http://info.mcmaster.ca/yifei/software/HMMDiverge.html CONTACT: golding@mcmaster.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22121158&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Using Sybil for interactive comparative genomics of microbes on the web.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22121156</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22121156&lt;br/&gt;Authors: Riley, D. R. - Angiuoli, S. V. - Crabtree, J. - Dunning Hotopp, J. C. - Tettelin, H.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Analysis of multiple genomes requires sophisticated tools that provide search, visualization, interactivity and data export. Comparative genomics datasets tend to be large and complex, making development of these tools difficult. In addition to scalability, comparative genomics tools must also provide user-friendly interfaces such that the research scientist can explore complex data with minimal technical expertise. RESULTS: We describe a new version of the Sybil software package and its application to the important human pathogen Streptococcus pneumoniae. This new software provides a feature-rich set of comparative genomics tools for inspection of multiple genome structures, mining of orthologous gene families and identification of potential vaccine candidates. AVAILABILITY: The S.pneumoniae resource is online at http://strepneumo-sybil.igs.umaryland.edu. The software, database and website are available for download as a portable virtual machine and from http://sourceforge.net/projects/sybil. CONTACT: driley@som.umaryland.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22121156&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>MSnbase: an R/Bioconductor package for isobaric tagged mass spectrometry data visualization, processing and quantitation.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22113085</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22113085&lt;br/&gt;Authors: Gatto, L. - Lilley, K. S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: MSnbase is an R/Bioconductor package for the analysis of quantitative proteomics experiments that use isobaric tagging. It provides an exploratory data analysis framework for reproducible research, allowing raw data import, quality control, visualization, data processing and quantitation. MSnbase allows direct integration of quantitative proteomics data with additional facilities for statistical analysis provided by the Bioconductor project. AVAILABILITY: MSnbase is implemented in R (version &gt;=2.13.0) and available at the Bioconductor web site (http://www.bioconductor.org/). Vignettes outlining typical workflows, input/output capabilities and detailing underlying infrastructure are included in the package. CONTACT: lg390@cam.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available from Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22113085&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>FTSite: high accuracy detection of ligand binding sites on unbound protein structures.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22113084</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22113084&lt;br/&gt;Authors: Ngan, C. H. - Hall, D. R. - Zerbe, B. - Grove, L. E. - Kozakov, D. - Vajda, S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Binding site identification is a classical problem that is important for a range of applications, including the structure-based prediction of function, the elucidation of functional relationships among proteins, protein engineering and drug design. We describe an accurate method of binding site identification, namely FTSite. This method is based on experimental evidence that ligand binding sites also bind small organic molecules of various shapes and polarity. The FTSite algorithm does not rely on any evolutionary or statistical information, but achieves near experimental accuracy: it is capable of identifying the binding sites in over 94% of apo proteins from established test sets that have been used to evaluate many other binding site prediction methods. AVAILABILITY: FTSite is freely available as a web-based server at http://ftsite.bu.edu. CONTACT: vajda@bu.edu; midas@bu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22113084&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A novel and versatile computational tool to model translation.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22113083</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22113083&lt;br/&gt;Authors: Chu, D. - Zabet, N. - von der Haar, T.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Much is now known about the mechanistic details of gene translation. There are also rapid advances in high-throughput technologies to determine quantitative aspects of the system. As a consequence, realistic and system-wide simulation models of translation are now feasible. Such models are also needed as devices to integrate a large volume of highly fragmented data known about translation. Software: In this application note, we present a novel, highly efficient software tool to model translation. The tool represents the main aspects of translation. Features include a representation of exhaustible tRNA pools, ribosome-ribosome interactions and differential initiation rates for different mRNA species. The tool is written in Java, and is hence portable and can be parameterized for any organism. AVAILABILITY: The model can be obtained from the authors or directly downloaded from the authors' home-page (http://goo.gl/JUWvI). CONTACT: d.f.chu@kent.ac.uk.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22113083&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>GenomicTools: a computational platform for developing high-throughput analytics in genomics.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22113082</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22113082&lt;br/&gt;Authors: Tsirigos, A. - Haiminen, N. - Bilal, E. - Utro, F.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Recent advances in sequencing technology have resulted in the dramatic increase of sequencing data, which, in turn, requires efficient management of computational resources, such as computing time, memory requirements as well as prototyping of computational pipelines. RESULTS: We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. Additionally, the GenomicTools platform is designed to analyze large datasets of any size by minimizing memory requirements. In practical applications, where comparable, GenomicTools outperforms existing tools in terms of both time and memory usage. AVAILABILITY: The GenomicTools platform (version 2.0.0) was implemented in C++. The source code, documentation, user manual, example datasets and scripts are available online at http://code.google.com/p/ibm-cbc-genomic-tools. CONTACT: atsirigo@us.ibm.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22113082&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22110245</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22110245&lt;br/&gt;Authors: Lischer, H. E. - Excoffier, L.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: The analysis of genetic data often requires a combination of several approaches using different and sometimes incompatible programs. In order to facilitate data exchange and file conversions between population genetics programs, we introduce PGDSpider, a Java program that can read 27 different file formats and export data into 29, partially overlapping, other file formats. The PGDSpider package includes both an intuitive graphical user interface and a command-line version allowing its integration in complex data analysis pipelines. AVAILABILITY: PGDSpider is freely available under the BSD 3-Clause license on http://cmpg.unibe.ch/software/PGDSpider/ CONTACT: heidi.lischer@iee.unibe.ch SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22110245&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>FSR: feature set reduction for scalable and accurate multi-class cancer subtype classification based on copy number.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22110244</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22110244&lt;br/&gt;Authors: Wong, G. - Leckie, C. - Kowalczyk, A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Feature selection is a key concept in machine learning for microarray datasets, where features represented by probesets are typically several orders of magnitude larger than the available sample size. Computational tractability is a key challenge for feature selection algorithms in handling very high-dimensional datasets beyond a hundred thousand features, such as in datasets produced on single nucleotide polymorphism microarrays. In this article, we present a novel feature set reduction approach that enables scalable feature selection on datasets with hundreds of thousands of features and beyond. Our approach enables more efficient handling of higher resolution datasets to achieve better disease subtype classification of samples for potentially more accurate diagnosis and prognosis, which allows clinicians to make more informed decisions in regards to patient treatment options. RESULTS: We applied our feature set reduction approach to several publicly available cancer single nucleotide polymorphism (SNP) array datasets and evaluated its performance in terms of its multiclass predictive classification accuracy over different cancer subtypes, its speedup in execution as well as its scalability with respect to sample size and array resolution. Feature Set Reduction (FSR) was able to reduce the dimensions of an SNP array dataset by more than two orders of magnitude while achieving at least equal, and in most cases superior predictive classification performance over that achieved on features selected by existing feature selection methods alone. An examination of the biological relevance of frequently selected features from FSR-reduced feature sets revealed strong enrichment in association with cancer. 
AVAILABILITY: FSR was implemented in MATLAB R2010b and is available at http://ww2.cs.mu.oz.au/~gwong/FSR CONTACT: gwong@csse.unimelb.edu.au SUPPLEMENTARY INFORMATION: Supplementary data are available from Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22110244&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>MetalionRNA: computational predictor of metal-binding sites in RNA structures.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22110243</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22110243&lt;br/&gt;Authors: Philips, A. - Milanowska, K. - Lach, G. - Boniecki, M. - Rother, K. - Bujnicki, J. M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Metal ions are essential for the folding of RNA molecules into stable tertiary structures and are often involved in the catalytic activity of ribozymes. However, the positions of metal ions in RNA 3D structures are difficult to determine experimentally. This motivated us to develop a computational predictor of metal ion sites for RNA structures. RESULTS: We developed a statistical potential for predicting positions of metal ions (magnesium, sodium and potassium), based on the analysis of binding sites in experimentally solved RNA structures. The MetalionRNA program is available as a web server that predicts metal ions for RNA structures submitted by the user. AVAILABILITY: The MetalionRNA web server is accessible at http://metalionrna.genesilico.pl/. CONTACT: iamb@genesilico.pl SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22110243&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Integrating human and murine anatomical gene expression data for improved comparisons.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22106336</link>
      <description>Publication Date: 2012 Feb 1 PMID: 22106336&lt;br/&gt;Authors: Jimenez-Lozano, N. - Segura, J. - Macias, J. R. - Vega, J. - Carazo, J. M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Information concerning the gene expression pattern in four dimensions (species, genes, anatomy and developmental stage) is crucial for unraveling the roles of genes through time. There are a variety of anatomical gene expression databases, but extracting information from them can be hampered by their diversity and heterogeneity. RESULTS: aGEM 3.1 (anatomic Gene Expression Mapping) addresses the issues of diversity and heterogeneity of anatomical gene expression databases by integrating six mouse gene expression resources (EMAGE, GXD, GENSAT, Allen Brain Atlas data base, EUREXPRESS and BioGPS) and three human gene expression databases (HUDSEN, Human Protein Atlas and BioGPS). Furthermore, aGEM 3.1 provides new cross analysis tools to bridge these resources. Availability and implementation: aGEM 3.1 can be queried using gene and anatomical structure. Output information is presented in a friendly format, allowing the user to display expression maps and correlation matrices for a gene or structure during development. An in-depth study of a specific developmental stage is also possible using heatmaps that relate gene expression with anatomical components. http://agem.cnb.csic.es CONTACT: natalia@cnb.csic.es SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22106336&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Data-driven information retrieval in heterogeneous collections of transcriptomics data links SIM2s to malignant pleural mesothelioma.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22106335</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22106335&lt;br/&gt;Authors: Caldas, J. - Gehlenborg, N. - Kettunen, E. - Faisal, A. - Ronty, M. - Nicholson, A. G. - Knuutila, S. - Brazma, A. - Kaski, S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Genome-wide measurement of transcript levels is an ubiquitous tool in biomedical research. As experimental data continues to be deposited in public databases, it is becoming important to develop search engines that enable the retrieval of relevant studies given a query study. While retrieval systems based on meta-data already exist, data-driven approaches that retrieve studies based on similarities in the expression data itself have a greater potential of uncovering novel biological insights. RESULTS: We propose an information retrieval method based on differential expression. Our method deals with arbitrary experimental designs and performs competitively with alternative approaches, while making the search results interpretable in terms of differential expression patterns. We show that our model yields meaningful connections between biological conditions from different studies. Finally, we validate a previously unknown connection between malignant pleural mesothelioma and SIM2s suggested by our method, via real-time polymerase chain reaction in an independent set of mesothelioma samples. AVAILABILITY: Supplementary data and source code are available from http://www.ebi.ac.uk/fg/research/rex. CONTACT: samuel.kaski@aalto.fi SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22106335&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Addendum: topology and prediction of RNA pseudoknots.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22106334</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22106334&lt;br/&gt;Authors: Reidys, C. M. - Huang, F. W. - Andersen, J. E. - Penner, R. C. - Stadler, P. F. - Nebel, M. E.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22106334&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Identification of context-specific gene regulatory networks with GEMULA--gene expression modeling using LAsso.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22106333</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22106333&lt;br/&gt;Authors: Geeven, G. - van Kesteren, R. E. - Smit, A. B. - de Gunst, M. C.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Gene regulatory networks, in which edges between nodes describe interactions between transcriptional regulators and their target genes, determine the coordinated spatiotemporal expression of genes. Especially in higher organisms, context-specific combinatorial regulation by transcription factors (TFs) is believed to determine cellular states and fates. TF-target gene interactions can be studied using high-throughput techniques such as ChIP-chip or ChIP-Seq. These experiments are time and cost intensive, and further limited by, for instance, availability of high affinity TF antibodies. Hence, there is a practical need for methods that can predict TF-TF and TF-target gene interactions in silico, i.e. from gene expression and DNA sequence data alone. We propose GEMULA, a novel approach based on linear models to predict TF-gene expression associations and TF-TF interactions from experimental data. GEMULA is based on linear models, fast and considers a wide range of biologically plausible models that describe gene expression data as a function of predicted TF binding to gene promoters. RESULTS: We show that models inferred with GEMULA are able to explain roughly 70% of the observed variation in gene expression in the yeast heat shock response. The functional relevance of the inferred TF-TF interactions in these models are validated by different sources of independent experimental evidence. We also have applied GEMULA to an in vitro model of neuronal outgrowth. Our findings confirm existing knowledge on gene regulatory interactions underlying neuronal outgrowth, but importantly also generate new insights into the temporal dynamics of this gene regulatory network that can now be addressed experimentally. 
AVAILABILITY: The GEMULA R-package is available from http://www.few.vu.nl/~degunst/gemula_1.0.tar.gz. CONTACT: g.geeven@hubrecht.eu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22106333&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22101153</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22101153&lt;br/&gt;Authors: Jones, D. T. - Buchan, D. W. - Cozzetto, D. - Pontil, M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The accurate prediction of residue-residue contacts, critical for maintaining the native fold of a protein, remains an open problem in the field of structural bioinformatics. Interest in this long-standing problem has increased recently with algorithmic improvements and the rapid growth in the sizes of sequence families. Progress could have major impacts in both structure and function prediction to name but two benefits. Sequence-based contact predictions are usually made by identifying correlated mutations within multiple sequence alignments (MSAs), most commonly through the information-theoretic approach of calculating mutual information between pairs of sites in proteins. These predictions are often inaccurate because the true covariation signal in the MSA is often masked by biases from many ancillary indirect-coupling or phylogenetic effects. Here we present a novel method, PSICOV, which introduces the use of sparse inverse covariance estimation to the problem of protein contact prediction. Our method builds on work which had previously demonstrated corrections for phylogenetic and entropic correlation noise and allows accurate discrimination of direct from indirectly coupled mutation correlations in the MSA. RESULTS: PSICOV displays a mean precision substantially better than the best performing normalized mutual information approach and Bayesian networks. For 118 out of 150 targets, the L/5 (i.e. top-L/5 predictions for a protein of length L) precision for long-range contacts (sequence separation &gt;23) was &gt;=0.5, which represents an improvement sufficient to be of significant benefit in protein structure prediction or model quality assessment. 
AVAILABILITY: The PSICOV source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/PSICOV CONTACT: d.jones@cs.ucl.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22101153&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>TREAT: a bioinformatics tool for variant annotations and visualizations in targeted and exome sequencing data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22088845</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22088845&lt;br/&gt;Authors: Asmann, Y. W. - Middha, S. - Hossain, A. - Baheti, S. - Li, Y. - Chai, H. S. - Sun, Z. - Duffy, P. H. - Hadad, A. A. - Nair, A. - Liu, X. - Zhang, Y. - Klee, E. W. - Kalari, K. R. - Kocher, J. P.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: TREAT (Targeted RE-sequencing Annotation Tool) is a tool for facile navigation and mining of the variants from both targeted resequencing and whole exome sequencing. It provides a rich integration of publicly available as well as in-house developed annotations and visualizations for variants, variant-hosting genes and host-gene pathways. Availability and implementation: TREAT is freely available to non-commercial users as either a stand-alone annotation and visualization tool, or as a comprehensive workflow integrating sequencing alignment and variant calling. The executables, instructions and the Amazon Cloud Images of TREAT can be downloaded at the website: http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm CONTACT: Hossain.Asif@mayo.edu; Kocher.JeanPierre@mayo.edu SUPPLEMENTARY INFORMATION: Supplementary data are provided at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22088845&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>NoRSE: noise reduction and state evaluator for high-frequency single event traces.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22088841</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22088841&lt;br/&gt;Authors: Reuel, N. F. - Bojo, P. - Zhang, J. - Boghossian, A. A. - Ahn, J. H. - Kim, J. H. - Strano, M. S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: NoRSE was developed to analyze high-frequency datasets collected from multistate, dynamic experiments, such as molecular adsorption and desorption onto carbon nanotubes. As technology improves sampling frequency, these stochastic datasets become increasingly large with faster dynamic events. More efficient algorithms are needed to accurately locate the unique states in each time trace. NoRSE adapts and optimizes a previously published noise reduction algorithm and uses a custom peak flagging routine to rapidly identify unique event states. The algorithm is explained using experimental data from our lab and its fitting accuracy and efficiency are then shown with a generalized model of stochastic datasets. The algorithm is compared to another recently published state finding algorithm and is found to be 27 times faster and more accurate over 55% of the generalized experimental space. NoRSE is written as an M-file for Matlab. AVAILABILITY: http://web.mit.edu/stranogroup/NoRSE.txt CONTACT: strano@mit.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22088841&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Discovering transcription factor regulatory targets using gene expression and binding data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22084256</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22084256&lt;br/&gt;Authors: Maienschein-Cline, M. - Zhou, J. - White, K. P. - Sciammas, R. - Dinner, A. R.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Identifying the target genes regulated by transcription factors (TFs) is the most basic step in understanding gene regulation. Recent advances in high-throughput sequencing technology, together with chromatin immunoprecipitation (ChIP), enable mapping TF binding sites genome wide, but it is not possible to infer function from binding alone. This is especially true in mammalian systems, where regulation often occurs through long-range enhancers in gene-rich neighborhoods, rather than proximal promoters, preventing straightforward assignment of a binding site to a target gene. RESULTS: We present EMBER (Expectation Maximization of Binding and Expression pRofiles), a method that integrates high-throughput binding data (e.g. ChIP-chip or ChIP-seq) with gene expression data (e.g. DNA microarray) via an unsupervised machine learning algorithm for inferring the gene targets of sets of TF binding sites. Genes selected are those that match overrepresented expression patterns, which can be used to provide information about multiple TF regulatory modes. We apply the method to genome-wide human breast cancer data and demonstrate that EMBER confirms a role for the TFs estrogen receptor alpha, retinoic acid receptors alpha and gamma in breast cancer development, whereas the conventional approach of assigning regulatory targets based on proximity does not. Additionally, we compare several predicted target genes from EMBER to interactions inferred previously, examine combinatorial effects of TFs on gene regulation and illustrate the ability of EMBER to discover multiple modes of regulation. 
AVAILABILITY: All code used for this work is available at http://dinner-group.uchicago.edu/downloads.html CONTACT: dinner@uchicago.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22084256&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Gene set analysis in the cloud.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22084254</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22084254&lt;br/&gt;Authors: Zhang, L. - Gu, S. - Liu, Y. - Wang, B. - Azuaje, F.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Cloud computing offers low cost and highly flexible opportunities in bioinformatics. Its potential has already been demonstrated in high-throughput sequence data analysis. Pathway-based or gene set analysis of expression data has received relatively less attention. We developed a gene set analysis algorithm for biomarker identification in the cloud. The resulting tool, YunBe, is ready to use on Amazon Web Services. Moreover, here we compare its performance to those obtained with desktop and computing cluster solutions. Availability and implementation: YunBe is open-source and freely accessible within the Amazon Elastic MapReduce service at s3n://lrcv-crp-sante/app/yunbe.jar. Source code and user's guidelines can be downloaded from http://tinyurl.com/yunbedownload. CONTACT: francisco.azuaje@crp-sante.lu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22084254&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Feature-based classifiers for somatic mutation detection in tumour-normal paired sequencing data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22084253</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22084253&lt;br/&gt;Authors: Ding, J. - Bashashati, A. - Roth, A. - Oloumi, A. - Tse, K. - Zeng, T. - Haffari, G. - Hirst, M. - Marra, M. A. - Condon, A. - Aparicio, S. - Shah, S. P.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The study of cancer genomes now routinely involves using next-generation sequencing technology (NGS) to profile tumours for single nucleotide variant (SNV) somatic mutations. However, surprisingly few published bioinformatics methods exist for the specific purpose of identifying somatic mutations from NGS data and existing tools are often inaccurate, yielding intolerably high false prediction rates. As such, the computational problem of accurately inferring somatic mutations from paired tumour/normal NGS data remains an unsolved challenge. RESULTS: We present the comparison of four standard supervised machine learning algorithms for the purpose of somatic SNV prediction in tumour/normal NGS experiments. To evaluate these approaches (random forest, Bayesian additive regression tree, support vector machine and logistic regression), we constructed 106 features representing 3369 candidate somatic SNVs from 48 breast cancer genomes, originally predicted with naive methods and subsequently revalidated to establish ground truth labels. We trained the classifiers on this data (consisting of 1015 true somatic mutations and 2354 non-somatic mutation positions) and conducted a rigorous evaluation of these methods using a cross-validation framework and hold-out test NGS data from both exome capture and whole genome shotgun platforms. All learning algorithms employing predictive discriminative approaches with feature selection improved the predictive accuracy over standard approaches by statistically significant margins. 
In addition, using unsupervised clustering of the ground truth 'false positive' predictions, we noted several distinct classes and present evidence suggesting non-overlapping sources of technical artefacts illuminating important directions for future study. AVAILABILITY: Software called MutationSeq and datasets are available from http://compbio.bccrc.ca. CONTACT: saparicio@bccrc.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22084253&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>BadiRate: estimating family turnover rates by likelihood-based methods.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22080468</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22080468&lt;br/&gt;Authors: Librado, P. - Vieira, F. G. - Rozas, J.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The comparative analysis of gene gain and loss rates is critical for understanding the role of natural selection and adaptation in shaping gene family sizes. Studying complete genome data from closely related species allows accurate estimation of gene family turnover rates. Current methods and software tools, however, are not well designed for dealing with certain kinds of functional elements, such as microRNAs or transcription factor binding sites. RESULTS: Here, we describe BadiRate, a new software tool to estimate family turnover rates, as well as the number of elements in internal phylogenetic nodes, by likelihood-based methods and parsimony. It implements two stochastic population models, which provide the appropriate statistical framework for testing hypotheses, such as lineage-specific gene family expansions or contractions. We have assessed the accuracy of BadiRate by computer simulations, and have also illustrated its functionality by analyzing a representative empirical dataset. AVAILABILITY: BadiRate software and documentation are available from http://www.ub.edu/softevol/badirate. CONTACT: jrozas@ub.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22080468&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Fast computation of minimum hybridization networks.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22072387</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22072387&lt;br/&gt;Authors: Albrecht, B. - Scornavacca, C. - Cenci, A. - Huson, D. H.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Hybridization events in evolution may lead to incongruent gene trees. One approach to determining possible interspecific hybridization events is to compute a hybridization network that attempts to reconcile incongruent gene trees using a minimum number of hybridization events. RESULTS: We describe how to compute a representative set of minimum hybridization networks for two given bifurcating input trees, using a parallel algorithm and provide a user-friendly implementation. A simulation study suggests that our program performs significantly better than existing software on biologically relevant data. Finally, we demonstrate the application of such methods in the context of the evolution of the Aegilops/Triticum genera. Availability and implementation: The algorithm is implemented in the program Dendroscope 3, which is freely available from www.dendroscope.org and runs on all three major operating systems. CONTACT: scornava@informatik.uni-tuebingen.de; huson@informatik.uni-tuebingen.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22072387&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Wavelet-based image fusion in multi-view three-dimensional microscopy.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22072386</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22072386&lt;br/&gt;Authors: Rubio-Guivernau, J. L. - Gurchenkov, V. - Luengo-Oroz, M. A. - Duloquin, L. - Bourgine, P. - Santos, A. - Peyrieras, N. - Ledesma-Carbayo, M. J.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Multi-view microscopy techniques such as Light-Sheet Fluorescence Microscopy (LSFM) are powerful tools for 3D + time studies of live embryos in developmental biology. The sample is imaged from several points of view, acquiring a set of 3D views that are then combined or fused in order to overcome their individual limitations. View fusion is still an open problem despite recent contributions in the field. RESULTS: We developed a wavelet-based multi-view fusion method that, due to wavelet decomposition properties, is able to combine the complementary directional information from all available views into a single volume. Our method is demonstrated on LSFM acquisitions from live sea urchin and zebrafish embryos. The fusion results show improved overall contrast and details when compared with any of the acquired volumes. The proposed method does not need knowledge of the system's point spread function (PSF) and performs better than other existing PSF-independent fusion methods. Availability and Implementation: The described method was implemented in Matlab (The Mathworks, Inc., USA) and a graphical user interface was developed in Java. The software, together with two sample datasets, is available at http://www.die.upm.es/im/software/SPIMFusionGUI.zip A public release, free of charge for non-commercial use, is planned after the publication of this article. 
CONTACT: jlrubio@die.upm.es; nadine.peyrieras@inaf.cnrs-gif.fr; mledesma@die.upm.es SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22072386&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>NARWHAL, a primary analysis pipeline for NGS data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=22072383</link>
      <description>Publication Date: 2012 Jan 15 PMID: 22072383&lt;br/&gt;Authors: Brouwer, R. W. - van den Hout, M. C. - Grosveld, F. G. - van Ijcken, W. F.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: The NARWHAL software pipeline has been developed to automate the primary analysis of Illumina sequencing data. This pipeline combines a new and flexible de-multiplexing tool with open-source aligners and automated quality assessment. The entire pipeline can be run using only one simple sample-sheet for diverse sequencing applications. NARWHAL creates a sample-oriented data structure and outperforms existing tools in speed. AVAILABILITY: https://trac.nbic.nl/narwhal/ CONTACT: w.vanijcken@erasmusmc.nl SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D22072383&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
  </channel>
</rss>

