<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
  xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Bioinformatics</title>
    <link>http://barf.jcowboy.org</link>
    <description>Bioinformatics recent publications</description>
    <language>en-us</language>
    <image>
      <url>http://barf.jcowboy.org/pubmed.gif</url>
      <title>the data for this feed is provided by PubMed</title>
      <link>http://barf.jcowboy.org</link>
    </image>
    <item>
      <title>Model-Based Clustering of Microarray Expression Data via Latent Gaussian Mixture Models.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20802251</link>
      <description>Publication Date: 2010 Aug 29 PMID: 20802251&lt;br/&gt;Authors: McNicholas, P. D. - Murphy, T. B.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this paper, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to twelve models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of twelve mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the EM algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the EPGMM family, is then applied to two well-known gene expression data sets. RESULTS: The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data. AVAILABILITY: The reduced, preprocessed data that were analyzed are available at www.paulmcnicholas.info CONTACT: pmcnicho@uoguelph.ca.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20802251&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>XMSF: Structure-preserving noise reduction and pre-segmentation in microscope tomography.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20802209</link>
      <description>Publication Date: 2010 Aug 27 PMID: 20802209&lt;br/&gt;Authors: Bilbao-Castro, J. R. - Sorzano, C. O. - Garcia, I. - Fernandez, J. J.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Interpretation of electron tomograms is difficult due to the high noise levels. Thus, denoising techniques are needed to improve the signal-to-noise ratio. XMSF (Microscopy Mean Shift Filtering) is a fast, user-friendly application that succeeds in filtering noise while preserving the structures of interest. It is based on the extension to 3D of a method widely applied in other image processing fields under very different scenarios. XMSF has been tested for a variety of tomograms, showing a great potential to become a state-of-the-art filtering program in electron tomography. Applied iteratively, the algorithm yields pre-segmented volumes facilitating posterior segmentation tasks. Moreover, execution times remain low thanks to parallel computing techniques to exploit current multicore computers. AVAILABILITY: http://sites.google.com/site/xmsfilter/ CONTACT: jrbcast@ace.ual.es SUPPLEMENTARY INFORMATION: http://sites.google.com/site/xmsfilter/&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20802209&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Preferential Use of Protein Domain-pairs as Interaction Mediators: Order and Transitivity.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20802208</link>
      <description>Publication Date: 2010 Aug 27 PMID: 20802208&lt;br/&gt;Authors: Itzhaki, Z. - Akiva, E. - Margalit, H.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Many protein-protein interactions are mediated by protein-domains. The structural data of multi-domain PPIs reveal the domain-pair (or pairs) that mediate a protein-protein interaction, and implicitly also the domain-pairs that are not involved in the interaction. By analyzing such data, preference relations between domain-pairs as interaction mediators may be revealed. RESULTS: Here we analyze the differential use of domain-pairs as mediators of stable interactions based on structurally-solved multi-domain protein complexes. Our analysis revealed domain-pairs that are preferentially used as interaction mediators and domain-pairs that rarely or never mediate interaction, independent of the proteins' context. Between these extremes there are domain-pairs that mediate protein interaction in some protein contexts, while in other contexts different domain-pairs predominate over them. By describing the preference-relations between domain-pairs as a network we uncovered partial order and transitivity in these relations, which we further exploited for predicting interaction-mediating domains. The preferred domain-pairs and the ones over which they predominate differ in several properties, but these differences cannot yet determine explicitly what underlies the differential use of domain-pairs as interaction mediators. One property that stood up was the over-abundance of homotypic interactions among the preferred domain-pairs, supporting previous suggestions on the advantages in the use of domain self-interaction for mediating protein interactions. Finally, we show a possible association between the preferred domain-pairs and the function of the complex where they reside. 
CONTACT: hanahm@ekmd.huji.ac.il.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20802208&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>DCGL: an R package for identifying differentially coexpressed genes and links from gene expression microarray data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20801914</link>
      <description>Publication Date: 2010 Aug 26 PMID: 20801914&lt;br/&gt;Authors: Liu, B. H. - Yu, H. - Tu, K. - Li, C. - Li, Y. X. - Li, Y. Y.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Gene coexpression analysis was developed to explore gene interconnection at the expression level from a systems perspective, and 'differential coexpression analysis (DCEA)', which examines the change in gene expression correlation between two conditions, was accordingly designed as a complementary technique to traditional 'differential expression analysis' (DEA). Since there is a shortage of DCEA tools, we implemented in an R package &quot;DCGL&quot; five DCEA methods for identification of differentially coexpressed genes and differentially coexpressed links, including three currently popular methods and two novel algorithms described in a companion paper. DCGL can serve as an easy-to-use tool to facilitate differential coexpression analyses. AVAILABILITY: The R package and the Vignette are available in the online Supplementary Materials. CONTACT: yyli@scbit.org and yxli@scbit.org.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20801914&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Simple sequence-based kernels do not predict protein-protein interactions.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20801913</link>
      <description>Publication Date: 2010 Aug 27 PMID: 20801913&lt;br/&gt;Authors: Yu, J. - Guo, M. - Needham, C. J. - Huang, Y. - Cai, L. - Westhead, D. R.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: A number of methods have been reported that predict protein-protein interactions with high accuracy using only simple sequence-based features such as amino-acid 3-mer content. This is surprising, given that many protein interactions have high specificity that depends on detailed atomic recognition between physiochemically complementary surfaces. Are the reported high accuracies realistic? RESULTS: We find that the reported accuracies of the predictions are significantly over-estimated, and strongly dependent on the structure of the training and testing datasets used. The choice of which protein pairs are deemed as non-interactions in the training data has a variable impact on the accuracy estimates, and the accuracies can be artificially inflated by a bias towards dominant samples in the positive data which result from the presence of hub proteins in the protein interaction network. To address this bias we propose a positive-set-specific method to create a 'balanced' negative set maintaining the degree distribution for each protein, leading to the conclusion that simple sequence-based features contain insufficient information to be useful for predicting protein-protein interactions, but that protein domain-based features have some predictive value. AVAILABILITY: Our method, named 'BRS-nonint', is available at http://www.bioinformatics.leeds.ac.uk/BRS-nonint/. 
CONTACT: D.R.Westhead@leeds.ac.uk, maozuguo@hit.edu.cn Supporting Data: All the data sets used in this study are derived from publicly available data, and are available at http://www.bioinformatics.leeds.ac.uk/BRS-nonint/PPI_RandomBalance.html.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20801913&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Ontology- and graph-based similarity assessment in biological networks.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20801912</link>
      <description>Publication Date: 2010 Aug 27 PMID: 20801912&lt;br/&gt;Authors: Wang, H. - Zheng, H. - Azuaje, F.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: A standard systems-based approach to biomarker and drug target discovery consists of placing putative biomarkers in the context of a network of biological interactions, followed by different &quot;guilt-by-association&quot; analyses. The latter is typically done based on network structural features. Here an alternative analysis approach in which the networks are analyzed on a &quot;semantic similarity&quot; space is reported. Such information is extracted from ontology-based functional annotations. We present SimTrek, a Cytoscape plugin for ontology-based similarity assessment in biological networks. Availability and SUPPLEMENTARY INFORMATION: http://rosalind.infj.ulst.ac.uk/SimTrek.html CONTACT: Francisco.Azuaje@crp-sante.lu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20801912&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>miRNAkey: a software for microRNA Deep Sequencing analysis.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20801911</link>
      <description>Publication Date: 2010 Aug 27 PMID: 20801911&lt;br/&gt;Authors: Ronen, R. - Gan, I. - Modai, S. - Sukacheov, A. - Dror, G. - Halperin, E. - Shomron, N.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: MicroRNAs (miRNAs) are short abundant non-coding RNAs critical for many cellular processes. Deep Sequencing (Next Generation Sequencing) technologies are being readily used to receive a more accurate depiction of miRNA expression profiles in living cells. This type of analysis is a key step towards improving our understanding of the complexity and mode of miRNA regulation. RESULTS: miRNAkey is a software package designed to be used as a base-station for the analysis of miRNA deep sequencing data. The package implements common steps taken in the analysis of such data, as well as adds unique features, such as data statistics and multiple read determination, generating a novel platform for the analysis of miRNA expression. A user friendly graphical interface is applied to determine the analysis steps. The tabular and graphical output contains general and detailed reports on the sequence reads and provides an accurate picture of the differentially expressed miRNAs in paired samples. Availability and implementation: See [http://ibis.tau.ac.il/miRNAkey.] CONTACT: nshomron@post.tau.ac.il.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20801911&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A Study of the Efficiency of Pooling in Haplotype Estimation.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20801910</link>
      <description>Publication Date: 2010 Aug 27 PMID: 20801910&lt;br/&gt;Authors: Kuk, A. Y. - Xu, J. - Yang, Y.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: It has been claimed in the literature that pooling DNA samples is efficient in estimating haplotype frequencies. There is, however, no theoretical justification based on calculation of statistical efficiency. In fact, the limited evidence given so far is based on simulation studies with small numbers of loci. With rapid advance in technology, it is of interest to see if pooling is still efficient when the number of loci increases. METHODS: Instead of resorting to simulation studies, we make use of asymptotic statistical theory to perform exact calculation of the efficiency of pooling relative to no pooling in the estimation of haplotype frequencies. As an intermediate step, we use the log-linear formulation of the haplotype probabilities and derive the asymptotic variance-covariance matrix of the maximum likelihood estimators of the canonical parameters of the log-linear model. RESULTS: Based on our calculations under linkage equilibrium, pooling can suffer huge loss in efficiency relative to no pooling when there are more than three independent loci and the alleles are not rare. Pooling works better for rare alleles. In particular, if all the minor allele frequencies are 0.05, pooling maintains an advantage over no pooling until the number of independent loci reaches 6. High linkage disequilibrium effectively reduces the number of independent loci by ruling out certain haplotypes from occurring. Similar calculations of efficiency for the case of no pooling justify the common belief that it is not worthwhile to use molecular methods to resolve the phase ambiguity of individual genotype data. 
AVAILABILITY: The R codes for the calculation are available at http://www.stat.nus.edu.sg/~staxj/pooling CONTACT: stakuka@nus.edu.sg.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20801910&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Measuring the physical cohesiveness of proteins using Physical Interaction Enrichment (PIE).</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20798171</link>
      <description>Publication Date: 2010 Aug 26 PMID: 20798171&lt;br/&gt;Authors: Sama, I. E. - Huynen, M. A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Protein-Protein Interaction (PPI) networks are a valuable resource for the interpretation of genomics data. However, such networks have interaction enrichment biases for proteins that are often studied. These biases skew quantitative results from comparing PPI networks with genomics data. Here we introduce an approach named Physical Interaction Enrichment (PIE) to eliminate these biases. Methodology: PIE employs a normalization that ensures equal node degree (edge) distribution of a test set and of the random networks it is compared with. It quantifies whether a set of proteins have more interactions between themselves than proteins in random networks, and can therewith be regarded as physically cohesive. RESULTS: Amongst other datasets, we applied PIE to genetic morbid disease (GMD) genes and to genes whose expression is induced upon infection with human-metapneumovirus (HMPV). Both sets contain proteins that are often studied and that have relatively many interactions in the PPI network. Although interactions between proteins of both sets are found to be over-represented in PPI networks, the GMD proteins are not more likely to interact with each other than random proteins when this over-representation is taken into account. By contrast the HMPV-induced genes, representing a biologically more coherent set, encode proteins that do tend to interact with each other and can be used to predict new HMPV-induced genes. By handling biases in PPI networks, PIE can be a valuable tool to quantify the degree to which a set of genes are involved in the same biological process. 
CONTACT: i.sama@cmbi.ru.nl; m.huynen@cmbi.ru.nl.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20798171&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>RDP3: A flexible and fast computer program for analysing recombination.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20798170</link>
      <description>Publication Date: 2010 Aug 26 PMID: 20798170&lt;br/&gt;Authors: Martin, D. P. - Lemey, P. - Lott, M. - Moulton, V. - Posada, D. - Lefeuvre, P.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: RDP3 is a new version of the RDP program for characterizing recombination events in DNA sequence alignments. Among other novelties, this version includes four new recombination analysis methods (3SEQ, VISRD, PHYLRO and LDHAT), new tests for recombination hot-spots, a range of matrix methods for visualising over-all patterns of recombination within datasets and recombination-aware ancestral sequence reconstruction. Complementary to a high degree of analysis flow automation, RDP3 also has a highly interactive and detailed graphical user interface that enables more focussed hands-on cross-checking of results with a wide variety of newly implemented phylogenetic tree construction and matrix-based recombination signal visualisation methods. The new RDP3 can accommodate large datasets and is capable of analysing alignments ranging in size from 1000 x 10 kilobase sequences to 20 x 2 megabase sequences within 48 hours on a desktop PC. AVAILABILITY: RDP3 is available for free from its website (http://darwin.uvigo.es/rdp/rdp.html). CONTACT: darrenpatrickmartin@gmail.com SUPPLEMENTARY INFORMATION: The RDP3 program manual contains detailed descriptions of the various methods it implements and a step-by-step guide describing how best to use these.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20798170&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>METAREP: JCVI Metagenomics Reports - an open source tool for high-performance comparative metagenomics.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20798169</link>
      <description>Publication Date: 2010 Aug 26 PMID: 20798169&lt;br/&gt;Authors: Goll, J. - Rusch, D. - Tanenbaum, D. M. - Thiagarajan, M. - Li, K. - Methe, B. A. - Yooseph, S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: JCVI Metagenomics Reports (METAREP) is a Web 2.0 application designed to help scientists analyze and compare annotated metagenomics data sets. It utilizes Solr/Lucene, a high-performance scalable search engine, to quickly query large data collections. Furthermore, users can use its SQL-like query syntax to filter and refine datasets. METAREP provides graphical summaries for top taxonomic and functional classifications as well as a GO, NCBI Taxonomy, and KEGG Pathway Browser. Users can compare absolute and relative counts of multiple datasets at various functional and taxonomic levels. Advanced comparative features comprise statistical tests as well as multidimensional scaling, heatmap and hierarchical clustering plots. Summaries can be exported as tab-delimited files, publication quality plots in PDF format. A data management layer allows collaborative data analysis and result sharing. AVAILABILITY: Website: http://www.jcvi.org/metarep Source Code: http://github.com/jcvi/METAREP CONTACT: syooseph@jcvi.org.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20798169&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>TFInfer: A Tool for Probabilistic Inference of Transcription Factor Activities.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20739311</link>
      <description>Publication Date: 2010 Aug 24 PMID: 20739311&lt;br/&gt;Authors: Shahzad Asif, H. M. - Rolfe, M. D. - Green, J. - Lawrence, N. D. - Rattray, M. - Sanguinetti, G.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: TFInfer is a novel open-access, standalone tool for genome-wide inference of transcription factor activities from gene expression data. Based on an earlier MATLAB version, the software has now been extended in a number of ways. It has been significantly optimised in terms of performance, and it was given novel functionality, by allowing the user to model both time series and data from multiple independent conditions. With a full documentation and intuitive graphical user interface, together with an in-built data base of yeast and E. coli transcription factors, the software does not require any mathematical or computational expertise to be used effectively. AVAILABILITY: http://homepages.inf.ed.ac.uk/gsanguin/TFInfer.html CONTACT: gsanguin@staffmail.ed.ac.uk.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20739311&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>GASSST: Global Alignment Short Sequence Search Tool.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20739310</link>
      <description>Publication Date: 2010 Aug 24 PMID: 20739310&lt;br/&gt;Authors: Rizk, G. - Lavenier, D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The rapid development of Next Generation Sequencing technologies able to produce huge amounts of sequence data is leading to a wide range of new applications. This triggers the need for fast and accurate alignment software. Common techniques often restrict indels in the alignment to improve speed, whereas more flexible aligners are too slow for large scale applications. Moreover many current aligners are becoming inefficient as generated reads grow ever larger. Our goal with our new aligner GASSST is thus twofold - achieving high performance with no restrictions on the number of indels with a design that is still effective on long reads. RESULTS: We propose a new efficient filtering step that discards most alignments coming from the seed phase before they are checked by the costly dynamic programming algorithm. We use a carefully designed series of filters of increasing complexity and efficiency to quickly eliminate most candidate alignments in a wide range of configurations. The main filter uses a precomputed table containing the alignment score of short four base words aligned against each other. This table is reused several times by a new algorithm designed to approximate the score of the full dynamic programming algorithm. We compare the performance of GASSST against BWA, BFAST, SSAHA2 and PASS. We found that GASSST achieves high sensitivity in a wide range of configurations and faster overall execution time than other state-of-the-art aligners. AVAILABILITY: GASSST is distributed under the CeCILL software license at http://www.irisa.fr/symbiose/projects/gassst/. 
CONTACT: guillaume.rizk@irisa.fr, dominique.lavenier@irisa.fr.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20739310&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Structural Genomics of Histone Tail Recognition.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20739309</link>
      <description>Publication Date: 2010 Aug 24 PMID: 20739309&lt;br/&gt;Authors: Wang, M. - Mok, M. W. - Harper, H. - Lee, W. H. - Min, J. - Knapp, S. - Oppermann, U. - Marsden, B. - Schapira, M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: The Structural Genomics of Histone Tail Recognition web server is an open-access resource that presents within mini articles all publicly available experimental structures of histone tails in complex with human proteins. Each article is composed of interactive 3D slides that dissect the structural mechanism underlying the recognition of specific sequences and histone marks. A concise text html-linked to interactive graphics guides the reader through the main features of the interaction. This resource can be used to analyze and compare binding modes across multiple histone recognition modules, to evaluate the chemical tractability of binding sites involved in epigenetic signaling, and design small molecule inhibitors. AVAILABILITY: http://www.thesgc.org/resources/histone_tails/ CONTACT: matthieu.schapira@utoronto.ca.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20739309&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>2Struc: The Secondary Structure Server.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20739308</link>
      <description>Publication Date: 2010 Aug 24 PMID: 20739308&lt;br/&gt;Authors: Klose, D. P. - Wallace, B. A. - Janes, R. W.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: The Defined Secondary Structure of Proteins (DSSP) method is often considered the gold standard for assignment of secondary structure from three-dimensional coordinates. However, there are alternative methods. &quot;2Struc: The Secondary Structure Server&quot; has been created as a single point of access for eight different secondary structure assignment methods. It has been designed to enable comparisons between methods for analyzing the secondary structure content for a single protein. It also includes a second functionality, &quot;Compare-the-Protein&quot; to enable comparisons of the secondary structure features from any one method to be made within a collection of NMR models, or between the crystal structures of two different proteins. AVAILABILITY: http://2struc.cryst.bbk.ac.uk. CONTACT: r.w.janes@qmul.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20739308&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>BioRuby: Bioinformatics software for the Ruby programming language.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20739307</link>
      <description>Publication Date: 2010 Aug 25 PMID: 20739307&lt;br/&gt;Authors: Goto, N. - Prins, P. - Nakao, M. - Bonnal, R. - Aerts, J. - Katayama, T.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO. BioRuby comes with a tutorial, documentation and an interactive environment, which can be used in the shell, and in the web browser. AVAILABILITY: BioRuby is free and open source software, made available under the Ruby license. BioRuby runs on all platforms that support Ruby, including Linux, Mac OS X and Windows. And, with JRuby, BioRuby runs on the Java Virtual Machine. The source code is available from http://www.bioruby.org/. CONTACT: Toshiaki Katayama (katayama@bioruby.org); Queries should be directed to the BioRuby mailing list.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20739307&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Detecting two-locus associations allowing for interactions in genome-wide association studies.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20736343</link>
      <description>Publication Date: 2010 Aug 24 PMID: 20736343&lt;br/&gt;Authors: Wan, X. - Yang, C. - Yang, Q. - Xue, H. - Tang, N. L. - Yu, W.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Genome-wide association studies (GWAS) aim to identify genetic susceptibility to complex diseases by assaying and analyzing hundreds of thousands of single-nucleotide polymorphisms (SNPs). Although traditional single-locus statistical tests have identified many genetic determinants of susceptibility, those findings cannot completely explain genetic contributions to complex diseases. Marchini et al. (2005) demonstrated the importance of testing two-locus associations allowing for interactions through a wide range of simulation studies. However, such a test is computationally demanding as we need to test hundreds of billions of SNP pairs in GWAS. Here we provide a method to address this computational burden for dichotomous phenotypes. RESULTS: We have applied our method on nine data sets from GWAS, including the age-related macular degeneration (AMD) data set, the Parkinson's disease data set and seven data sets from the Wellcome Trust Case Control Consortium (WTCCC). Our method has discovered many associations that were not identified before. The running time for the AMD data set, the Parkinson's disease data set and each of seven WTCCC data sets are 2.5 hours, 82 hours and 90 hours on a standard 3.0 GHz desktop with 4G memory running Windows XP system. Our experiment results demonstrate that our method is feasible for the full scale analyses of both single-locus associations and two-locus associations allowing for interactions in GWAS. 
AVAILABILITY: http://bioinformatics.ust.hk/SNPAssociation.zip CONTACT: eexiangw@ust.hk, eeyang@ust.hk, eeyu@ust.hk, nelsontang@cuhk.edu.hk.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20736343&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>ICPS: an Integrative Cancer Profiler System.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20736342</link>
      <description>Publication Date: 2010 Aug 24 PMID: 20736342&lt;br/&gt;Authors: Zhang, X. Y. - Shi, L. - Liu, Y. - Tian, F. - Zhao, H. T. - Miao, X. P. - Huang, M. L. - Zhu, X. Y.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Founded upon the database of 570 public signatures, ICPS is a web-based application to obtain biomarker profiles among 11 common cancers by integrating genomic alterations with transcription signatures on the basis of a previously developed integrative pipeline. ICPS supports both public data and user's in-house data, and performs meta-analysis at a cancer sub-type level by combining heterogeneous datasets. Finally, ICPS returns the robust gene signature containing potential cancer biomarkers that may be useful to carcinogenesis study and clinical cancer diagnosis. AVAILABILITY: http://server.bioicps.org CONTACT: zhxy@mail.tsinghua.edu.cn.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20736342&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Obtaining Better Quality Final Clustering by Merging a Collection of Clusterings.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20736341</link>
      <description>Publication Date: 2010 Aug 24 PMID: 20736341&lt;br/&gt;Authors: Mimaroglu, S. - Erdil, E.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Clustering methods including k-means, SOM, UPGMA, DAA, CLICK, GENECLUSTER, CAST, DHC, PMETIS, KMETIS have been widely used in biological studies for gene expression, protein localization, sequence recognition and more. All these clustering methods have some benefits and drawbacks. We propose a novel graph-based clustering software called COMUSA for combining the benefits of a collection of clusterings into a final clustering having better overall quality. RESULTS: The COMUSA implementation is compared with PMETIS, KMETIS, and k-means. Experimental results on artificial, real, and biological data sets demonstrate the effectiveness of our method. COMUSA produces very good quality clusters in a short amount of time. AVAILABILITY: http://www.cs.umb.edu/~smimarog/comusa CONTACT: selim.mimaroglu@bahcesehir.edu.tr.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20736341&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Deep and wide digging for binding motifs in ChIP-Seq data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20736340</link>
      <description>Publication Date: 2010 Aug 24 PMID: 20736340&lt;br/&gt;Authors: Kulakovskiy, I. V. - Boeva, V. A. - Favorov, A. V. - Makeev, V. J.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: ChIP-Seq data is a new challenge for motif discovery. Such data typically consist of thousands of DNA segments with base-specific coverage values. We present a new version of our DNA motif discovery software ChIPMunk adapted for motif discovery in ChIP-Seq data. ChIPMunk is an iterative algorithm that combines greedy optimization with bootstrapping and uses coverage profiles as motif positional preferences. ChIPMunk does not require truncation of long DNA segments and it is practical for processing up to tens of thousands of data sequences. Comparison with traditional (MEME) or ChIP-Seq-oriented (HMS) motif discovery tools shows that ChIPMunk identifies the correct motifs with the same or better quality but works dramatically faster. Availability and Implementation: ChIPMunk is freely available within the ru_genetika Java package: http://line.imb.ac.ru/ChIPMunk. Web-based version is also available. SUPPLEMENTARY INFORMATION: http://line.imb.ac.ru/ChIPMunk/supplement.html.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20736340&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Circoletto: visualizing sequence similarity with Circos.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20736339</link>
      <description>Publication Date: 2010 Aug 24 PMID: 20736339&lt;br/&gt;Authors: Darzentas, N.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: We present Circoletto, an online visualization tool based on Circos, which provides a fast, aesthetically pleasing, and informative overview of sequence similarity search results. Availability and Implementation: Online version and downloadable software package for offline use (source code in PERL) freely available at http://bat.ina.certh.gr/tools/circoletto/ CONTACT: ndarz@certh.gr.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20736339&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>DRIMM-Synteny: Decomposing Genomes into Evolutionary Conserved Segments.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20736338</link>
      <description>Publication Date: 2010 Aug 24 PMID: 20736338&lt;br/&gt;Authors: Pham, S. K. - Pevzner, P. A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The rapidly increasing set of sequenced genomes highlights the importance of identifying the synteny blocks in multiple and/or highly duplicated genomes. Most synteny block reconstruction algorithms use genes shared over all genomes to construct the synteny blocks for multiple genomes. However, the number of genes shared among all genomes quickly decreases with the increase in the number of genomes. RESULTS: We propose the DRIMM-Synteny algorithm to address this bottleneck and apply it to analyzing genomic architectures of yeast, plant, and mammalian genomes. We further combine synteny block generation with rearrangement analysis to reconstruct the ancestral pre-duplicated yeast genome. CONTACT: kspham@cs.ucsd.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20736338&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Improving Performances of Suboptimal Greedy Iterative Biclustering Heuristics via Localization.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20733064</link>
      <description>Publication Date: 2010 Aug 23 PMID: 20733064&lt;br/&gt;Authors: Erten, C. - Sozdinler, M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Biclustering gene expression data is the problem of extracting submatrices of genes and conditions exhibiting significant correlation across both the rows and the columns of a data matrix of expression values. Even the simplest versions of the problem are computationally hard. Most of the proposed solutions therefore employ greedy iterative heuristics that locally optimize a suitably assigned scoring function. METHODS: We provide a fast and simple preprocessing algorithm called localization that reorders the rows and columns of the input data matrix in such a way as to group correlated entries in small local neighborhoods within the matrix. The proposed localization algorithm takes its roots from effective use of graph-theoretical methods applied to problems exhibiting a similar structure to that of biclustering. In order to evaluate the effectiveness of the localization preprocessing algorithm, we focus on three representative greedy iterative heuristic methods. We show how the localization preprocessing can be incorporated into each representative algorithm to improve biclustering performance. Furthermore, we propose a simple biclustering algorithm, Random Extraction After Localization (REAL), which randomly extracts submatrices from the localization preprocessed data matrix, eliminates those with low similarity scores, and provides the rest as correlated structures representing biclusters. RESULTS: We compare the proposed localization preprocessing with another preprocessing alternative, non-negative matrix factorization. We show that our fast and simple localization procedure provides similar or even better results than the computationally heavy matrix factorization preprocessing with regard to H-value tests. We next demonstrate that the performances of the three representative greedy iterative heuristic methods improve with localization preprocessing when biological correlations in the form of functional enrichment and PPI verification constitute the main performance criteria. The fact that the random extraction method based on localization, REAL, performs better than the representative greedy heuristic methods under the same criteria also confirms the effectiveness of the suggested preprocessing method. AVAILABILITY: Supplementary material including code implementations in LEDA C++ library, experimental data, and the results are available at http://hacivat.khas.edu.tr/~cesim/bicluster.html Contacts: cesim@khas.edu.tr, melihsozdinler@boun.edu.tr.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20733064&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>OpenStructure: A flexible software framework for computational structural biology.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20733063</link>
      <description>Publication Date: 2010 Aug 23 PMID: 20733063&lt;br/&gt;Authors: Biasini, M. - Mariani, V. - Haas, J. - Scheuber, S. - Schenk, A. D. - Schwede, T. - Philippsen, A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Developers of new methods in computational structural biology are often hampered in their research by incompatible software tools and non-standardized data formats. To address this problem, we have developed OpenStructure as a modular open source platform to provide a powerful, yet flexible general working environment for structural bioinformatics. OpenStructure consists primarily of a set of libraries written in C++ with a cleanly designed application programmer interface (API). All functionality can be accessed directly in C++ or in a Python layer, meeting both the requirements for high efficiency and ease of use. Powerful selection queries and the notion of entity views to represent these selections greatly facilitate the development and implementation of algorithms on structural data. The modular integration of computational core methods with powerful visualization tools makes OpenStructure an ideal working and development environment. Several applications, such as the latest versions of IPLT and QMean, have been implemented based on OpenStructure - demonstrating its value for the development of next-generation structural biology algorithms. AVAILABILITY: Source code licensed under the GNU lesser general public license (LGPL) and binaries for MacOS X, Linux and Windows are available for download at http://www.openstructure.org.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20733063&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Annotare - a tool for annotating high-throughput biomedical investigations and resulting data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20733062</link>
      <description>Publication Date: 2010 Aug 23 PMID: 20733062&lt;br/&gt;Authors: Shankar, R. - Parkinson, H. - Burdett, T. - Hastings, E. - Liu, J. - Miller, M. - Srinivasa, R. - White, J. - Brazma, A. - Sherlock, G. - Stoeckert, C. J. Jr - Ball, C. A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Computational methods in molecular biology will increasingly depend on standards-based annotations that describe biological experiments in an unambiguous manner. Annotare is a software tool that enables biologists to easily annotate their high-throughput experiments, biomaterials and data in a standards-compliant way that facilitates meaningful search and analysis. Availability and Implementation: Annotare is available from http://code.google.com/p/annotare/ under the terms of the open source MIT License (http://www.opensource.org/licenses/mit-license.php). It has been tested on both Mac and Windows. CONTACT: rshankar@stanford.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20733062&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>GolgiP: prediction of Golgi resident proteins in plants.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20733061</link>
      <description>Publication Date: 2010 Aug 23 PMID: 20733061&lt;br/&gt;Authors: Chou, W. C. - Yin, Y. - Xu, Y.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: We present a novel Golgi-prediction server, GolgiP, for computational prediction of both membrane-associated and non-membrane-associated Golgi resident proteins in plants. We have employed a support vector machine-based classification method for prediction of such Golgi proteins, based on three types of information: dipeptide composition, transmembrane domain(s), and functional domain(s) of a protein, where the functional domain information is generated through searching against the Conserved Domains Database (CDD), and the transmembrane domain (TMD) information includes the number of TMDs, the length of TMD, and the number of TMDs at the N-terminus of a protein. Using GolgiP, we have made genome-scale predictions of Golgi resident proteins in 18 plant genomes, and have made a preliminary analysis of the predicted data. AVAILABILITY: The GolgiP web service is publicly available at http://csbl1.bmb.uga.edu/GolgiP/ CONTACT: xyn@csbl.bmb.uga.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at the database website and Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20733061&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A Discriminatory Function for Prediction of Protein-DNA Interactions based on Alpha Shape Modeling.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20733060</link>
      <description>Publication Date: 2010 Aug 23 PMID: 20733060&lt;br/&gt;Authors: Zhou, W. - Yan, H.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Protein-DNA interaction has significant importance in many biological processes. However, the underlying principle of the molecular recognition process is still largely unknown. As more high resolution 3D structures of protein-DNA complexes are becoming available, the surface characteristics of the complex become an important research topic. RESULT: In our work, we apply an alpha shape model to represent the surface structure of the protein-DNA complex and develop an interface-atom curvature-dependent conditional probability discriminatory function for the prediction of protein-DNA interaction. The interface-atom curvature-dependent formalism captures atomic interaction details better than the atomic distance based method. The proposed method provides good performance in discriminating the native structures from the docking decoy sets, and outperforms the distance-dependent formalism in terms of the z-score. Computer experiment results show that the curvature-dependent formalism with the optimal parameters can achieve a native z-score of -8.17 in discriminating the native structure from the highest surface-complementarity scored decoy set and a native z-score of -7.38 in discriminating the native structure from the lowest RMSD decoy set. The interface-atom curvature-dependent formalism can also be used to predict the apo version of DNA-binding proteins. These results suggest that the interface-atom curvature-dependent formalism has a good prediction capability for protein-DNA interactions. AVAILABILITY: The code and data sets are available for download on http://www.hy8.com/bioinformatics.htm.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20733060&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Module-based prediction approach for robust inter-study predictions in microarray data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20719761</link>
      <description>Publication Date: 2010 Aug 17 PMID: 20719761&lt;br/&gt;Authors: Mi, Z. - Shen, K. - Song, N. - Cheng, C. - Song, C. - Kaminski, N. - Tseng, G. C.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Traditional genomic prediction models based on individual genes suffer from low reproducibility across microarray studies due to the lack of robustness to expression measurement noise and gene missingness when they are matched across platforms. It is common that some of the genes in the prediction model established in a training study cannot be matched to another test study because a different platform is applied. The failure of inter-study predictions has severely hindered the clinical applications of microarray. To overcome the drawbacks of traditional gene-based prediction (GBP) models, we propose a module-based prediction (MBP) strategy via unsupervised gene clustering. RESULTS: K-means clustering is used to group genes sharing similar expression profiles into gene modules, and small modules are merged into their nearest neighbors. Conventional univariate or multivariate feature selection procedure is applied and a representative gene from each selected module is identified to construct the final prediction model. As a result, the prediction model is portable to any test study as long as partial genes in each module exist in the test study. We demonstrate that K-means cluster sizes generally follow a multinomial distribution and the failure probability of inter-study prediction due to missing genes is diminished by merging small clusters into their nearest neighbors. By simulation and applications of real data sets in inter-study predictions, we show that the proposed MBP provides slightly improved accuracy while being considerably more robust than traditional GBP. AVAILABILITY: http://www.biostat.pitt.edu/bioinfo/ CONTACT: George C. Tseng (ctseng@pitt.edu) SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20719761&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Exchangeable random variables.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20716615</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20716615&lt;br/&gt;Authors: Good, P.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20716615&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>ChEA: Transcription Factor Regulation Inferred from Integrating Genome-Wide ChIP-X Experiments.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20709693</link>
      <description>Publication Date: 2010 Aug 13 PMID: 20709693&lt;br/&gt;Authors: Lachmann, A. - Xu, H. - Krishnan, J. - Berger, S. I. - Mazloom, A. R. - Ma'ayan, A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Experiments such as ChIP-chip, ChIP-seq, ChIP-PET and DamID (the four methods referred to herein as ChIP-X) are used to profile the binding of transcription factors to DNA at a genome-wide scale. Such experiments provide hundreds to thousands of potential binding sites for a given transcription factor in proximity to gene coding regions. RESULTS: In order to integrate data from such studies and utilize it for further biological discovery, we collected interactions from such experiments to construct a mammalian ChIP-X database. The database contains 189933 interactions, manually extracted from 87 publications, describing the binding of 92 transcription factors to 31932 target genes. We used the database to analyze mRNA expression data where we perform gene-list enrichment analysis using the ChIP-X database as the prior biological knowledge gene-list library. The system is delivered as a web-based interactive application called ChIP Enrichment Analysis (ChEA). With ChEA, users can input lists of mammalian gene symbols for which the program computes over-representation of transcription factor targets from the ChIP-X database. The ChEA database allowed us to reconstruct an initial network of transcription factors connected based on shared overlapping targets and binding site proximity. To demonstrate the utility of ChEA we present three case studies. We show how by combining the Connectivity Map (CMAP) with ChEA, we can rank pairs of compounds to be used to target specific transcription factor activity in cancer cells. AVAILABILITY: The ChEA software and ChIP-X database are freely available online at: http://amp.pharm.mssm.edu/lib/chea.jsp. CONTACT: avi.maayan@mssm.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20709693&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A cell-based simulation software for multicellular systems.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20709692</link>
      <description>Publication Date: 2010 Aug 13 PMID: 20709692&lt;br/&gt;Authors: Hoehme, S. - Drasdo, D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;CellSys is a modular software tool for efficient off-lattice simulation of growth and organization processes in multicellular systems in two and three dimensions. It implements an agent-based model that approximates cells as isotropic, elastic and adhesive objects. Cell migration is modeled by an equation of motion for each cell. The software includes many modules specifically tailored to support the simulation and analysis of virtual tissues including real-time 3D visualization and VRML 2.0 support. All cell and environment parameters can be independently varied, which facilitates species-specific simulations and allows for detailed analyses of growth dynamics and links between cellular and multicellular phenotypes. AVAILABILITY: CellSys is freely available for non-commercial use at http://msysbio.com/software/cellsys. The current version of CellSys permits the simulation of growing monolayer cultures and avascular tumor spheroids in liquid environment. Further functionality will be made available on an ongoing basis alongside published papers. Emails: hoehme@izbi.uni-leipzig.de; dirk.drasdo@inria.fr.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20709692&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Search and clustering orders of magnitude faster than BLAST.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20709691</link>
      <description>Publication Date: 2010 Aug 12 PMID: 20709691&lt;br/&gt;Authors: Edgar, R. C.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification. RESULTS: UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely-used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets. AVAILABILITY: Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch. CONTACT: robert@drive5.com.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20709691&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Text Mining Meets Workflow: Linking U-Compare with Taverna.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20709690</link>
      <description>Publication Date: 2010 Aug 12 PMID: 20709690&lt;br/&gt;Authors: Kano, Y. - Dobson, P. - Nakanishi, M. - Tsujii, J. - Ananiadou, S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can very easily be created, executed, evaluated and visualized without coding. We have linked U-Compare to Taverna, a generic workflow system, to expose text mining functionality to the bioinformatics community. AVAILABILITY: http://u-compare.org/taverna.html, http://u-compare.org CONTACT: kano@is.s.u-tokyo.ac.jp.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20709690&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>QuantProReloaded: Quantitative Analysis of Microspot Immunoassays.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20709689</link>
      <description>Publication Date: 2010 Aug 12 PMID: 20709689&lt;br/&gt;Authors: Joecker, A. - Sonntag, J. - Henjes, F. - Goetschel, F. - Tresch, A. - Beissbarth, T. - Wiemann, S. - Korf, U.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Protein microarrays are well-established as sensitive tools for proteomics. In particular, the microspot immunoassay (MIA) platform enables a quantitative analysis of (phospho-) proteins in complex solutions (e.g. cell lysates or blood plasma) with low consumption of samples and reagents. Despite numerous biological and clinical applications of microspot immunoassays, there is currently no user-friendly open source data analysis software available with versatile options for data analysis and data visualization. Here we introduce the open source software QuantProReloaded that is specifically designed for the analysis of data from MIA experiments. Availability and Implementation: QuantProReloaded is written in R and Java and is open for download under the BSD license at http://code.google.com/p/quantproreloaded/ CONTACT: Anika Joecker (a.joecker@dkfz.de) SUPPLEMENTARY INFORMATION: Further information about confidence intervals and uncertainties of the estimated concentrations as well as example time-course data including the experimental protocol are available as part of the supplementary information.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20709689&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20702402</link>
      <description>Publication Date: 2010 Aug 10 PMID: 20702402&lt;br/&gt;Authors: Yang, T. P. - Beazley, C. - Montgomery, S. B. - Dimas, A. S. - Gutierrez-Arcelus, M. - Stranger, B. E. - Deloukas, P. - Dermitzakis, E. T.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Genevar (GENe Expression VARiation) is a database and Java tool designed to integrate multiple datasets, and provides analysis and visualization of associations between sequence variation and gene expression. Genevar allows researchers to investigate eQTL associations within a gene locus of interest in real time. The database and application can be installed on a standard computer in database mode and, in addition, on a server to share discoveries among affiliations or the broader community over the internet via web services protocols. AVAILABILITY: http://www.sanger.ac.uk/resources/software/genevar CONTACT: emmanouil.dermitzakis@unige.ch.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20702402&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>MetDAT: A modular and workflow-based free online pipeline for metabolite data processing, analysis and interpretation.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20702401</link>
      <description>Publication Date: 2010 Aug 11 PMID: 20702401&lt;br/&gt;Authors: Biswas, A. - Mynampati, K. C. - Umashankar, S. - Reuben, S. - Parab, G. - Rao, R. - Kannan, V. S. - Swarup, S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Analysis of high throughput metabolomics experiments is a resource-intensive process that includes pre-processing, pretreatment, and post-processing at each level of experimental hierarchy. We have developed interactive experimental design-driven and user-friendly online software called Metabolite Data Analysis Tool (MetDAT). It offers a pipeline of tools for file handling, data preprocessing, univariate and multivariate statistical analyses, database searching, and pathway mapping. Outputs are produced in the form of text and high-quality images in real-time. MetDAT allows users to combine data management and experiment-centric workflows for optimization of metabolomics methods and metabolite analysis. AVAILABILITY: MetDAT is available free for academic use from http://smbl.nus.edu.sg/METDAT2/. CONTACT: sanjay@nus.edu.sg.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20702401&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Identifying informative subsets of the Gene Ontology with information bottleneck methods.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20702400</link>
      <description>Publication Date: 2010 Aug 11 PMID: 20702400&lt;br/&gt;Authors: Jin, B. - Lu, X.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The Gene Ontology (GO) is a controlled vocabulary designed to represent the biological concepts pertaining to gene products. This study investigates the methods for identifying informative subsets of GO terms in an automatic and objective fashion. This task in turn requires addressing the following issues: how to represent the semantic context of GO terms, what metrics are suitable for measuring the semantic differences between terms, how to identify an informative subset that retains as much as possible of the original semantic information of GO. RESULTS: We represented the semantic context of a GO term using the word-usage-profile associated with the term, which enables one to measure the semantic differences between terms based on the differences in their semantic contexts. We further employed the information bottleneck methods to automatically identify subsets of GO terms that retain as much as possible of the semantic information in an annotation database. The automatically retrieved informative subsets align well with an expert-picked GO slim subset, cover important concepts and proteins, and enhance literature-based GO annotation. AVAILABILITY: http://carcweb.musc.edu/TextminingProjects/ CONTACT: lux@musc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20702400&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Automated Analysis of Time-Lapse Fluorescence Microscopy Images: From Live Cell Images to Intracellular Foci.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20702399</link>
      <description>Publication Date: 2010 Aug 11 PMID: 20702399&lt;br/&gt;Authors: Dzyubachyk, O. - Essers, J. - van Cappellen, W. A. - Baldeyron, C. - Inagaki, A. - Niessen, W. J. - Meijering, E.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Complete, accurate, and reproducible analysis of intracellular foci from fluorescence microscopy image sequences of live cells requires full automation of all processing steps involved: cell segmentation and tracking followed by foci segmentation and pattern analysis. Integrated systems for this purpose are lacking. RESULTS: Extending our previous work in cell segmentation and tracking, we developed a new system for performing fully automated analysis of fluorescent foci in single cells. The system was validated by applying it to two common tasks: intracellular foci counting (in DNA damage repair experiments) and cell phase identification based on foci pattern analysis (in DNA replication experiments). Experimental results show that the system performs comparably to expert human observers. Thus, it may replace tedious manual analyses for the considered tasks, and enables high-content screening. Availability and Implementation: The described system was implemented in MATLAB (The MathWorks, Inc., USA) and compiled to run within the MATLAB environment. The routines together with four sample data sets are available at http://celmia.bigr.nl/. The software is planned for public release, free of charge for non-commercial use, after publication of this paper. CONTACT: Erik Meijering (meijering@imagescience.org).&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20702399&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>NoiseMaker: Simulated Screens for Statistical Assessment.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20702398</link>
      <description>Publication Date: 2010 Aug 11 PMID: 20702398&lt;br/&gt;Authors: Kwan, P. - Birmingham, A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: High-throughput screening (HTS) is a common technique for both drug discovery and basic research, but researchers often struggle with how best to derive hits from HTS data. While a wide range of hit identification techniques exist, little information is available about their sensitivity and specificity, especially in comparison to each other. To address this, we have developed the open-source NoiseMaker software tool for generation of realistically noisy virtual screens. By applying potential hit identification methods to NoiseMaker-simulated data and determining how many of the pre-defined true hits are recovered (as well as how many known non-hits are misidentified as hits), one can draw conclusions about the likely performance of these techniques on real data containing unknown true hits. Such simulations apply to a range of screens, such as those using small molecules, siRNAs, shRNAs, miRNA mimics or inhibitors, or gene over-expression; we demonstrate this utility by using it to explain apparently-conflicting reports about the performance of the B score hit identification method. Availability and Implementation: NoiseMaker is written in C#, an ECMA and ISO standard language with compilers for multiple operating systems. Source code, a Windows installer, and complete unit tests are available at http://sourceforge.net/projects/noisemaker. Full documentation and support are provided via an extensive help file and tool-tips, and the developers welcome user suggestions. CONTACT: amanda.birmingham@thermofisher.com SUPPLEMENTARY INFORMATION: [Link to be provided.].&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20702398&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A Bayesian Method for 3-D Macromolecular Structure Inference using Class Average Images from Single Particle Electron Microscopy.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20702397</link>
      <description>Publication Date: 2010 Aug 11 PMID: 20702397&lt;br/&gt;Authors: Jaitly, N. - Brubaker, M. A. - Rubinstein, J. L. - Lilien, R. H.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Electron Cryo-Microscopy can be used to infer 3-D structures of large macromolecules with high resolution, but the large amounts of data captured necessitate the development of appropriate statistical models to describe the data generation process, and to perform structure inference. We present a new method for performing ab initio inference of the three-dimensional structures of macromolecules from single particle electron cryo-microscopy experiments using class average images. RESULTS: We demonstrate this algorithm on one phantom, one synthetic dataset and three real (experimental) datasets (ATP synthase, V-ATPase and GroEL). Structures consistent with the known structures were inferred for all datasets. AVAILABILITY: The software and source code for this method are available for download from our website: http://compbio.cs.toronto.edu/cryoem/ CONTACT: ndjaitly@cs.toronto.edu, lilien@cs.toronto.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20702397&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Cross Species Queries of Large Gene Expression Databases.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20702396</link>
      <description>Publication Date: 2010 Aug 11 PMID: 20702396&lt;br/&gt;Authors: Le, H. S. - Oltvai, Z. N. - Bar-Joseph, Z.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Expression databases, including the Gene Expression Omnibus (GEO) and ArrayExpress, have experienced significant growth over the last decade and now hold hundreds of thousands of arrays from multiple species. Since most drugs are initially tested on model organisms, the ability to compare expression experiments across species may help identify pathways that are activated in a similar way in humans and other organisms. However, while several methods exist for finding co-expressed genes in the same species as a query gene, looking at co-expression of homologs or arbitrary genes in other species is challenging. Unlike sequence, which is static, expression is dynamic and changes between tissues, conditions and time. Thus, to carry out cross species analysis using these databases we need methods that can match experiments in one species with experiments in another species. RESULTS: To facilitate queries in large databases we developed a new method for comparing expression experiments from different species. We define a distance metric between the ranking of orthologous genes in the two species. We show how to solve an optimization problem for learning the parameters of this function using a training dataset of known similar expression experiment pairs. The function we learn outperforms previous methods and simpler rank comparison methods that have been used in the past for single species analysis. We used our method to compare millions of array pairs from mouse and human expression experiments. The resulting matches can be used to find functionally related genes, to hypothesize about biological response mechanisms and to highlight conditions and diseases that are activating similar pathways in both species. AVAILABILITY: Supporting methods, results and a Matlab implementation are available from http://sb.cs.cmu.edu/ExpQ/ CONTACT: zivbj@cs.cmu.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20702396&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>BDVal: reproducible large scale predictive model development and validation in high-throughput datasets.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20702395</link>
      <description>Publication Date: 2010 Aug 11 PMID: 20702395&lt;br/&gt;Authors: Dorff, K. C. - Chambwe, N. - Srdanovic, M. - Campagne, F.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: High-throughput data can be used in conjunction with clinical information to develop predictive models. Automating the process of developing, evaluating and testing such predictive models on different datasets would minimize operator errors and facilitate the comparison of different modeling approaches on the same dataset. Complete automation would also yield unambiguous documentation of the process followed to develop each model. We present the BDVal suite of programs that fully automate the construction of predictive classification models from high-throughput data and generate detailed reports about the model construction process. We have used BDVal to construct models from microarray and proteomics data, as well as from DNA-methylation datasets. The programs are designed for scalability and support the construction of thousands of alternative models from a given dataset and prediction task. Availability and Implementation: The BDVal programs are implemented in Java, provided under the GNU General Public License and freely available at http://bdval.campagnelab.org. CONTACT: fac2003@med.cornell.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20702395&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>RBSDesigner: software for designing synthetic ribosome binding sites that yield a desired level of protein expression.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20702394</link>
      <description>Publication Date: 2010 Aug 11 PMID: 20702394&lt;br/&gt;Authors: Na, D. - Lee, D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: RBSDesigner predicts the translation efficiency of existing mRNA sequences and designs synthetic ribosome binding sites (RBS) for a given coding sequence (CDS) to yield a desired level of protein expression. The program implements the mathematical model for translation initiation described in Na et al. (BMC Syst. Biol. 4(1):71). The program additionally incorporates the effect on translation efficiency of the spacer length between a Shine-Dalgarno (SD) sequence and an AUG codon, which is crucial for the incorporation of fMet-tRNA into the ribosome. RBSDesigner provides a graphical user interface for the convenient design of synthetic ribosome binding sites. AVAILABILITY: RBSDesigner is written in Python and Microsoft Visual Basic 6.0 and is publicly available as precompiled stand-alone software on the web (http://rbs.kaist.ac.kr). CONTACT: dhlee@kaist.ac.kr.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20702394&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>An Alignment-Free Model for Comparison of Regulatory Sequences.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20696736</link>
      <description>Publication Date: 2010 Aug 9 PMID: 20696736&lt;br/&gt;Authors: Koohy, H. - Dyer, N. P. - Reid, J. E. - Koentges, G. - Ott, S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Some recent comparative studies have revealed that regulatory regions can retain function over large evolutionary distances, even though the DNA sequences are divergent and difficult to align. It is also known that such enhancers can drive very similar expression patterns. This poses a challenge for the in silico detection of biologically related sequences, as they can only be discovered using alignment-free methods. RESULTS: Here we present a new computational framework called Regulatory Region Scoring model (RRS) for the detection of functional conservation of regulatory sequences using predicted occupancy levels of transcription factors of interest. We demonstrate that our model can detect the functional and/or evolutionary links between some non-alignable enhancers with a strong statistical significance. We also identify groups of enhancers that are likely to be similarly regulated. Our model is motivated by previous work on prediction of expression patterns and it can capture similarity by strong binding sites, weak binding sites, and even the statistically significant absence of sites. Our results support the hypothesis that weak binding sites contribute to the functional similarity of sequences. Our model fills a gap between two families of models: detailed, data-intensive models for the prediction of precise spatio-temporal expression patterns on the one side, and crude, generally applicable models on the other side. Our model borrows some of the strengths of each group and addresses their drawbacks. AVAILABILITY: The RRS source code is freely available upon publication of this manuscript: http://www2.warwick.ac.uk/fac/sci/systemsbiology/staff/ott/tools_and_software/rrs. CONTACT: S.Ott@warwick.ac.uk, Hashem.Koohy@warwick.ac.uk.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20696736&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>The misuse of terms in scientific literature.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20696735</link>
      <description>Publication Date: 2010 Aug 9 PMID: 20696735&lt;br/&gt;Authors: Marabotti, A. - Facchiano, A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20696735&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>MASiVE: Mapping and Analysis of SireVirus Elements in plant genome sequences.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20696734</link>
      <description>Publication Date: 2010 Aug 9 PMID: 20696734&lt;br/&gt;Authors: Darzentas, N. - Bousios, A. - Apostolidou, V. - Tsaftaris, A. S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: We present MASiVE, an expertly-built tool for the large-scale, yet sensitive and highly accurate, discovery, preliminary analysis, and insertion age estimation of intact Sirevirus LTR-retrotransposons in plant genomic sequences. Validation was based on the recently available and annotated large maize chromosome one. Results show a considerable improvement in the annotation of Sireviruses, and support our approach as an important addition to the bioinformatics toolbox of plant biologists. AVAILABILITY: PERL source code and essential files are available online at http://bat.ina.certh.gr/tools/masive/. The freely available Vmatch, LTRharvest, Wise2, and MAFFT algorithms are required. CONTACT: ndarz@certh.gr SUPPLEMENTARY INFORMATION: Supplementary Tables and Figures are available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20696734&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>RefProtDom: A Protein Database with Improved Domain Boundaries and Homology Relationships.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20693322</link>
      <description>Publication Date: 2010 Aug 6 PMID: 20693322&lt;br/&gt;Authors: Gonzalez, M. W. - Pearson, W. R.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: RefProtDom provides a set of divergent query domains, originally selected from Pfam, and full-length proteins containing their homologous domains, with diverse architectures, for evaluating pair-wise and iterative sequence similarity searches. Pfam homology and domain boundary annotations in the target library were supplemented using local and semi-global searches, PSI-BLAST searches, and SCOP and CATH classifications. AVAILABILITY: RefProtDom is available from http://faculty.virginia.edu/wrpearson/fasta/PUBS/gonzalez09a. CONTACT: pearson@virginia.edu or miledywgonzalez@gmail.com.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20693322&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Association Screening of Common and Rare Genetic Variants by Penalized Regression.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20693321</link>
      <description>Publication Date: 2010 Aug 6 PMID: 20693321&lt;br/&gt;Authors: Zhou, H. - Sehl, M. E. - Sinsheimer, J. S. - Lange, K.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: This paper extends our recent research on penalized estimation methods in genome-wide association studies to the realm of rare variants. RESULTS: The new strategy is tested on both simulated and real data. Our findings on breast cancer data replicate previous results and shed light on variant effects within genes. AVAILABILITY: Rare variant discovery by group penalized regression is now implemented in the free program Mendel at http://www.genetics.ucla.edu/software/. CONTACT: huazhou@ucla.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20693321&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Gene function prediction using semantic similarity clustering and enrichment analysis in the malaria parasite Plasmodium falciparum.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20693320</link>
      <description>Publication Date: 2010 Aug 6 PMID: 20693320&lt;br/&gt;Authors: Tedder, P. M. - Bradford, J. R. - Needham, C. J. - McConkey, G. A. - Bulpitt, A. J. - Westhead, D. R.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Functional genomics data provides a rich source of information that can be used in the annotation of the thousands of genes of unknown function found in most sequenced genomes. However, previous gene function prediction programs are mostly produced for relatively well annotated organisms that often have a large amount of functional genomics data. Here, we present a novel method for predicting gene function that uses clustering of genes by semantic similarity, a naive Bayes classifier and 'enrichment analysis' to predict gene function for a genome that is less well annotated but does have a severe effect on human health, that of the malaria parasite Plasmodium falciparum. RESULTS: Predictions for the molecular function, biological process and cellular component of P. falciparum genes were created from eight different datasets with a combined prediction also being produced. The high confidence predictions produced by the combined prediction were compared to those produced by a simple K-nearest neighbour classifier approach and were shown to improve accuracy and coverage. Finally, two case studies are described which investigate two biological processes in more detail, that of translation initiation and invasion of the host cell. AVAILABILITY: Predictions produced are available at http://www.bioinformatics.leeds.ac.uk/~bio5pmrt/PAGODA.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20693320&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A Framework for Oligonucleotide Microarray Preprocessing.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20688976</link>
      <description>Publication Date: 2010 Aug 5 PMID: 20688976&lt;br/&gt;Authors: Carvalho, B. S. - Irizarry, R. A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The availability of flexible open source software for the analysis of gene expression raw level data has greatly facilitated the development of widely used preprocessing methods for these technologies. However, the expansion of microarray applications has exposed the limitation of existing tools. RESULTS: We developed the oligo package to provide a more general solution that supports a wide range of applications. The package is based on the BioConductor principles of transparency, reproducibility and efficiency of development. It extends existing tools and leverages existing code for visualization, accessing data, and widely used preprocessing routines. The oligo package implements a unified paradigm for preprocessing data and interfaces with other BioConductor tools for downstream analysis. Our infrastructure is general and can be used by other BioConductor packages. AVAILABILITY: The oligo package is freely available through BioConductor, http://www.bioconductor.org. CONTACT: benilton.carvalho@cancer.org.uk, rafa@jhu.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20688976&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Ultra-Fast FFT Protein Docking On Graphics Processors.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20685958</link>
      <description>Publication Date: 2010 Aug 4 PMID: 20685958&lt;br/&gt;Authors: Ritchie, D. W. - Venkatraman, V.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Modelling protein-protein interactions (PPIs) is an increasingly important aspect of structural bioinformatics. However, predicting PPIs using in silico docking techniques is computationally very expensive. Developing very fast protein docking tools will be useful for studying large-scale PPI networks, and could contribute to the rational design of new drugs. RESULTS: The Hex spherical polar Fourier protein docking algorithm has been implemented on Nvidia graphics processor units (GPUs). On a GTX 285 GPU, an exhaustive and densely sampled six-dimensional (6D) docking search can be calculated in just 15 seconds using multiple one-dimensional (1D) fast Fourier transforms (FFTs). This represents a 45-fold speed-up over the corresponding calculation on a single CPU, being at least two orders of magnitude faster than a similar CPU calculation using ZDOCK 3.0.1, and estimated to be at least three orders of magnitude faster than the GPU-accelerated version of PIPER on comparable hardware. Hence, for the first time, exhaustive FFT-based protein docking calculations may now be performed in a matter of seconds on a contemporary GPU. Three-dimensional (3D) Hex FFT correlations are also accelerated by the GPU, but the speed-up factor of only 2.5 is much less than that obtained with 1D FFTs. Thus the Hex algorithm appears to be especially well suited to exploit GPUs compared to conventional 3D FFT docking approaches. AVAILABILITY: http://hex.loria.fr/ and http://hexserver.loria.fr/. CONTACT: Dave.Ritchie@loria.fr.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20685958&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>SPRINT: Side-chain Prediction Inference Toolbox for Multistate Protein Design.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20685957</link>
      <description>Publication Date: 2010 Aug 4 PMID: 20685957&lt;br/&gt;Authors: Fromer, M. - Yanover, C. - Harel, A. - Shachar, O. - Weiss, Y. - Linial, M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: SPRINT is a software package that performs computational multistate protein design using state-of-the-art inference on probabilistic graphical models. The input to SPRINT is a list of protein structures, the rotamers modeled for each structure, and the pre-calculated rotamer energies. Probabilistic inference is performed using the belief propagation or A* algorithms, and dead-end elimination (DEE) can be applied as pre-processing. The output can either be a list of amino acid sequences simultaneously compatible with these structures, or probabilistic amino acid profiles compatible with the structures. In addition, higher-order (e.g., pairwise) amino acid probabilities can also be predicted. Finally, SPRINT also has a module for protein side-chain prediction and single-state design. AVAILABILITY: The full C++ source code for SPRINT can be freely downloaded from http://www.protonet.cs.huji.ac.il/sprint. CONTACT: fromer@cs.huji.ac.il SUPPLEMENTARY INFORMATION: Available at Bioinformatics online.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20685957&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>DMDM: Domain Mapping of Disease Mutations.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20685956</link>
      <description>Publication Date: 2010 Aug 4 PMID: 20685956&lt;br/&gt;Authors: Peterson, T. A. - Adadey, A. - Santana-Cruz, I. - Sun, Y. - Winder, A. - Kann, M. G.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Domain Mapping of Disease Mutations (DMDM) is a database in which each disease mutation can be displayed by its gene, protein, or domain location. DMDM provides a unique domain-level view where all human coding mutations are mapped on the protein domain. To build DMDM, all human proteins were aligned to a database of conserved protein domains using a Hidden Markov Model-based sequence alignment tool (HMMer). The resulting protein-domain alignments were used to provide a domain location for all available human disease mutations and polymorphisms. The number of disease mutations and polymorphisms in each domain position are displayed alongside other relevant functional information (e.g. the binding and catalytic activity of the site and the conservation of that domain location). DMDM's protein domain view highlights molecular relationships among mutations from different diseases that might not be clearly observed with traditional gene-centric visualization tools. AVAILABILITY: Freely available at http://bioinf.umbc.edu/dmdm. CONTACT: mkann@umbc.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20685956&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>TreesimJ: a flexible, forward time population genetic simulator.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20671150</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20671150&lt;br/&gt;Authors: O'Fallon, B.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Most population genetic simulators fall into one of two classes: backward time simulators that quickly generate trees but accommodate only relatively simple selective and demographic regimes, and forward simulators that allow for a broader range of evolutionary scenarios but which cannot produce genealogies. Thus, few tools are available that allow for producing genealogies under arbitrarily complex selective and demographic models. RESULTS: TreesimJ is a forward time population genetic simulator that allows for sampling of genealogies, genetic data and many population parameters from populations evolving under complex evolutionary scenarios. The application provides many fitness and demographic models and new models are easy to develop. Data collection is performed by a variety of independently configurable collectors which periodically sample the population and record statistics. Output options include writing traces, histograms and summary statistics from the data collectors in addition to sampled genetic sequences and genealogies. SUMMARY: TreesimJ allows researchers to easily sample and analyze gene genealogies and related data from populations evolving under a wide variety of selective and demographic regimes. It is likely to be useful for population genetic researchers seeking to understand the links between evolutionary and demographic forces, genealogical structure and the resulting patterns of genetic variation. AVAILABILITY: TreesimJ home: http://staff.washington.edu/brendano/treesimj. 
Source and developer resources: http://code.google.com/p/treesimj.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20671150&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>ExpressionView--an interactive viewer for modules identified in gene expression data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20671149</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20671149&lt;br/&gt;Authors: Luscher, A. - Csardi, G. - de Lachapelle, A. M. - Kutalik, Z. - Peter, B. - Bergmann, S.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: ExpressionView is an R package that provides an interactive graphical environment to explore transcription modules identified in gene expression data. A sophisticated ordering algorithm is used to present the modules with the expression in a visually appealing layout that provides an intuitive summary of the results. From this overview, the user can select individual modules and access biologically relevant metadata associated with them. AVAILABILITY: http://www.unil.ch/cbg/ExpressionView. Screenshots, tutorials and sample data sets can be found on the ExpressionView web site.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20671149&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Genome-wide histone acetylation data improve prediction of mammalian transcription factor binding sites.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20663846</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20663846&lt;br/&gt;Authors: Ramsey, S. A. - Knijnenburg, T. A. - Kennedy, K. A. - Zak, D. E. - Gilchrist, M. - Gold, E. S. - Johnson, C. D. - Lampano, A. E. - Litvak, V. - Navarro, G. - Stolyar, T. - Aderem, A. - Shmulevich, I.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Histone acetylation (HAc) is associated with open chromatin, and HAc has been shown to facilitate transcription factor (TF) binding in mammalian cells. In the innate immune system context, epigenetic studies strongly implicate HAc in the transcriptional response of activated macrophages. We hypothesized that using data from large-scale sequencing of a HAc chromatin immunoprecipitation assay (ChIP-Seq) would improve the performance of computational prediction of binding locations of TFs mediating the response to a signaling event, namely, macrophage activation. RESULTS: We tested this hypothesis using a multi-evidence approach for predicting binding sites. As a training/test dataset, we used ChIP-Seq-derived TF binding site locations for five TFs in activated murine macrophages. Our model combined TF binding site motif scanning with evidence from sequence-based sources and from HAc ChIP-Seq data, using a weighted sum of thresholded scores. We find that using HAc data significantly improves the performance of motif-based TF binding site prediction. Furthermore, we find that within regions of high HAc, local minima of the HAc ChIP-Seq signal are particularly strongly correlated with TF binding locations. Our model, using motif scanning and HAc local minima, improves the sensitivity for TF binding site prediction by approximately 50% over a model based on motif scanning alone, at a false positive rate cutoff of 0.01. 
AVAILABILITY: The data and software source code for model training and validation are freely available online at http://magnet.systemsbiology.net/hac.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20663846&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Prophossi: automating expert validation of phosphopeptide-spectrum matches from tandem mass spectrometry.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20651112</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20651112&lt;br/&gt;Authors: Martin, D. M. - Nett, I. R. - Vandermoere, F. - Barber, J. D. - Morrice, N. A. - Ferguson, M. A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Complex patterns of protein phosphorylation mediate many cellular processes. Tandem mass spectrometry (MS/MS) is a powerful tool for identifying these post-translational modifications. In high-throughput experiments, mass spectrometry database search engines, such as MASCOT, provide a ranked list of peptide identifications based on hundreds of thousands of MS/MS spectra obtained in a mass spectrometry experiment. These search results are not in themselves sufficient for confident assignment of phosphorylation sites as identification of characteristic mass differences requires time-consuming manual assessment of the spectra by an experienced analyst. The time required for manual assessment has previously rendered high-throughput confident assignment of phosphorylation sites challenging. RESULTS: We have developed a knowledge base of criteria, which replicate expert assessment, allowing more than half of cases to be automatically validated and site assignments verified with a high degree of confidence. This was assessed by comparing automated spectral interpretation with careful manual examination of the assignments for 501 peptides above the 1% false discovery rate (FDR) threshold corresponding to 259 putative phosphorylation sites in 74 proteins of the Trypanosoma brucei proteome. Despite this stringent approach, we are able to validate 80 of the 91 phosphorylation sites (88%) positively identified by manual examination of the spectra used for the MASCOT searches with an FDR &lt; 15%. CONCLUSIONS: High-throughput computational analysis can provide a viable second stage validation of primary mass spectrometry database search results. 
Such validation gives rapid access to a systems level overview of protein phosphorylation in the experiment under investigation. AVAILABILITY: A GPL-licensed software implementation in Perl for analysis and spectrum annotation is available in the supplementary material and a web server can be accessed online at http://www.compbio.dundee.ac.uk/prophossi.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20651112&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Predictive models for population performance on real biological fitness landscapes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20639542</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20639542&lt;br/&gt;Authors: Rowe, W. - Wedge, D. C. - Platt, M. - Kell, D. B. - Knowles, J.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Directed evolution, in addition to its principal application of obtaining novel biomolecules, offers significant potential as a vehicle for obtaining useful information about the topologies of biomolecular fitness landscapes. In this article, we make use of a special type of model of fitness landscapes, based on finite state machines, which can be inferred from directed evolution experiments. Importantly, the model is constructed only from the fitness data and phylogeny, not sequence or structural information, which is often absent. The model, called a landscape state machine (LSM), has already been used successfully in the evolutionary computation literature to model the landscapes of artificial optimization problems. Here, we use the method for the first time to simulate a biological fitness landscape based on experimental evaluation. RESULTS: We demonstrate in this study that LSMs are capable not only of representing the structure of model fitness landscapes such as NK-landscapes, but also the fitness landscape of real DNA oligomers binding to a protein (allophycocyanin), data we derived from experimental evaluations on microarrays. The LSMs prove adept at modelling the progress of evolution as a function of various controlling parameters, as validated by evaluations on the real landscapes. Specifically, the ability of the model to 'predict' optimal mutation rates and other parameters of the evolution is demonstrated. 
A modification to the standard LSM also proves accurate at predicting the effects of recombination on the evolution.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20639542&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>BigWig and BigBed: enabling browsing of large distributed datasets.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20639541</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20639541&lt;br/&gt;Authors: Kent, W. J. - Zweig, A. S. - Barber, G. - Hinrichs, A. S. - Karolchik, D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: BigWig and BigBed files are compressed binary indexed files containing data at several resolutions that allow the high-performance display of next-generation sequencing experiment results in the UCSC Genome Browser. The visualization is implemented using a multi-layered software approach that takes advantage of specific capabilities of web-based protocols and Linux and UNIX operating systems files, R trees and various indexing and compression tricks. As a result, only the data needed to support the current browser view is transmitted rather than the entire file, enabling fast remote access to large distributed data sets. AVAILABILITY AND IMPLEMENTATION: Binaries for the BigWig and BigBed creation and parsing utilities may be downloaded at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/. Source code for the creation and visualization software is freely available for non-commercial use at http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, implemented in C and supported on Linux. The UCSC Genome Browser is available at http://genome.ucsc.edu.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20639541&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>RPPanalyzer: Analysis of reverse-phase protein array data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20634205</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20634205&lt;br/&gt;Authors: Mannsperger, H. A. - Gade, S. - Henjes, F. - Beissbarth, T. - Korf, U.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: RPPanalyzer is a statistical tool developed to read reverse-phase protein array data, to perform the basic data analysis and to visualize the resulting biological information. The R-package provides different functions to compare protein expression levels of different samples and to normalize the data. Implemented plotting functions permit a quality control by monitoring data distribution and signal validity. Finally, the data can be visualized in heatmaps, boxplots, time course plots and correlation plots. RPPanalyzer is a flexible tool and tolerates a huge variety of different experimental designs. AVAILABILITY: RPPanalyzer is open source and freely available as an R-Package on the CRAN platform http://cran.r-project.org/.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20634205&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>PriorsEditor: a tool for the creation and use of positional priors in motif discovery.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20628076</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20628076&lt;br/&gt;Authors: Klepper, K. - Drablos, F.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Computational methods designed to discover transcription factor binding sites in DNA sequences often have a tendency to make a lot of false predictions. One way to improve accuracy in motif discovery is to rely on positional priors to focus the search to parts of a sequence that are considered more likely to contain functional binding sites. We present here a program called PriorsEditor that can be used to create such positional priors tracks based on a combination of several features, including phylogenetic conservation, nucleosome occupancy, histone modifications, physical properties of the DNA helix and many more. AVAILABILITY: PriorsEditor is available as a web start application and downloadable archive from http://tare.medisin.ntnu.no/priorseditor (requires Java 1.6). The web site also provides tutorials, screenshots and example protocol scripts.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20628076&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Logic Forest: an ensemble classifier for discovering logical combinations of binary markers.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20628070</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20628070&lt;br/&gt;Authors: Wolf, B. J. - Hill, E. G. - Slate, E. H.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Highly sensitive and specific screening tools may reduce disease-related mortality by enabling physicians to diagnose diseases in asymptomatic patients or at-risk individuals. Diagnostic tests based on multiple biomarkers may achieve the needed sensitivity and specificity to realize this clinical gain. RESULTS: Logic regression, a multivariable regression method predicting an outcome using logical combinations of binary predictors, yields interpretable models of the complex interactions in biologic systems. However, its performance degrades in noisy data. We extend logic regression for classification to an ensemble of logic trees (Logic Forest, LF). We conduct simulation studies comparing the ability of logic regression and LF to identify variable interactions predictive of disease status. Our findings indicate LF is superior to logic regression for identifying important predictors. We apply our method to single nucleotide polymorphism data to determine associations of genetic and health factors with periodontal disease. AVAILABILITY: LF code is publicly available on CRAN, http://cran.r-project.org/.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20628070&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>An alignment-free method to identify candidate orthologous enhancers in multiple Drosophila genomes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20624780</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20624780&lt;br/&gt;Authors: Arunachalam, M. - Jayasurya, K. - Tomancak, P. - Ohler, U.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Evolutionarily conserved non-coding genomic sequences represent a potentially rich source for the discovery of gene regulatory regions such as transcriptional enhancers. However, detecting orthologous enhancers using alignment-based methods in higher eukaryotic genomes is particularly challenging, as regulatory regions can undergo considerable sequence changes while maintaining their functionality. RESULTS: We have developed an alignment-free method which identifies conserved enhancers in multiple diverged species. Our method is based on similarity metrics between two sequences based on the co-occurrence of sequence patterns regardless of their order and orientation, thus tolerating sequence changes observed in non-coding evolution. We show that our method is highly successful in detecting orthologous enhancers in distantly related species without requiring additional information such as knowledge about transcription factors involved, or predicted binding sites. By estimating the significance of similarity scores, we are able to discriminate experimentally validated functional enhancers from seemingly equally conserved candidates without function. We demonstrate the effectiveness of this approach on a wide range of enhancers in Drosophila, and also present encouraging results to detect conserved functional regions across large evolutionary distances. Our work provides encouraging steps on the way to ab initio unbiased enhancer prediction to complement ongoing experimental efforts. 
AVAILABILITY: The software, data and the results used in this article are available at http://www.genome.duke.edu/labs/ohler/research/transcription/fly_enhancer/.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20624780&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Dealing with sparse data in predicting outcomes of HIV combination therapies.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20624779</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20624779&lt;br/&gt;Authors: Bogojeska, J. - Bickel, S. - Altmann, A. - Lengauer, T.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: As there exists no cure or vaccine for the infection with human immunodeficiency virus (HIV), the standard approach to treating HIV patients is to repeatedly administer different combinations of several antiretroviral drugs. Because of the large number of possible drug combinations, manually finding a successful regimen becomes practically impossible. This presents a major challenge for HIV treatment. The application of machine learning methods for predicting virological responses to potential therapies is a possible approach to solving this problem. However, due to evolving trends in treating HIV patients the available clinical datasets have a highly unbalanced representation, which might negatively affect the usefulness of derived statistical models. RESULTS: This article presents an approach that tackles the problem of predicting virological response to combination therapies by learning a separate logistic regression model for each therapy. The models are fitted by using not only the data from the target therapy but also the information from similar therapies. For this purpose, we introduce and evaluate two different measures of therapy similarity. The models are also able to incorporate phenotypic knowledge on the therapy outcomes through a Gaussian prior. With our approach we balance the uneven therapy representation in the datasets and produce higher quality models for therapies with very few training samples. According to the results from the computational experiments our therapy similarity model performs significantly better than training separate models for each therapy by using solely their examples. 
Furthermore, the model's performance is as good as an approach that encodes therapy information in the input feature space with the advantage of delivering better results for therapies with very few training samples. AVAILABILITY: Code of the efficient logistic regression is available from http://www.mpi-inf.mpg.de/%7Ejasmina/fastLogistic.zip.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20624779&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>First insight into the prediction of protein folding rate change upon point mutation.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20616385</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20616385&lt;br/&gt;Authors: Huang, L. T. - Gromiha, M. M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: The accurate prediction of protein folding rate change upon mutation is an important and challenging problem in protein folding kinetics and design. In this work, we have collected experimental data on protein folding rate change upon mutation from various sources and constructed a reliable and non-redundant dataset with 467 mutants. These mutants are widely distributed based on secondary structure, solvent accessibility, conservation score and long-range contacts. From systematic analysis of these parameters along with a set of 49 amino acid properties, we have selected a set of 12 features for discriminating the mutants that speed up or slow down the folding process. We have developed a method based on quadratic regression models for discriminating the accelerating and decelerating mutants, which showed an accuracy of 74% using the 10-fold cross-validation test. The sensitivity and specificity are 63% and 76%, respectively. The method can be improved with the inclusion of physical interactions and structure-based parameters. AVAILABILITY: http://bioinformatics.myweb.hinet.net/freedom.htm.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20616385&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Ontogenomic study of the relationship between number of gene splice variants and GO categorization.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20616384</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20616384&lt;br/&gt;Authors: Kahn, A. B. - Zeeberg, B. R. - Ryan, M. C. - Jamison, D. C. - Rockoff, D. M. - Pommier, Y. - Weinstein, J. N.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Splice variation plays important roles in evolution and cancer. Different splice variants of a gene may be characteristic of particular cellular processes, subcellular locations or organs. Although several genomic projects have identified splice variants, there have been no large-scale computational studies of the relationship between number of splice variants and biological function. The Gene Ontology (GO) and tools for leveraging GO, such as GoMiner, now make such a study feasible. RESULTS: We partitioned genes into two groups: those with numbers of splice variants ≤b and &gt;b (b=1,..., 10). Then we used GoMiner to determine whether any GO categories are enriched in genes with particular numbers of splice variants. Since there was no a priori 'appropriate' partition boundary, we studied those 'robust' categories whose enrichment did not depend on the selection of a particular partition boundary. Furthermore, because the distribution of splice variant number was a snapshot taken at a particular point in time, we confirmed that those observations were stable across successive builds of GenBank. A small number of categories were found for genes in the lower partitions. A larger number of categories were found for genes in the higher partitions. Those categories were largely associated with cell death and signal transduction. Apoptotic genes tended to have a large repertoire of splice variants, and genes with splice variants exhibited a distinctive 'apoptotic island' in clustered image maps (CIMs). AVAILABILITY: Supplementary tables and figures are available at URL http://discover.nci.nih.gov/OG/supplementaryMaterials.html. 
The Safari browser appears to perform better than Firefox for these particular items.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20616384&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20616383</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20616383&lt;br/&gt;Authors: Macas, J. - Neumann, P. - Novak, P. - Jiang, J.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Satellite DNA makes up a significant portion of many eukaryotic genomes, yet it is relatively poorly characterized even in extensively sequenced species. This is, in part, due to methodological limitations of traditional methods of satellite repeat analysis, which are based on multiple alignments of monomer sequences. Therefore, we employed an alternative, alignment-free approach utilizing k-mer frequency statistics, which is in principle more suitable for analyzing large sets of satellite repeat data, including sequence reads from next generation sequencing technologies. RESULTS: k-mer frequency spectra were determined for two sets of rice centromeric satellite CentO sequences, including 454 reads from ChIP-sequencing of CENH3-bound DNA (7.6 Mb) and the whole genome Sanger sequencing reads (5.8 Mb). k-mer frequencies were used to identify the most conserved sequence regions and to reconstruct consensus sequences of complete monomers. Reconstructed consensus sequences as well as the assessment of overall divergence of k-mer spectra revealed high similarity of the two datasets, suggesting that CentO sequences associated with functional centromeres (CENH3-bound) do not significantly differ from the total population of CentO, which includes both centromeric and pericentromeric repeat arrays. On the other hand, considerable differences were revealed when these methods were used for comparison of CentO populations between individual chromosomes of the rice genome assembly, demonstrating preferential sequence homogenization of the clusters within the same chromosome. 
k-mer frequencies were also successfully used to identify and characterize smRNAs derived from CentO repeats.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20616383&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>METAL: fast and efficient meta-analysis of genomewide association scans.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20616382</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20616382&lt;br/&gt;Authors: Willer, C. J. - Li, Y. - Abecasis, G. R.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: METAL provides a computationally efficient tool for meta-analysis of genome-wide association scans, which is a commonly used approach for improving power in complex trait gene mapping studies. METAL provides a rich scripting interface and implements efficient memory management to allow analyses of very large data sets and to support a variety of input file formats. AVAILABILITY AND IMPLEMENTATION: METAL, including source code, documentation, examples, and executables, is available at http://www.sph.umich.edu/csg/abecasis/metal/.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20616382&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A system-level investigation into the cellular toxic response mechanism mediated by AhR signal transduction pathway.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20610613</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20610613&lt;br/&gt;Authors: Gim, J. - Kim, H. S. - Kim, J. - Choi, M. - Kim, J. R. - Chung, Y. J. - Cho, K. H.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Viewing a cellular system as a collection of interacting parts can lead to new insights into the complex cellular behavior. In this study, we have investigated aryl hydrocarbon receptor (AhR) signal transduction pathway from such a system-level perspective. AhR detects various xenobiotics, such as drugs or endocrine disruptors (e.g. dioxin), and mediates transcriptional regulation of target genes such as those in the cytochrome P450 (CYP450) family. On binding with 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), however, AhR becomes abnormally activated and conveys toxic effects on cells. Despite many related studies on the TCDD-mediated toxicity, quantitative system-level understanding of how TCDD-mediated toxicity generates various toxic responses is still lacking. RESULTS: Here, we present a manually curated TCDD-mediated AhR signaling pathway including crosstalks with the hypoxia pathway that copes with oxygen deficiency and the p53 pathway that induces a DNA damage response. Based on the integrated pathway, we have constructed a mathematical model and validated it through quantitative experiments. Using the mathematical model, we have investigated: (i) TCDD dose-dependent effects on AhR target genes; (ii) the crosstalk effect between AhR and hypoxia signals; and (iii) p53 inhibition effect of TCDD-liganded AhR. Our results show that cellular intake of TCDD induces AhR signaling pathway to be abnormally up-regulated and thereby interrupts other signaling pathways. Interruption of hypoxia and p53 pathways, in turn, can incur various hazardous effects on cells. 
Taken together, our study provides a system-level understanding of how AhR signal mediates various TCDD-induced toxicities under the presence of hypoxia and/or DNA damage in cells.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20610613&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>ROAST: rotation gene set tests for complex microarray experiments.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20610611</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20610611&lt;br/&gt;Authors: Wu, D. - Lim, E. - Vaillant, F. - Asselin-Labat, M. L. - Visvader, J. E. - Smyth, G. K.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: A gene set test is a differential expression analysis in which a P-value is assigned to a set of genes as a unit. Gene set tests are valuable for increasing statistical power, organizing and interpreting results and for relating expression patterns across different experiments. Existing methods are based on permutation. Methods that rely on permutation of probes unrealistically assume independence of genes, while those that rely on permutation of sample are suitable only for two-group comparisons with a good number of replicates in each group. RESULTS: We present ROAST, a statistically rigorous gene set test that allows for gene-wise correlation while being applicable to almost any experimental design. Instead of permutation, ROAST uses rotation, a Monte Carlo technology for multivariate regression. Since the number of rotations does not depend on sample size, ROAST gives useful results even for experiments with minimal replication. ROAST allows for any experimental design that can be expressed as a linear model, and can also incorporate array weights and correlated samples. ROAST can be tuned for situations in which only a subset of the genes in the set are actively involved in the molecular pathway. ROAST can test for uni- or bi-direction regulation. Probes can also be weighted to allow for prior importance. The power and size of the ROAST procedure is demonstrated in a simulation study, and compared to that of a representative permutation method. Finally, ROAST is used to test the degree of transcriptional conservation between human and mouse mammary stems. 
AVAILABILITY: ROAST is implemented as a function in the Bioconductor package limma available from www.bioconductor.org.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20610611&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Genome-wide functional element detection using pairwise statistical alignment outperforms multiple genome footprinting techniques.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20610610</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20610610&lt;br/&gt;Authors: Satija, R. - Hein, J. - Lunter, G. A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Comparative genomic sequence analysis is a powerful approach for identifying putative functional elements in silico. The availability of full-genome sequences from many vertebrate species has resulted in the development of popular tools, for example, the phastCons software package that search large numbers of genomes to identify conserved elements. While phastCons can analyze many genomes simultaneously, it ignores potentially informative insertion and deletion events and relies on a fixed, precomputed multiple sequence alignment. RESULTS: We have developed a new method, GRAPeFoot, which simultaneously aligns two full genomes and annotates a set of conserved regions exhibiting reduced rates of insertion, deletion and substitution mutations. We tested GRAPeFoot using the human and mouse genomes and compared its performance to a set of phastCons predictions hosted on the UCSC genome browser. Our results demonstrate that despite the use of only two genomes, GRAPeFoot identified constrained elements at rates comparable with phastCons, which analyzed data from 28 vertebrate genomes. This study demonstrates how integrated modelling of substitutions, indels and purifying selection allows a pairwise analysis to exhibit a sensitivity similar to a heuristic analysis of many genomes. AVAILABILITY: The GRAPeFoot software and set of genome-wide functional element predictions are freely available to download online at http://www.stats.ox.ac.uk/~satija/GRAPeFoot/.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20610610&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>CNVineta: a data mining tool for large case-control copy number variation datasets.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20605930</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20605930&lt;br/&gt;Authors: Wittig, M. - Helbig, I. - Schreiber, S. - Franke, A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Copy number variation (CNV), a major contributor to human genetic variation, comprises ≥1 kb genomic deletions and insertions. Yet, the identification of CNVs from microarray data is still hampered by high false negative and positive prediction rates due to the noisy nature of the raw data. Here, we present CNVineta, an R package for rapid data mining and visualization of CNVs in large case-control datasets genotyped with single nucleotide polymorphism oligonucleotide arrays. CNVineta is compatible with various established CNV prediction algorithms, can be used for genome-wide association analysis of rare and common CNVs and enables rapid and serial display of log(2) of raw data ratios as well as B-allele frequencies for visual quality inspection. In summary, CNVineta aids in the interpretation of large-scale CNV datasets and prioritization of target regions for follow-up experiments. Availability and Implementation: CNVineta is available as an R package and can be downloaded from http://www.ikmb.uni-kiel.de/CNVineta/; the package contains a tutorial outlining a typical workflow. The CNVineta compatible HapMap dataset can also be downloaded from the link above.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20605930&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>MamPhEA: a web tool for mammalian phenotype enrichment analysis.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20605928</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20605928&lt;br/&gt;Authors: Weng, M. P. - Liao, B. Y.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: MamPhEA is a web application dedicated to understanding functional properties of mammalian gene sets based on mouse-mutant phenotypes. It allows users to conduct enrichment analysis on predefined or user-defined phenotypes, gives users the option to specify phenotypes derived from null mutations, produces easily comprehensible results and supports analyses on genes of all mammalian species with a fully sequenced genome. AVAILABILITY: http://evol.nhri.org.tw/MamPhEA/.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20605928&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>TRANSWESD: inferring cellular networks with transitive reduction.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20605927</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20605927&lt;br/&gt;Authors: Klamt, S. - Flassig, R. J. - Sundmacher, K.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Distinguishing direct from indirect influences is a central issue in reverse engineering of biological networks because it facilitates detection and removal of false positive edges. Transitive reduction is one approach for eliminating edges reflecting indirect effects but its use in reconstructing cyclic interaction graphs with true redundant structures is problematic. RESULTS: We present TRANSWESD, an elaborated variant of TRANSitive reduction for WEighted Signed Digraphs that overcomes conceptual problems of existing versions. Major changes and improvements concern: (i) new statistical approaches for generating high-quality perturbation graphs from systematic perturbation experiments; (ii) the use of edge weights (association strengths) for recognizing true redundant structures; (iii) causal interpretation of cycles; (iv) relaxed definition of transitive reduction; and (v) approximation algorithms for large networks. Using standardized benchmark tests, we demonstrate that our method outperforms existing variants of transitive reduction and is, despite its conceptual simplicity, highly competitive with other reverse engineering methods.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20605927&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>PERMORY: an LD-exploiting permutation test algorithm for powerful genome-wide association testing.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20605926</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20605926&lt;br/&gt;Authors: Pahl, R. - Schafer, H.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: In genome-wide association studies (GWAS) examining hundreds of thousands of genetic markers, the potentially high number of false positive findings requires statistical correction for multiple testing. Permutation tests are considered the gold standard for multiple testing correction in GWAS, because they simultaneously provide unbiased type I error control and high power. At the same time, they demand heavy computational effort, especially with large-scale datasets of modern GWAS. In recent years, the computational problem has been circumvented by using approximations to permutation tests, which, however, may be biased. RESULTS: We have tackled the original computational problem of permutation testing in GWAS and herein present a permutation test algorithm one or more orders of magnitude faster than existing implementations, which enables efficient permutation testing on a genome-wide scale. Our algorithm does not rely on any kind of approximation and hence produces unbiased results identical to a standard permutation test. A noteworthy feature of our algorithm is a particularly effective performance when analyzing high-density marker sets. AVAILABILITY: Freely available on the web at http://www.permory.org.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20605926&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>G-SQZ: compact encoding of genomic sequence and quality data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20605925</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20605925&lt;br/&gt;Authors: Tembe, W. - Lowey, J. - Suh, E.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Large volumes of data generated by high-throughput sequencing instruments present non-trivial challenges in data storage, content access and transfer. We present G-SQZ, a Huffman coding-based sequencing-reads-specific representation scheme that compresses data without altering the relative order. G-SQZ has achieved from 65% to 81% compression on benchmark datasets, and it allows selective access without scanning and decoding from start. This article focuses on describing the underlying encoding scheme and its software implementation, and a more theoretical problem of optimal compression is out of scope. The immediate practical benefits include reduced infrastructure and informatics costs in managing and analyzing large sequencing data. AVAILABILITY: http://public.tgen.org/sqz. Academic/non-profit: Source: available at no cost under a non-open-source license by requesting from the web-site; Binary: available for direct download at no cost. For-Profit: Submit request for for-profit license from the web-site.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20605925&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Modeling and analyzing complex biological networks incooperating experimental information on both network topology and stable states.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20601441</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20601441&lt;br/&gt;Authors: Zou, Y. M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Linking the topology of a complex network to its long-term behavior is a basic problem in network theory, which has been the focus of many recent research publications. To obtain a suitable Boolean model for a biological system, one must analyze the initial model and compare it with other experimental evidence, and if necessary, make adjustments by changing the topology of the wiring diagram. However, our knowledge on how to link the topology of a network to its long-term behavior is very limited due to the complexity of the problem. Since the need to consider complex biological networks has become ever greater, developing both theoretical foundations and algorithms for model selection and analysis has been brought to the forefront of biological network study. RESULTS: This article proposes a novel method to study intrinsically the relationship between experimental data and the possible Boolean networks, which can be used to model the underlying system. Simple and easy to use criteria for a Boolean network to have both a given network topology and a given set of stable states are derived. These criteria can be used to guide the selection of a Boolean network model for the system, as well as to gain information on the intrinsic properties, such as the robustness and the evolvability, of the system. A Boolean model for the fruit fly Drosophila melanogaster is used to explain the method.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20601441&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Rapid match-searching for gene silencing assessment.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20601440</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20601440&lt;br/&gt;Authors: Horn, M. E. - Waterhouse, P. M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Gene silencing, also called RNA interference, requires reliable assessment of silencer impacts. A critical task is to find matches between silencer oligomers and sites in the genome, in accordance with one-to-many matching rules (G-U matching, with provision for mismatches). Fast search algorithms are required to support silencer impact assessments in procedures for designing effective silencer sequences. RESULTS: The article presents a matching algorithm and data structures specialized for matching searches, including a kernel procedure that addresses a Boolean version of the database task called the skyline search. Besides exact matches, the algorithm is extended to allow for the location-specific mismatches applicable in plants. Computational tests show that the algorithm is significantly faster than suffix-tree alternatives. AVAILABILITY: Source code, executable, data and test results are freely available at ftp://ftp.csiro.au/Horn/RapidMatch.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20601440&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>mbmdr: an R package for exploring gene-gene interactions associated with binary or quantitative traits.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20595460</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20595460&lt;br/&gt;Authors: Calle, M. L. - Urrea, V. - Malats, N. - Van Steen, K.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: We describe mbmdr, an R package for implementing the model-based multifactor dimensionality reduction (MB-MDR) method. MB-MDR has been proposed by Calle et al. as a dimension reduction method for exploring gene-gene interactions in case-control association studies. It is an extension of the popular multifactor dimensionality reduction (MDR) method of Ritchie et al. allowing a more flexible definition of risk cells. In MB-MDR, risk categories are defined using a regression model which allows adjustment for covariates and main effects and, in addition to the classical low risk and high risk categories, MB-MDR considers a third category of indeterminate or not informative cells. An important improvement added to the current mbmdr algorithm with respect to the original MB-MDR formulation in Calle et al. and also to the classical MDR approach, is the extension of the methodology to different outcome types. While MB-MDR was initially proposed for binary traits in the context of case-control studies, the mbmdr package provides options to analyze both binary or quantitative traits for unrelated individuals. AVAILABILITY: http://cran.r-project.org/.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20595460&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Biological knowledge bases using Wikis: combining the flexibility of Wikis with the structure of databases.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20591906</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20591906&lt;br/&gt;Authors: Brohee, S. - Barriot, R. - Moreau, Y.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: In recent years, the number of knowledge bases developed using Wiki technology has exploded. Unfortunately, next to their numerous advantages, classical Wikis present a critical limitation: the invaluable knowledge they gather is represented as free text, which hinders their computational exploitation. This is in sharp contrast with the current practice for biological databases where the data is made available in a structured way. Here, we present WikiOpener an extension for the classical MediaWiki engine that augments Wiki pages by allowing on-the-fly querying and formatting resources external to the Wiki. Those resources may provide data extracted from databases or DAS tracks, or even results returned by local or remote bioinformatics analysis tools. This also implies that structured data can be edited via dedicated forms. Hence, this generic resource combines the structure of biological databases with the flexibility of collaborative Wikis. AVAILABILITY: The source code and its documentation are freely available on the MediaWiki website: http://www.mediawiki.org/wiki/Extension:WikiOpener.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20591906&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Integration of pathway knowledge into a reweighted recursive feature elimination approach for risk stratification of cancer patients.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20591905</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20591905&lt;br/&gt;Authors: Johannes, M. - Brase, J. C. - Frohlich, H. - Gade, S. - Gehrmann, M. - Falth, M. - Sultmann, H. - Beissbarth, T.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: One of the main goals of high-throughput gene-expression studies in cancer research is to identify prognostic gene signatures, which have the potential to predict the clinical outcome. It is common practice to investigate these questions using classification methods. However, standard methods merely rely on gene-expression data and assume the genes to be independent. Including pathway knowledge a priori into the classification process has recently been indicated as a promising way to increase classification accuracy as well as the interpretability and reproducibility of prognostic gene signatures. RESULTS: We propose a new method called Reweighted Recursive Feature Elimination. It is based on the hypothesis that a gene with a low fold-change should have an increased influence on the classifier if it is connected to differentially expressed genes. We used a modified version of Google's PageRank algorithm to alter the ranking criterion of the SVM-RFE algorithm. Evaluations of our method on an integrated breast cancer dataset comprising 788 samples showed an improvement of the area under the receiver operator characteristic curve as well as in the reproducibility and interpretability of selected genes. AVAILABILITY: The R code of the proposed algorithm is given in Supplementary Material.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20591905&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20591904</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20591904&lt;br/&gt;Authors: Ewing, G. - Hermisson, J.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: We have implemented a coalescent simulation program for a structured population with selection at a single diploid locus. The program includes the functionality of the simulator ms to model population structure and demography, but adds a model for deme- and time-dependent selection using forward simulations. The program can be used, e.g. to study hard and soft selective sweeps in structured populations or the genetic footprint of local adaptation. The implementation is designed to be easily extendable and widely deployable. The interface and output format are compatible with ms. Performance is comparable even with selection included. AVAILABILITY: The program is freely available from http://www.mabs.at/ewing/msms/ along with manuals and examples. The source is freely available under a GPL type license.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20591904&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Mining metabolic pathways through gene expression.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20587705</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20587705&lt;br/&gt;Authors: Hancock, T. - Takigawa, I. - Mamitsuka, H.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: An observed metabolic response is the result of the coordinated activation and interaction between multiple genetic pathways. However, the complex structure of metabolism has meant that a complete understanding of which pathways are required to produce an observed metabolic response is not fully understood. In this article, we propose an approach that can identify the genetic pathways which dictate the response of metabolic network to specific experimental conditions. RESULTS: Our approach is a combination of probabilistic models for pathway ranking, clustering and classification. First, we use a non-parametric pathway extraction method to identify the most highly correlated paths through the metabolic network. We then extract the defining structure within these top-ranked pathways using both Markov clustering and classification algorithms. Furthermore, we define detailed node and edge annotations, which enable us to track each pathway, not only with respect to its genetic dependencies, but also allow for an analysis of the interacting reactions, compounds and KEGG sub-networks. We show that our approach identifies biologically meaningful pathways within two microarray expression datasets using entire KEGG metabolic networks. Availability and implementation: An R package containing a full implementation of our proposed method is currently available from http://www.bic.kyoto-u.ac.jp/pathway/timhancock.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20587705&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Over-optimism in bioinformatics: an illustration.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20581402</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20581402&lt;br/&gt;Authors: Jelizarow, M. - Guillemot, V. - Tenenhaus, A. - Strimmer, K. - Boulesteix, A. L.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: In statistical bioinformatics research, different optimization mechanisms potentially lead to 'over-optimism' in published papers. So far, however, a systematic critical study concerning the various sources underlying this over-optimism is lacking. RESULTS: We present an empirical study on over-optimism using high-dimensional classification as example. Specifically, we consider a 'promising' new classification algorithm, namely linear discriminant analysis incorporating prior knowledge on gene functional groups through an appropriate shrinkage of the within-group covariance matrix. While this approach yields poor results in terms of error rate, we quantitatively demonstrate that it can artificially seem superior to existing approaches if we 'fish for significance'. The investigated sources of over-optimism include the optimization of datasets, of settings, of competing methods and, most importantly, of the method's characteristics. We conclude that, if the improvement of a quantitative criterion such as the error rate is the main contribution of a paper, the superiority of new algorithms should always be demonstrated on independent validation data. AVAILABILITY: The R codes and relevant data can be downloaded from http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/overoptimism/, such that the study is completely reproducible.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20581402&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Deciphering subcellular processes in live imaging datasets via dynamic probabilistic networks.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20581401</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20581401&lt;br/&gt;Authors: Letinic, K. - Sebastian, R. - Barthel, A. - Toomre, D.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Designing mathematical tools that can formally describe the dynamics of complex intracellular processes remains a challenge. Live cell imaging reveals changes in the cellular states, but current simple approaches extract only minimal information of a static snapshot. RESULTS: We implemented a novel approach for analyzing organelle behavior in live cell imaging data based on hidden Markov models (HMMs) and showed that it can determine the number and evolution of distinct cellular states involved in a biological process. We analyzed insulin-mediated exocytosis of single Glut4-vesicles, a process critical for blood glucose homeostasis and impaired in type II diabetes, by using total internal reflection fluorescence microscopy (TIRFM). HMM analyses of movie sequences of living cells reveal that insulin controls spatial and temporal dynamics of exocytosis via the exocyst, a putative tethering protein complex. Our studies have validated the proof-of-principle of HMM for cellular imaging and provided direct evidence for the existence of complex spatial-temporal regulation of exocytosis in non-polarized cells. We independently confirmed insulin-dependent spatial regulation by using static spatial statistics methods. Conclusion: We propose that HMM-based approach can be exploited in a wide avenue of cellular processes, especially those where the changes of cellular states in space and time may be highly complex and non-obvious, such as in cell polarization, signaling and developmental processes.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20581401&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>EGM: encapsulated gene-by-gene matching to identify gene orthologs and homologous segments in genomes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20581400</link>
      <description>Publication Date: 2010 Sep 1 PMID: 20581400&lt;br/&gt;Authors: Mahmood, K. - Konagurthu, A. S. - Song, J. - Buckle, A. M. - Webb, G. I. - Whisstock, J. C.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Identification of functionally equivalent genes in different species is essential to understand the evolution of biological pathways and processes. At the same time, identification of strings of conserved orthologous genes helps identify complex genomic rearrangements across different organisms. Such an insight is particularly useful, for example, in the transfer of experimental results between different experimental systems such as Drosophila and mammals. RESULTS: Here, we describe the Encapsulated Gene-by-gene Matching (EGM) approach, a method that employs a graph matching strategy to identify gene orthologs and conserved gene segments. Given a pair of genomes, EGM constructs a global gene match for all genes taking into account gene context and family information. The Hungarian method for identifying the maximum weight matching in bipartite graphs is employed, where the resulting matching reveals one-to-one correspondences between nodes (genes) in a manner that maximizes the gene similarity and context. Conclusion: We tested our approach by performing several comparisons including a detailed Human versus Mouse genome mapping. We find that the algorithm is robust and sensitive in detecting orthologs and conserved gene segments. EGM can sensitively detect rearrangements within large and small chromosomal segments. The EGM tool is fully automated and easy to use compared to other more complex methods that also require extensive manual intervention and input. 
AVAILABILITY: The EGM software, Supplementary information and other tools are available online from http://vbc.med.monash.edu.au/~kmahmood/EGM.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20581400&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20576627</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20576627&lt;br/&gt;Authors: Liu, Y. - Schmidt, B. - Maskell, D. L.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Multiple sequence alignment is of central importance to bioinformatics and computational biology. Although a large number of algorithms for computing a multiple sequence alignment have been designed, the efficient computation of highly accurate multiple alignments is still a challenge. RESULTS: We present MSAProbs, a new and practical multiple alignment algorithm for protein sequences. The design of MSAProbs is based on a combination of pair hidden Markov models and partition functions to calculate posterior probabilities. Furthermore, two critical bioinformatics techniques, namely weighted probabilistic consistency transformation and weighted profile-profile alignment, are incorporated to improve alignment accuracy. Assessed using the popular benchmarks: BAliBASE, PREFAB, SABmark and OXBENCH, MSAProbs achieves statistically significant accuracy improvements over the existing top performing aligners, including ClustalW, MAFFT, MUSCLE, ProbCons and Probalign. Furthermore, MSAProbs is optimized for multi-core CPUs by employing a multi-threaded design, leading to a competitive execution time compared to other aligners. AVAILABILITY: The source code of MSAProbs, written in C++, is freely and publicly available from http://msaprobs.sourceforge.net.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20576627&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Single feature polymorphism detection using recombinant inbred line microarray expression data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20576626</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20576626&lt;br/&gt;Authors: Cui, X. - You, N. - Girke, T. - Michelmore, R. - Van Deynze, A.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The Affymetrix GeneChip microarray is currently providing a high-density and economical platform for discovery of genetic polymorphisms. Microarray data for single feature polymorphism (SFP) detection in recombinant inbred lines (RILs) can capitalize on the high level of replication available for each locus in the RIL population. It was suggested that the binding affinities from all of the RILs would form a multimodal distribution for a SFP. This motivated us to estimate the binding affinities from the robust multi-array analysis (RMA) method and formulate the SFP detection problem as a hypothesis testing problem, i.e. testing whether the underlying distribution of the estimated binding affinity (EBA) values of a probe is unimodal or multimodal. RESULTS: We developed a bootstrap-based hypothesis testing procedure using the 'dip' statistic. Our simulation studies show that the proposed procedure can reach satisfactory detection power with false discovery rate controlled at a desired level and is robust to the unimodal distribution assumption, which facilitates wide application of the proposed procedure. Our analysis of the real data identified more than four times the SFPs compared to the previous studies, covering 96% of their findings. The constructed genetic map using the SFP markers predicted from our procedure shows over 99% concordance of the genetic orders of these markers with their known physical locations on the genome sequence. 
AVAILABILITY: The R package 'dipSFP' can be downloaded from http://sites.google.com/a/bioinformatics.ucr.edu/xinping-cui/home/software.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20576626&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A probabilistic framework for aligning paired-end RNA-seq data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20576625</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20576625&lt;br/&gt;Authors: Hu, Y. - Wang, K. - He, X. - Chiang, D. Y. - Prins, J. F. - Liu, J.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The RNA-seq paired-end read (PER) protocol samples transcript fragments longer than the sequencing capability of today's technology by sequencing just the two ends of each fragment. Deep sampling of the transcriptome using the PER protocol presents the opportunity to reconstruct the unsequenced portion of each transcript fragment using end reads from overlapping PERs, guided by the expected length of the fragment. METHODS: A probabilistic framework is described to predict the alignment to the genome of all PER transcript fragments in a PER dataset. Starting from possible exonic and spliced alignments of all end reads, our method constructs potential splicing paths connecting paired ends. An expectation maximization method assigns likelihood values to all splice junctions and assigns the most probable alignment for each transcript fragment. RESULTS: The method was applied to 2 x 35 bp PER datasets from cancer cell lines MCF-7 and SUM-102. PER fragment alignment increased the coverage 3-fold compared to the alignment of the end reads alone, and increased the accuracy of splice detection. The accuracy of the expectation maximization (EM) algorithm in the presence of alternative paths in the splice graph was validated by qRT-PCR experiments on eight exon skipping alternative splicing events. PER fragment alignment with long-range splicing confirmed 8 out of 10 fusion events identified in the MCF-7 cell line in an earlier study (Maher et al., 2009). 
AVAILABILITY: Software available at http://www.netlab.uky.edu/p/bioinfo/MapSplice/PER.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20576625&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>EpiTOP--a proteochemometric tool for MHC class II binding prediction.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20576624</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20576624&lt;br/&gt;Authors: Dimitrov, I. - Garnev, P. - Flower, D. R. - Doytchinova, I.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: T-cell epitope identification is a critical immunoinformatic problem within vaccine design. To be an epitope, a peptide must bind an MHC protein. RESULTS: Here, we present EpiTOP, the first server predicting MHC class II binding based on proteochemometrics, a QSAR approach for ligands binding to several related proteins. EpiTOP uses a quantitative matrix to predict binding to 12 HLA-DRB1 alleles. It identifies 89% of known epitopes within the top 20% of predicted binders, reducing laboratory labour, materials and time by 80%. EpiTOP is easy to use, gives comprehensive quantitative predictions and will be expanded and updated with new quantitative matrices over time. AVAILABILITY: EpiTOP is freely accessible at http://www.pharmfac.net/EpiTOP.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20576624&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Bridges: a tool for identifying local similarities in long sequences.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20562450</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20562450&lt;br/&gt;Authors: Kondrashov, A. S. - Assis, R.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Bridges is a heuristic search tool that uses short word matches to rapidly identify local similarities between sequences. It consists of three stages: filtering input sequences, identifying local similarities and post-processing local similarities. As input sequence data are released from memory after the filtering stage, genome-scale datasets can be efficiently compared in a single run. Bridges also includes 20 parameters, which enable the user to dictate the sensitivity and specificity of a search. AVAILABILITY: Bridges is implemented in the C programming language and can be run on all platforms. Source code and documentation are available at http://github.com/rassis/bridges.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20562450&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Savant: genome browser for high-throughput sequencing data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20562449</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20562449&lt;br/&gt;Authors: Fiume, M. - Williams, V. - Brook, A. - Brudno, M.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals' genomes. Simultaneously the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets. RESULTS: We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations. AVAILABILITY: Savant is freely available at http://compbio.cs.toronto.edu/savant.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20562449&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>CplexA: a Mathematica package to study macromolecular-assembly control of gene expression.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20562419</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20562419&lt;br/&gt;Authors: Vilar, J. M. - Saiz, L.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Macromolecular assembly coordinates essential cellular processes, such as gene regulation and signal transduction. A major challenge for conventional computational methods to study these processes is tackling the exponential increase of the number of configurational states with the number of components. CplexA is a Mathematica package that uses functional programming to efficiently compute probabilities and average properties over such exponentially large number of states from the energetics of the interactions. The package is particularly suited to study gene expression at complex promoters controlled by multiple, local and distal, DNA binding sites for transcription factors. AVAILABILITY: CplexA is freely available together with documentation at http://sourceforge.net/projects/cplexa/.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20562419&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Assemble: an interactive graphical tool to analyze and build RNA architectures at the 2D and 3D levels.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20562414</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20562414&lt;br/&gt;Authors: Jossinet, F. - Ludwig, T. E. - Westhof, E.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Assemble is an intuitive graphical interface to analyze, manipulate and build complex 3D RNA architectures. It provides several advanced and unique features within the framework of a semi-automated modeling process that can be performed by homology and ab initio with or without electron density maps. Those include the interactive editing of a secondary structure and a searchable, embedded library of annotated tertiary structures. Assemble helps users with performing recurrent and otherwise tedious tasks in structural RNA research. AVAILABILITY AND IMPLEMENTATION: Assemble is released under an open-source license (MIT license) and is freely available at http://bioinformatics.org/assemble. It is implemented in the Java language and runs on MacOSX, Linux and Windows operating systems.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20562414&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20562413</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20562413&lt;br/&gt;Authors: McLaren, W. - Pritchard, B. - Rios, D. - Chen, Y. - Flicek, P. - Cunningham, F.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: A tool to predict the effect that newly discovered genomic variants have on known transcripts is indispensable in prioritizing and categorizing such variants. In Ensembl, a web-based tool (the SNP Effect Predictor) and API interface can now functionally annotate variants in all Ensembl and Ensembl Genomes supported species. AVAILABILITY: The Ensembl SNP Effect Predictor can be accessed via the Ensembl website at http://www.ensembl.org/. The Ensembl API (http://www.ensembl.org/info/docs/api/api_installation.html for installation instructions) is open source software.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20562413&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Causal relationship inference for a large-scale cellular network.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20554691</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20554691&lt;br/&gt;Authors: Zhou, T. - Wang, Y. L.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Cellular networks usually consist of numerous chemical species, such as DNA, RNA, proteins and small molecules. Different biological tasks are generally performed by complex interactions of these species. As these interactions can rarely be directly measured, it is widely recognized that causal relationship identification is essential in understanding biological behaviors of a cellular network. Challenging issues here include not only the large number of interactions to be estimated, but also many restrictions on probing signals. The purpose of this study is to incorporate the power law in cellular network identification, in order to increase the accuracy of causal regulation estimations and, in particular, to reduce false positive errors. RESULTS: Two identification algorithms are developed that can be efficiently applied to causal regulation identification of a large-scale network from noisy steady-state experiment data. A distinguishing feature of these algorithms is that the power law, an important structural property that most large-scale cellular networks approximately have, is explicitly incorporated into the estimations. Under the condition that parameters of the power law are known and measurement errors are Gaussian, a likelihood maximization approach is adopted. The developed estimation algorithms consist of three major steps. First, angle minimization between subspaces is utilized to identify chemical elements that have direct influences on a prescribed chemical element, under the condition that the number of direct regulations is known. Second, interference coefficients from prescribed chemical elements are estimated through likelihood maximization with respect to measurement errors. 
Finally, direct regulation numbers are identified through maximizing a lower bound of an overall likelihood function. These methods have been applied to an artificially constructed linear system with 100 elements, a mitogen-activated protein kinase pathway model with 103 chemical elements, some DREAM initiative in silico data and some in vivo data. Compared with the widely adopted total least squares (TLS) method, computation results show that parametric estimation accuracy can be significantly increased and false positive errors can be greatly reduced. AVAILABILITY: The Matlab files for the methods are available at http://bioinfo.au.tsinghua.edu.cn/member/ylwang/Matlabfiles_CNI.zip.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20554691&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A Poisson model for random multigraphs.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20554690</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20554690&lt;br/&gt;Authors: Ranola, J. M. - Ahn, S. - Sehl, M. - Smith, D. J. - Lange, K.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Biological networks are often modeled by random graphs. A better modeling vehicle is a multigraph where each pair of nodes is connected by a Poisson number of edges. In the current model, the mean number of edges equals the product of two propensities, one for each node. In this context it is possible to construct a simple and effective algorithm for rapid maximum likelihood estimation of all propensities. Given estimated propensities, it is then possible to test statistically for functionally connected nodes that show an excess of observed edges over expected edges. The model extends readily to directed multigraphs. Here, propensities are replaced by outgoing and incoming propensities. RESULTS: The theory is applied to real data on neuronal connections, interacting genes in radiation hybrids, interacting proteins in a literature curated database, and letter and word pairs in seven Shakespearean plays. AVAILABILITY: All data used are fully available online from their respective sites. Source code and software are available from http://code.google.com/p/poisson-multigraph/.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20554690&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Homology of SMP domains to the TULIP superfamily of lipid-binding proteins provides a structural basis for lipid exchange between ER and mitochondria.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20554689</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20554689&lt;br/&gt;Authors: Kopec, K. O. - Alva, V. - Lupas, A. N.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;Mitochondria must take up some phospholipids from the endoplasmic reticulum (ER) for the biogenesis of their membranes. They convert one of these lipids, phosphatidylserine, to phosphatidylethanolamine, which can be re-exported via the ER to all other cellular membranes. The mechanisms underlying these exchanges between ER and mitochondria are poorly understood. Recently, a complex termed ER-mitochondria encounter structure (ERMES) was shown to be necessary for phospholipid exchange in budding yeast. However, it is unclear whether this complex is merely an inter-organelle tether or also the transporter. ERMES consists of four proteins: Mdm10, Mdm34 (Mmm2), Mdm12 and Mmm1, three of which contain the uncharacterized SMP domain common to a number of eukaryotic membrane-associated proteins. Here, we show that the SMP domain belongs to the TULIP superfamily of lipid/hydrophobic ligand-binding domains comprising members of known structure. This relationship suggests that the SMP domains of the ERMES complex mediate lipid exchange between ER and mitochondria.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20554689&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A Bayesian approach using covariance of single nucleotide polymorphism data to detect differences in linkage disequilibrium patterns between groups of individuals.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20554688</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20554688&lt;br/&gt;Authors: Clark, T. G. - Campino, S. G. - Anastasi, E. - Auburn, S. - Teo, Y. Y. - Small, K. - Rockett, K. A. - Kwiatkowski, D. P. - Holmes, C. C.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Quantifying differences in linkage disequilibrium (LD) between sub-groups can highlight genetic regions or sites under selection and/or associated with disease, and may have utility in trans-ethnic mapping studies. RESULTS: We present a novel pseudo Bayes factor (PBF) approach that assesses differences in covariance of genotype frequencies from single nucleotide polymorphism (SNP) data from a genome-wide study. The magnitude of the PBF reflects the strength of evidence for a difference, while accounting for the sample size and number of SNPs, without the requirement for permutation testing to establish statistical significance. Application of the PBF to HapMap and Gambian malaria SNP data reveals regional LD differences, some known to be under selection. AVAILABILITY AND IMPLEMENTATION: The PBF approach has been implemented in the BALD (Bayesian analysis of LD differences) C++ software, and is available from http://homepages.lshtm.ac.uk/tgclark/downloads.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20554688&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Molecular signatures-based prediction of enzyme promiscuity.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20551137</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20551137&lt;br/&gt;Authors: Carbonell, P. - Faulon, J. L.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Enzyme promiscuity, a property with practical applications in biotechnology and synthetic biology, has been related to the evolvability of enzymes. At the molecular level, several structural mechanisms have been linked to enzyme promiscuity in enzyme families. However, it is at present unclear to what extent these observations can be generalized. Here, we introduce for the first time a method for predicting catalytic and substrate promiscuity using a graph-based representation known as molecular signature. RESULTS: Our method, which has an accuracy of 85% for the non-redundant KEGG database, is also a powerful analytical tool for characterizing structural determinants of protein promiscuity. Namely, we found that signatures with higher contribution to the prediction of promiscuity are uniformly distributed in the protein structure of promiscuous enzymes. In contrast, those signatures that act as promiscuity determinants are significantly depleted around non-promiscuous catalytic sites. In addition, we present the study of the enolase and aminotransferase superfamilies as illustrative examples of characterization of promiscuous enzymes within a superfamily and achievement of enzyme promiscuity by protein reverse engineering. Recognizing the role of enzyme promiscuity in the process of natural evolution of enzymatic function can provide useful hints in the design of directed evolution experiments. We have developed a method with potential applications in the guided discovery and enhancement of latent catalytic capabilities surviving in modern enzymes. 
AVAILABILITY: http://www.issb.genopole.fr/~faulon.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20551137&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Structure-based kernels for the prediction of catalytic residues and their involvement in human inherited disease.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20551136</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20551136&lt;br/&gt;Authors: Xin, F. - Myers, S. - Li, Y. F. - Cooper, D. N. - Mooney, S. D. - Radivojac, P.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Enzyme catalysis is involved in numerous biological processes and the disruption of enzymatic activity has been implicated in human disease. Despite this, various aspects of catalytic reactions are not completely understood, such as the mechanics of reaction chemistry and the geometry of catalytic residues within active sites. As a result, the computational prediction of catalytic residues has the potential to identify novel catalytic pockets, aid in the design of more efficient enzymes and also predict the molecular basis of disease. RESULTS: We propose a new kernel-based algorithm for the prediction of catalytic residues based on protein sequence, structure and evolutionary information. The method relies upon explicit modeling of similarity between residue-centered neighborhoods in protein structures. We present evidence that this algorithm evaluates favorably against established approaches, and also provides insights into the relative importance of the geometry, physicochemical properties and evolutionary conservation of catalytic residue activity. The new algorithm was used to identify known mutations associated with inherited disease whose molecular mechanism might be predicted to operate specifically through the loss or gain of catalytic residues. It should, therefore, provide a viable approach to identifying the molecular basis of disease in which the loss or gain of function is not caused solely by the disruption of protein stability. Our analysis suggests that both mechanisms are actively involved in human inherited disease. 
AVAILABILITY AND IMPLEMENTATION: Source code for the structural kernel is available at www.informatics.indiana.edu/predrag/.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20551136&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A new method for designing degenerate primers and its use in the identification of sequences in Brachiaria showing similarity to apomixis-associated genes.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20547638</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20547638&lt;br/&gt;Authors: Gorron, E. - Rodriguez, F. - Bernal, D. - Rodriguez-Rojas, L. M. - Bernal, A. - Restrepo, S. - Tohme, J.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: We developed a technique and a tool for degenerate primer design based on multiple local alignments employing the MEME algorithm supported with electronic PCR. The objective is to find adequate primers starting from sequences with poor global similarity. We show an example of its application in our laboratory to find sequences in Brachiaria with similarity to ESTs related to apomixis.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20547638&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>Uniformly curated signaling pathways reveal tissue-specific cross-talks and support drug target discovery.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20542890</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20542890&lt;br/&gt;Authors: Korcsmaros, T. - Farkas, I. J. - Szalay, M. S. - Rovo, P. - Fazekas, D. - Spiro, Z. - Bode, C. - Lenti, K. - Vellai, T. - Csermely, P.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Signaling pathways control a large variety of cellular processes. However, currently, even within the same database signaling pathways are often curated at different levels of detail. This makes comparative and cross-talk analyses difficult. RESULTS: We present SignaLink, a database containing eight major signaling pathways from Caenorhabditis elegans, Drosophila melanogaster and humans. Based on 170 review and approximately 800 research articles, we have compiled pathways with semi-automatic searches and uniform, well-documented curation rules. We found that in humans any two of the eight pathways can cross-talk. We quantified the possible tissue- and cancer-specific activity of cross-talks and found pathway-specific expression profiles. In addition, we identified 327 proteins relevant for drug target discovery. Conclusions: We provide a novel resource for comparative and cross-talk analyses of signaling pathways. The identified multi-pathway and tissue-specific cross-talks contribute to the understanding of the signaling complexity in health and disease, and underscore its importance in network-based drug target selection. AVAILABILITY: http://SignaLink.org.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20542890&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>A highly accurate statistical approach for the prediction of transmembrane beta-barrels.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20538726</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20538726&lt;br/&gt;Authors: Freeman, T. C. Jr - Wimley, W. C.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;MOTIVATION: Transmembrane beta-barrels (TMBBs) belong to a special structural class of proteins predominately found in the outer membranes of Gram-negative bacteria, mitochondria and chloroplasts. TMBBs are surface-exposed proteins that perform a variety of functions ranging from nutrient acquisition to osmotic regulation. These properties suggest that TMBBs have great potential for use in vaccine or drug therapy development. However, membrane proteins, such as TMBBs, are notoriously difficult to identify and characterize using traditional experimental approaches and current prediction methods are still unreliable. RESULTS: A prediction method based on the physicochemical properties of experimentally characterized TMBB structures was developed to predict TMBB-encoding genes from genomic databases. The Freeman-Wimley prediction algorithm developed in this study has an accuracy of 99% and MCC of 0.748 when using the most efficient prediction criteria, which is better than any previously published algorithm. AVAILABILITY: The MS Windows-compatible application is available for download at http://www.tulane.edu/~biochem/WW/apps.html.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20538726&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
    <item>
      <title>cBar: a computer program to distinguish plasmid-derived from chromosome-derived sequence fragments in metagenomics data.</title>
      <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=20538725</link>
      <description>Publication Date: 2010 Aug 15 PMID: 20538725&lt;br/&gt;Authors: Zhou, F. - Xu, Y.&lt;br/&gt;Journal: Bioinformatics&lt;br/&gt;&lt;br/&gt;SUMMARY: Huge amounts of metagenomic sequence data have been produced as a result of the rapidly increasing efforts worldwide in studying microbial communities as a whole. Most, if not all, sequenced metagenomes are complex mixtures of chromosomal and plasmid sequence fragments from multiple organisms, possibly from different kingdoms. Computational methods for prediction of genomic elements such as genes are significantly different for chromosomes and plasmids, hence raising the need for separation of chromosomal from plasmid sequences in a metagenome. We present a program for classification of a metagenome set into chromosomal and plasmid sequences, based on their distinguishing pentamer frequencies. On a large training set consisting of all the sequenced prokaryotic chromosomes and plasmids, the program achieves approximately 92% in classification accuracy. On a large set of simulated metagenomes with sequence lengths ranging from 300 bp to 100 kbp, the program has classification accuracy from 64.45% to 88.75%. On a large independent test set, the program achieves 88.29% classification accuracy. AVAILABILITY: The program has been implemented as a standalone prediction program, cBar, which is available at http://csbl.bmb.uga.edu/~ffzhou/cBar.&lt;br/&gt;&lt;br/&gt;post to: &lt;a href = &quot;http://www.citeulike.org/posturl?url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3DPubMed%26dopt%3DAbstract%26list_uids%3D20538725&amp;title=Entrez+Pubmed&quot;&gt;CiteULike&lt;/a&gt;</description>
    </item>
  </channel>
</rss>
