:markdown
  Overview
  ========

  This application is designed for the integration of data and functionalities.
  Its design is based on the idea of the `Entity`: anything that can be
  unambiguously identified and be the subject of investigation. Examples of
  entities are genes, proteins, SNPs, samples, pathways, etc.

  Each entity has a report, which depends on the type of the entity. All
  reports are computed on the fly and cached. In addition to the main report,
  entities have `actions`, which are sub-reports that implement a particular
  analysis. For instance, one of the actions for gene entities displays a
  summary of the relevance of that gene across the collection of studies that
  you have access to.

  ## Miscellaneous comments

  ### A word on performance

  This application is intended to allow interactive investigation. Be aware,
  however, that some analyses are slow and will not feel very interactive.
  Some processes may require plenty of infrastructure: downloading datasets,
  building databases, computing preliminary results, etc. A lot of this
  processing is reused system-wide, which means that the application starts
  slow but gets quicker.

  Unlike applications with a more limited scope, this system cannot foresee
  what the user will be interested in, so not all of this infrastructure can
  be built beforehand; some of it must be built on demand. We pay for
  flexibility by sacrificing responsiveness, but the system has plenty of
  tricks to be as efficient as possible and to reuse as much as possible.

  ### Entity annotations

  The `Entity` subsystem annotates entity identifiers with additional
  information to help identify them completely and unambiguously.

  #### Organisms and builds

  When we identify a gene with the string `TP53`, we might think that the gene
  is unambiguously identified; however, it is not. First of all, we need to
  know which organism the gene belongs to: the TP53 gene of each organism is a
  different gene. Not only that, but the TP53 gene changes slightly from build
  to build; in particular, its chromosomal position may shift. For that reason,
  genes are characterized not only by the organism they belong to but also by
  the version of the build we are considering.

  We annotate entities with their organism and build using the following
  convention. The organism is specified with a three-letter code: the first
  letter is uppercase and is the first letter of the first term in the
  organism name ("H" for Homo), and the last two are the first two letters of
  the second term ("sa" for sapiens). This is the convention followed by KEGG;
  it is succinct and, in our experience, collision free. The build is
  specified afterwards with a date code: "Hsa/may2009" represents the
  _Homo sapiens_ organism as it was known in May 2009, i.e. the hg18 build,
  whereas "Hsa/jan2013" corresponds to a recent version of the hg19 build.

  #### Identifier formats

  Genes can be identified through a substantial number of identifier formats,
  for example Ensembl Gene ID, Entrez Gene ID, and Associated Gene Name (gene
  symbol). We use the Ensembl BioMart to download an identifier translation
  resource. The names of the formats correspond to the names used in the
  Ensembl BioMart and must be *followed to the letter including case*. The
  gene `Entity` is prepared to handle all the necessary translations between
  identifiers across different resources transparently, but the user *must*
  be aware of this fact or may run into trouble.

%h2 Subsystems
%ul
  %li
    %a(href='/help/entity' class="help") Entity subsystem
  %li
    %a(href='/help/workflow' class="help") Workflow subsystem