GO Annotation

Group

Analysis

Argument

-annotation

Short Description

Run GO Annotation.

Description

This is the process of selecting GO terms from the GO pool obtained by the Mapping step and assigning them to the query sequences.

GO annotation is carried out by applying an annotation rule (AR) to the candidate ontology terms found. The rule seeks the most specific annotations that meet a certain level of reliability. The process is adjustable in both specificity and stringency.

For each candidate GO, an annotation score (AS) is computed. The AS is the sum of two terms.

The first, the direct term (DT), represents the highest hit similarity of this GO, weighted by a factor corresponding to its evidence code (EC).

The second, the abstraction term (AT), provides the possibility of abstraction. Abstraction is defined as annotation to a parent node when several child nodes are present in the GO candidate collection. This term multiplies the number of GOs unified at the node, minus one, by a user-defined GO weight factor that controls the possibility and strength of abstraction. When the GO weight is set to 0, no abstraction is done.

Finally, the AR selects the lowest term per branch whose score reaches a user-defined threshold. DT, AT and the AR are defined as follows:

DT = max(similarity * ECweight)

AT = (#GO - 1) * GOweight

AR: lowest.node (AS (DT + AT) ) >= threshold
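The two formulas can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation: the function name, the input layout, and the EC weight values are hypothetical, and the count of GOs unified at a node (#GO) is passed in directly because the text does not specify how it is derived.

```python
def annotation_score(hits, ec_weights, n_unified, go_weight):
    """Return (DT, AT, AS) for one candidate GO term.

    hits       -- (similarity, evidence_code) pairs for the hits carrying this GO
    ec_weights -- evidence code -> weight (illustrative values, not defaults)
    n_unified  -- the #GO count for this node, supplied by the caller
    go_weight  -- the GO-Weight parameter
    """
    # DT = max(similarity * ECweight)
    dt = max(similarity * ec_weights[ec] for similarity, ec in hits)
    # AT = (#GO - 1) * GOweight
    at = (n_unified - 1) * go_weight
    return dt, at, dt + at


# Hypothetical weights and hits, chosen only to show the arithmetic.
ec_weights = {"IDA": 1.0, "IEA": 0.7}
hits = [(80.0, "IEA"), (60.0, "IDA")]

dt, at, score = annotation_score(hits, ec_weights, n_unified=1, go_weight=5)
print(dt, at, score)
```

In this example the IEA hit has the higher similarity, but after weighting the IDA hit supplies the direct term. With a single GO at the node the abstraction term is 0, so the score is the direct term alone.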

To see how the annotation score works, consider the following cases. When the EC-weight is set to 1 for all ECs (no EC influence) and the GO-weight is 0 (no abstraction), the annotation score equals the maximum similarity value of the hits that carry that GO term, and the sequence is annotated with that GO term if this score reaches the given threshold. When EC-weights are lower than 1, higher similarities are required to reach the threshold. When the GO-weight is other than 0, a parent node can reach the threshold even though its child nodes do not.
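The selection step can be illustrated with a small Python sketch. It assumes one reading of "lowest term per branch": keep every node whose score reaches the threshold, unless one of its descendants also reaches it. The function name, the node labels, and the scores are hypothetical, and the scores are supplied precomputed.

```python
def select_annotations(scores, parents, threshold):
    """Return the most specific nodes whose score reaches the threshold.

    scores    -- node -> annotation score (AS), precomputed
    parents   -- node -> list of direct parent nodes
    threshold -- the Annotation Cut-Off
    """
    passing = {node for node, score in scores.items() if score >= threshold}

    # Any ancestor of a passing node is shadowed by that more specific node.
    shadowed = set()
    for node in passing:
        stack = list(parents.get(node, ()))
        while stack:
            ancestor = stack.pop()
            if ancestor not in shadowed:
                shadowed.add(ancestor)
                stack.extend(parents.get(ancestor, ()))

    return passing - shadowed


parents = {"child_a": ["parent"], "child_b": ["parent"]}

# Abstraction: both children fall short, but the parent's score
# (for example DT = 50 plus AT = (3 - 1) * 5 = 10) reaches 55.
abstracted = select_annotations(
    {"child_a": 45, "child_b": 50, "parent": 60}, parents, threshold=55
)

# No abstraction needed: child_b reaches the threshold, so it is
# preferred over its parent.
specific = select_annotations(
    {"child_a": 45, "child_b": 58, "parent": 60}, parents, threshold=55
)
print(abstracted, specific)
```

The first call returns only the parent node and the second only child_b, which matches the behaviour described for a non-zero GO-weight.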

The annotation rule provides a general framework for annotation. How annotation actually occurs depends on how the parameters of the AS are set. These can be adjusted in the properties file.

  1. Annotation Cut-Off (threshold). The annotation rule selects the lowest term per branch that lies over this threshold (default=55).

  2. GO-Weight. This is the weight given to the contribution of mapped children terms to the annotation of a parent term (default=5).

  3. Filter GO by taxonomy. This filter removes the Gene Ontology terms known not to occur in the given taxonomy, using the restrictions defined by Gene Ontology. You can select one of the given options or simply enter a taxonomy ID.

  4. E-Value-Hit-Filter. This value can be understood as a pre-filter: only GO terms obtained from hits with an e-value lower than the given value (that is, more significant hits) will be used for annotation and/or shown in a generated graph (default=1.0E-6).

  5. Hsp-Hit Coverage CutOff. Sets the minimum required coverage between a hit and its HSP. For example, a value of 80 means that the aligned HSP must cover at least 80% of the length of its hit. Only annotations from hits fulfilling this criterion will be considered for annotation transfer.

  6. Hit Filter. This option allows you to consider only the first N hits during annotation. It works in combination with the "Only hits with GOs" option.

  7. Only hits with GOs. Together with the "Hit Filter" option, this option applies the first-N selection only to hits that have a GO term candidate.

  8. EC-Weight. EC weights can be modified on the following pages of the Run Annotation dialogue by clicking Next. If you do not want evidence codes to influence the annotation, set them all to 1. Alternatively, to exclude GO annotations with a certain EC (for example IEA), set that EC weight to 0.
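The hit-level filters in the list above can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the tool's code: the Hit class, the function name, and the parameter names are hypothetical, the order in which the filters are applied is assumed, and the e-value cut-off is treated as an upper bound, as is usual for BLAST e-values.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    evalue: float
    hsp_length: int   # aligned HSP length
    hit_length: int   # full length of the hit sequence
    go_terms: list = field(default_factory=list)


def prefilter_hits(hits, max_evalue=1.0e-6, min_coverage=0.0,
                   first_n=None, only_with_gos=False):
    """Return the hits whose GO terms would be passed on to annotation.

    hits is assumed to be in rank order (best hit first).
    """
    # Hit Filter: take the first N hits; with only_with_gos the count
    # runs over GO-bearing hits only (assumed interaction).
    candidates = [h for h in hits if h.go_terms] if only_with_gos else list(hits)
    if first_n is not None:
        candidates = candidates[:first_n]

    kept = []
    for hit in candidates:
        coverage = 100.0 * hit.hsp_length / hit.hit_length
        # Boundary handling (<= and >=) is an assumption.
        if hit.evalue <= max_evalue and coverage >= min_coverage:
            kept.append(hit)
    return kept


hits = [
    Hit(1e-30, 90, 100, ["GO:0005524"]),
    Hit(1e-10, 50, 100, []),
    Hit(1e-3, 95, 100, ["GO:0003677"]),
    Hit(1e-8, 85, 100, ["GO:0016301"]),
]

with_gos = prefilter_hits(hits, min_coverage=80, first_n=3, only_with_gos=True)
all_hits = prefilter_hits(hits, min_coverage=80, first_n=3, only_with_gos=False)
print(len(with_gos), len(all_hits))
```

With "Only hits with GOs" enabled, the first three GO-bearing hits are examined and two of them pass the e-value and coverage checks. Without it, the first three hits include one that has no GO terms and low coverage, so only one hit remains.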

Requirements

  • Load a sequence project in the same command, e.g. a .box file containing sequences with Blast and GO Mapping results.

Properties File

AnnotationAlgoParameters