preg |
A regular expression is a way of specifying an ambiguous pattern to search for. Regular expressions are common in many programming languages, so they may already be familiar to some users.
The following is a short guide to regular expressions in EMBOSS:
The following quantifier characters specify the number of times that the character before (in this case 'x') matches:
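As an illustration of how quantifiers count the preceding character, here is a minimal sketch using Python's `re` module. It assumes that the common quantifiers `?`, `*`, `+`, `{n}`, `{n,}` and `{n,m}` behave in preg as they do in Python; the helper name `accepted_lengths` is hypothetical.

```python
import re

# Common quantifiers applied to the character 'x' (Python `re` syntax,
# assumed here to match what preg accepts).
QUANTIFIERS = [
    ("x?", "zero or one x"),
    ("x*", "zero or more x"),
    ("x+", "one or more x"),
    ("x{2}", "exactly two x"),
    ("x{2,}", "two or more x"),
    ("x{2,4}", "two to four x"),
]

def accepted_lengths(pattern, max_len=5):
    """Return the run lengths 0..max_len of 'x' that the pattern matches in full."""
    return [n for n in range(max_len + 1) if re.fullmatch(pattern, "x" * n)]

for pattern, meaning in QUANTIFIERS:
    print(f"{pattern:7} {meaning:15} run lengths matched: {accepted_lengths(pattern)}")
```

Each row shows which run lengths of 'x' (up to five) the quantified pattern accepts when it has to cover the whole run.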
Quantifiers can follow any of the following types of character specification:
Combining some of these features gives these examples from the PROSITE patterns database:
'[STAGCN][RKH][LIVMAFY]$'
which is the 'Microbodies C-terminal targeting signal'.
'LP.TG[STGAVDE]'
which is the 'Gram-positive cocci surface proteins anchoring hexapeptide'.
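To see how these two patterns behave, the following sketch runs them with Python's `re` module, which is assumed to interpret character classes, `.` and `$` the same way preg does for these simple cases. The two sequences are invented toy strings, not real database entries.

```python
import re

# The two PROSITE-derived patterns quoted above.
MICROBODY_SIGNAL = re.compile(r"[STAGCN][RKH][LIVMAFY]$")
ANCHOR_HEXAPEPTIDE = re.compile(r"LP.TG[STGAVDE]")

# Hypothetical toy sequences, made up for illustration only.
toy_c_terminal = "MKTAYIAKQRSKL"   # ends in S-K-L
toy_anchor = "GGSLPETGEEK"         # contains L-P-E-T-G-E

m1 = MICROBODY_SIGNAL.search(toy_c_terminal)
m2 = ANCHOR_HEXAPEPTIDE.search(toy_anchor)

# Report the matched text and its 1-based start position.
print(m1.group(), m1.start() + 1)
print(m2.group(), m2.start() + 1)

# '$' anchors the first pattern to the end of the sequence, so the same
# three residues at the start of a sequence do not match.
print(MICROBODY_SIGNAL.search("SKLMM"))
```

The first pattern only matches the last three residues because of the trailing `$`; the second can match anywhere, with `.` standing for any single residue.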
Regular expressions are case-sensitive. The pattern 'AAAA' will not match the sequence 'aaaa'.
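The same behaviour can be reproduced in a short sketch with Python's `re` module, which is also case-sensitive by default. Converting the sequence to upper case before searching is shown as one possible workaround; it is an illustration in Python, not a description of how preg handles case internally.

```python
import re

pattern = re.compile(r"AAAA")

upper_hit = pattern.search("AAAA") is not None            # same case: matches
lower_hit = pattern.search("aaaa") is not None            # different case: no match
converted_hit = pattern.search("aaaa".upper()) is not None  # upper-cased first: matches

print(upper_hit, lower_hit, converted_hit)
```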
    % preg
    Regular expression search of a protein sequence
    Input sequence(s): tsw:*_rat
    Regular expression pattern: IA[QWF]A
    Output file [100k_rat.preg]:
       Mandatory qualifiers:
      [-sequence]          seqall     Sequence database USA
      [-pattern]           regexp     Regular expression pattern
      [-outfile]           outfile    Output file name

       Optional qualifiers: (none)
       Advanced qualifiers: (none)
       Associated qualifiers:

       "-sequence" related qualifiers
       -sbegin1             integer    First base used
       -send1               integer    Last base used, def=seq length
       -sreverse1           boolean    Reverse (if DNA)
       -sask1               boolean    Ask for begin/end/reverse
       -snucleotide1        boolean    Sequence is nucleotide
       -sprotein1           boolean    Sequence is protein
       -slower1             boolean    Make lower case
       -supper1             boolean    Make upper case
       -sformat1            string     Input sequence format
       -sopenfile1          string     Input filename
       -sdbname1            string     Database name
       -sid1                string     Entryname
       -ufo1                string     UFO features
       -fformat1            string     Features format
       -fopenfile1          string     Features file name

       "-outfile" related qualifiers
       -odirectory3         string     Output directory

       General qualifiers:
       -auto                boolean    Turn off prompts
       -stdout              boolean    Write standard output
       -filter              boolean    Read standard input, write standard output
       -options             boolean    Prompt for required and optional values
       -debug               boolean    Write debug output to program.dbg
       -acdlog              boolean    Write ACD processing log to program.acdlog
       -acdpretty           boolean    Rewrite ACD file as program.acdpretty
       -acdtable            boolean    Write HTML table of options
       -verbose             boolean    Report some/full command line options
       -help                boolean    Report command line options. More
                                      information on associated and general
                                      qualifiers can be found with -help -verbose
       -warning             boolean    Report warnings
       -error               boolean    Report errors
       -fatal               boolean    Report fatal errors
       -die                 boolean    Report deaths
Mandatory qualifiers | Description | Allowed values | Default |
---|---|---|---|
[-sequence] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
[-pattern] (Parameter 2) | Regular expression pattern | Any regular expression pattern is accepted | Required |
[-outfile] (Parameter 3) | Output file name | Output file | <sequence>.preg |
Optional qualifiers | (none) | | |
Advanced qualifiers | (none) | | |
    preg search of tsw:*_rat with pattern IA[QWF]A

    Matches in 100K_RAT
    100K_RAT 390 IAQA
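The kind of search that produces a report like this can be sketched in Python. This is not preg's implementation: the function name `find_matches`, the toy records, and the reading of the number in the sample output as a 1-based start position are all assumptions made for illustration.

```python
import re

def find_matches(records, pattern):
    """Return (name, 1-based start, matched text) for every hit in every record."""
    regex = re.compile(pattern)
    hits = []
    for name, seq in records:
        # finditer yields successive non-overlapping matches in one sequence
        for m in regex.finditer(seq):
            hits.append((name, m.start() + 1, m.group()))
    return hits

# Hypothetical toy records standing in for database entries.
toy_records = [
    ("TOY1_RAT", "MKIAQALLP"),
    ("TOY2_RAT", "MSTNPKPQRK"),
    ("TOY3_RAT", "IAWAGGIAFA"),
]

hits = find_matches(toy_records, "IA[QWF]A")
for name, start, text in hits:
    print(name, start, text)
```

Sequences without a match, such as the second toy record, contribute no lines, and a sequence with several hits contributes one line per hit.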
Program name | Description |
---|---|
antigenic | Finds antigenic sites in proteins |
digest | Protein proteolytic enzyme or reagent cleavage digest |
fuzzpro | Protein pattern search |
fuzztran | Protein pattern search after translation |
helixturnhelix | Report nucleic acid binding motifs |
oddcomp | Finds protein sequence regions with a biased composition |
patmatdb | Search a protein sequence with a motif |
patmatmotifs | Search a PROSITE motif database with a protein sequence |
pepcoil | Predicts coiled coil regions |
pestfind | Finds PEST motifs as potential proteolytic cleavage sites |
pscan | Scans proteins using PRINTS |
sigcleave | Reports protein signal cleavage sites |
Other EMBOSS programs allow you to search for simple patterns and may be easier for the user who has never used regular expressions before: