Updated 09.03.2009. Author: Administrator
Simple online service to count distinct n-mers in a sequence. It is similar to the compseq program from EMBOSS, but also estimates the significance of the difference between observed and expected frequencies.
Set word size (default 3).
Use "beg.index" and "end index" to count words in a window. Indexes should be 1-based.
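The window option can be illustrated with a minimal Python sketch. It is not the service's own code: the function name `count_words` and its parameters are hypothetical, and it assumes that words are counted with overlap and that the end index is inclusive, neither of which is stated on this page.

```python
from collections import Counter

def count_words(seq, word_size=3, beg=None, end=None):
    """Count overlapping words of length word_size in seq[beg..end].

    beg and end are 1-based; end is treated as inclusive (an assumption).
    """
    seq = seq.upper()
    beg = 1 if beg is None else beg
    end = len(seq) if end is None else end
    if not (1 <= beg <= end <= len(seq)):
        raise ValueError("window must satisfy 1 <= beg <= end <= len(seq)")
    # Convert the 1-based inclusive window to a 0-based Python slice.
    window = seq[beg - 1:end]
    return Counter(
        window[i:i + word_size]
        for i in range(len(window) - word_size + 1)
    )

whole = count_words("ACGTACGT", word_size=3)
windowed = count_words("ACGTACGT", word_size=2, beg=2, end=5)
```

With beg=2 and end=5 the window is "CGTA", so only the dimers CG, GT and TA are counted.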
The script can handle several sequences in FASTA format. Word frequencies are summed over all sequences. When counting in a window, the estimation is done for each sequence separately.
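Handling several FASTA sequences might look like the following sketch. It assumes that the word counts of all records are simply pooled into one table; `read_fasta` and `pooled_counts` are hypothetical helper names, not part of the service.

```python
from collections import Counter

def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            records.append([line[1:].strip(), []])
        elif records:
            records[-1][1].append(line.upper())
        else:
            raise ValueError("sequence data before the first FASTA header")
    return [(name, "".join(parts)) for name, parts in records]

def pooled_counts(text, word_size=3):
    """Sum overlapping word counts over all records (pooling is an assumption)."""
    total = Counter()
    for _name, seq in read_fasta(text):
        total.update(
            seq[i:i + word_size]
            for i in range(len(seq) - word_size + 1)
        )
    return total

example = ">s1\nACGT\nAC\n>s2 test\nacg\n"
records = read_fasta(example)
counts = pooled_counts(example, word_size=3)
```

Words are counted within each record only, so no word spans the boundary between two sequences.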
Frequency of each nucleotide (optional). The default frequency for each nucleotide is 0.25, hence the 'Expected' frequency of any dimer is 1/16, of any trimer 1/64, and so on. However, you can estimate the frequency of each nucleotide by specifying word size = 1 and then replace the expected frequencies with the observed ones.
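The 1/16 and 1/64 values follow if the expected frequency of a word is taken as the product of the frequencies of its nucleotides. The sketch below assumes exactly that independence model; the names `expected_frequency` and `observed_base_freq` are hypothetical, and the service may compute its values differently when custom frequencies are supplied.

```python
from math import prod

# Default nucleotide frequencies described above.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def expected_frequency(word, base_freq=UNIFORM):
    """Expected frequency of a word under an independence assumption."""
    return prod(base_freq[base] for base in word.upper())

def observed_base_freq(seq):
    """Observed frequency of each nucleotide (the word size = 1 case)."""
    seq = seq.upper()
    return {base: seq.count(base) / len(seq) for base in "ACGT"}

dimer = expected_frequency("AC")
trimer = expected_frequency("ACG")
custom = observed_base_freq("AACG")
dimer_custom = expected_frequency("AA", custom)
```

Multiplying an expected frequency by the number of words counted gives the corresponding expected count.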
For each n-mer report observed and expected counts and frequencies.
A p-value for the significance of the difference between observed and expected values is provided. If the p-value < 0.05, the program additionally marks whether the n-mer is significantly overrepresented ('+') or significantly underrepresented ('-'). For p-values < 0.01 two signs are added; for p-values < 0.001, three signs are added.
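The marking rule can be written down as a small Python sketch. The statistical test behind the p-value is not described on this page, so the p-value is taken as given here; the function name `significance_mark` is hypothetical.

```python
def significance_mark(p_value, observed, expected):
    """Return '', '+', '++', '+++', '-', '--' or '---' for one n-mer.

    The p-value is supplied by the caller; how the service computes it
    is not specified on this page.
    """
    if p_value >= 0.05 or observed == expected:
        return ""
    # '+' marks overrepresentation, '-' marks underrepresentation.
    sign = "+" if observed > expected else "-"
    if p_value < 0.001:
        return sign * 3
    if p_value < 0.01:
        return sign * 2
    return sign

marks = [
    significance_mark(0.2, 10, 5),
    significance_mark(0.03, 10, 5),
    significance_mark(0.005, 2, 8),
    significance_mark(0.0001, 20, 5),
]
```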
Extremely long sequences are not supported due to server limitations. Please contact the administrator directly if you nevertheless need to process such data.