## Find some sequence in DNA

Occasional Contributor


If I have 10,000 DNA sequence records, e.g.:

1.G

2.C

3.A

4.C

....

....

How can I count the number of GCC patterns in this dataset?

PROC Star

## Re: Find some sequence in DNA

Posted in reply to LauChiFung

Just to clarify, your data looks something like this, right?

```sas
data DNA;
input ID seq $;
datalines;
1 G
2 C
3 A
4 C
;
```

...and so on. Then you want the observations where a seq value of G is followed by a C and then another C, right?

Occasional Contributor

Yes.
Trusted Advisor

## Re: Find some sequence in DNA


The way to deal with your query depends on your data file type and format.

1) Assuming your data is a flat file with one nucleotide per line:

```sas
filename DNA '...path and filename';
data want;
  infile DNA truncover;
  length c1-c3 $1;
  retain c1-c3 ' ';
  input na $1.;                  /* one nucleotide per line, in column 1 */
  /* slide the 3-letter window: drop the oldest letter, append the current one */
  c1 = c2;
  c2 = c3;
  c3 = na;
  /* from line 3 on the window is full; output it when it spells GCC */
  if _N_ >= 3 and c1||c2||c3 = 'GCC' then do;
    pos = _N_ - 2;               /* line number of the G */
    output;
  end;
  keep pos c1-c3;
run;
```

2) Similarly, if the data is a SAS dataset, the code would be as follows, assuming that NA is the variable holding the nucleic acid code:

```sas
data want;
  set have;
  length c1-c3 $1;
  retain c1-c3 ' ';
  /* slide the 3-letter window over the NA values */
  c1 = c2;
  c2 = c3;
  c3 = na;
  /* from observation 3 on the window is full; output it when it spells GCC */
  if _N_ >= 3 and c1||c2||c3 = 'GCC' then do;
    pos = _N_ - 2;               /* observation number of the G */
    output;
  end;
  keep pos c1-c3;
run;
```
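If the flat file instead holds lines in the `1.G` form shown in the question (a record number, a period, then one letter), one possible sketch is to read it with the period as the delimiter into a data set with the `ID`/`seq` layout from the first reply. This is an assumption about the file layout, not something confirmed in the thread; the in-stream lines below stand in for the real file.

```sas
/* Sketch, assuming each line looks like "1.G": record number, period, one letter. */
/* DATALINES stands in for the real file; to read the file itself, point INFILE   */
/* at its fileref (keeping DLM='.') and drop the DATALINES section.               */
data dna;
  infile datalines dlm='.' truncover;
  input id seq :$1.;             /* ID from the digits, SEQ from the letter */
  datalines;
1.G
2.C
3.A
4.C
;
run;

proc print data=dna;
run;
```

The resulting `DNA` data set can then be fed to the SAS-dataset version of the code above, with `seq` in place of `na`.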

Trusted Advisor

## Re: Find some sequence in DNA


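```sas
data _null_;
  retain n_gcc 0;
  set dna end=eod;
  /* call LAG2 and LAG on every observation so their queues stay in step with the data */
  prev2 = lag2(seq);
  prev1 = lag(seq);
  if prev2 = 'G' and prev1 = 'C' and seq = 'C' then n_gcc + 1;
  if eod then put n_gcc=;        /* write the total count to the log */
run;
```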
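Another possible sketch, assuming the `ID`/`seq` layout above and at most 32,767 observations (the maximum length of a SAS character variable, which covers the 10,000 records in the question): place every letter into one long string and let the COUNT function do the matching. COUNT counts non-overlapping occurrences, which here gives the same total as an overlapping count because GCC cannot overlap itself. The data set names `dna_demo` and `gcc_count` are hypothetical; swap in `dna` for real data.

```sas
/* Hypothetical demo data in the same ID/SEQ layout as DNA above */
data dna_demo;
  input id seq $;
  datalines;
1 G
2 C
3 C
4 A
5 G
6 C
7 C
;
run;

/* Build one string from the SEQ values, then count GCC with COUNT. */
/* Assumes at most 32,767 observations.                             */
data gcc_count;
  length dna_string $32767;
  retain dna_string ' ';
  set dna_demo end=eod;
  substr(dna_string, _n_, 1) = seq;   /* letter n goes to position n */
  if eod then do;
    n_gcc = count(dna_string, 'GCC');
    put n_gcc=;                       /* log shows n_gcc=2 for the demo data */
    output;
  end;
  keep n_gcc;
run;
```

The same string also allows a case-insensitive count via `count(dna_string, 'gcc', 'i')` if the file mixes upper and lower case.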

Occasional Contributor

## Re: Find some sequence in DNA

Thank you all.