Find some sequence in DNA

Occasional Contributor
Posts: 6

If I have DNA sequence data with 10,000 observations, e.g.:

1.G

2.C

3.A

4.C

....

....

How can I find the number of GCC patterns in this dataset?

PROC Star
Posts: 554

Re: Find some sequence in DNA

Just to clarify, your data looks something like this, right?

data DNA;
   input ID seq $;
   datalines;
   1 G
   2 C
   3 A 
   4 C
   ;

... and so on. Then you want the observations where a seq value of G is followed by a C and then another C, right?

Occasional Contributor
Posts: 6

Re: Find some sequence in DNA

Yes.

Trusted Advisor
Posts: 1,401

Re: Find some sequence in DNA

How to handle this depends on the type and format of your data file.

1) Assuming your data is in a flat file with one base per record:

filename DNA '...path and filename';
data want;
      infile DNA truncover end=eof;
      length c1-c3 $1 ;
      array cx c1-c3;
      retain c1-c3 ' '  i 0 ;
      input   na $1.;
      link check;
      keep pos c1-c3;
return;
check:
      if i < 3 then do;   /* fill the window with the first three bases */
         i+1; cx(i) = na;
      end;
      else do;            /* then slide it forward one base per record */
         c1=c2; c2=c3; c3=na;
      end;
      pos = _N_-2;        /* position of the window's first base (the G on a match) */
      if compress(c1||c2||c3) = 'GCC' then output;
return;
run;

2) Similarly, if the data is a SAS dataset, the code would be as follows, assuming that NA is the variable holding the nucleic acid code:

data want;
      set have;
      length c1-c3 $1 ;
      array cx c1-c3;
      retain c1-c3 ' '  i 0 ;
      link check;
      keep pos c1-c3;
return;
check:
      if i < 3 then do;   /* fill the window with the first three bases */
         i+1; cx(i) = na;
      end;
      else do;            /* then slide it forward one base per observation */
         c1=c2; c2=c3; c3=na;
      end;
      pos = _N_-2;        /* position of the window's first base (the G on a match) */
      if compress(c1||c2||c3) = 'GCC' then output;
return;
run;
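
Since the original question asks for the number of GCC patterns, and WANT holds one observation per match, the count should simply be the number of observations in WANT. A minimal sketch, assuming the WANT dataset from one of the steps above already exists in the WORK library; the column name n_gcc is only an illustrative label:

```sas
/* Assumes WORK.WANT was created by one of the DATA steps above. */
/* Each observation in WANT is one GCC match, so count the rows. */
proc sql;
   select count(*) as n_gcc
   from want;
quit;
```

The POS values in WANT can still be used afterwards to see where each match starts.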


Valued Guide
Posts: 797

Re: Find some sequence in DNA

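data _null_;
  retain n_gcc 0;
  set dna end=eod;
  /* call the LAG functions on every observation so their queues stay in step */
  prev1 = lag(seq);
  prev2 = lag2(seq);
  if prev2='G' and prev1='C' and seq='C' then n_gcc+1;
  if eod then put n_gcc=;
run;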
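
Before running the count on all 10,000 observations, it may help to try it on a small sample that can be checked by hand. This is only a sketch: the dataset name dna_test and its eight bases are made up for illustration, and prev1 and prev2 are illustrative variable names that hold the lagged values so both LAG queues are updated on every observation.

```sas
/* Hypothetical test data: GCC starts at positions 1 and 5 */
data dna_test;
   input ID seq $;
   datalines;
1 G
2 C
3 C
4 A
5 G
6 C
7 C
8 C
;
run;

data _null_;
   set dna_test end=eod;
   prev1 = lag(seq);    /* base one observation back  */
   prev2 = lag2(seq);   /* base two observations back */
   if prev2 = 'G' and prev1 = 'C' and seq = 'C' then n_gcc + 1;
   if eod then put n_gcc=;   /* writes n_gcc=2 to the log */
run;
```

The trailing C at position 8 does not add a third match, because the base two observations back is a C rather than a G.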

Occasional Contributor
Posts: 6

Re: Find some sequence in DNA

Thank you all.