
Using arrays and substr function for sequencing DNA

Occasional Contributor
Posts: 14

I am new to arrays, but I am trying to split the 15 DNA sequences below, each 60 characters long, into 60 variables, D1-D60, where D1 holds the first position, D2 the second position, and so on. I have never used the SUBSTR function, so I am probably doing it incorrectly. Any help would be great.

data dna2 (drop=dna i);
set dna;
array d(60);
do i=1 to 60;
  d(i)=d(4/(i));
  dna=substr (dna, 1, 15);
end;
run;

data dna;
	length dna $ 60;
	input dna $;
datalines;
TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA
TGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTG
CTTCAAGTTAGTACCAGTTGAACCAGAGCAAGTAGAAGAGGCCAAATAAGGAGAGAAGAA
CAGCTTGTTACACCCTATGAGCCAGCATGGGATGGAGGACCCGGAGGGAGAAGTATTAGT
GTGGAAGTTTGACAGCCTCCTAGCATTTCGTCACATGGCCCGAGAGCTGCATCCGGAGTA
CTACAAAGACTGCTGACATCGAGCTTTCTACAAGGGACTTTCCGCTGGGGACTTTCCAGG
GAGGTGTGGCCTGGGCGGGACTGGGGAGTGGCGAGCCCTCAGATGCTACATATAAGCAGC
TGCTTTTTGCCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTG
GCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTCAAAGTAG
TGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAG
TGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGTAAAGCCAGA
GGAGATCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCG
GCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTG
CGAGAGCGTCGGTATTAAGCGGGGGAGAATTAGATAAATGGGAAAAAATTCGGTTAAGGC
CAGGGGGAAAGAAACAATATAAACTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAAC
;
Trusted Advisor
Posts: 1,115

Re: Using arrays and substr function for sequencing DNA

[ Edited ]

How about this?

data dna2 (drop=dna i);
set dna;
array d(60) $1;
do i=1 to 60;
  d(i)=char(dna,i);
end;
run;

Edit: Alternatively, you could use the SUBSTR function:

d(i)=substr(dna,i,1);

 

Occasional Contributor
Posts: 14

Re: Using arrays and substr function for sequencing DNA

That works. I almost had it. Do you know how to do the same thing with the SUBSTR function?
Trusted Advisor
Posts: 1,115

Re: Using arrays and substr function for sequencing DNA

Please see the edited post. 

 

The CHAR function syntax is a bit shorter. With SUBSTR you have to specify where to start the substring (second argument) and how many characters it should contain (third argument). The latter is implied to be 1 with the CHAR function.
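
To make the comparison concrete, here is a minimal sketch that calls both functions on the same string. The variable names c1, c2, and c3 are made up for illustration, and the string is just the first ten bases of the first sequence in the question.

```sas
data _null_;
  length dna $60;
  dna = 'TGGAAGGGCT';        /* first ten bases of sequence 1; the rest is blank padding */

  c1 = char(dna, 4);         /* CHAR: the single character at position 4 */
  c2 = substr(dna, 4, 1);    /* SUBSTR: start at position 4, take 1 character */
  c3 = substr(dna, 4, 3);    /* SUBSTR with a longer length returns a longer piece */

  put c1= c2= c3=;           /* writes c1=A c2=A c3=AAG to the log */
run;
```

One practical difference, as far as I know: if the target variable has no length yet, CHAR gives it a length of 1, while SUBSTR gives it the length of the source string. Declaring the array as `$1`, as in the answer above, avoids any surprise either way.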

Occasional Contributor
Posts: 14

Re: Using arrays and substr function for sequencing DNA

Thank you, it makes sense to use CHAR over SUBSTR here. And thank you for your help.
Respected Advisor
Posts: 3,124

Re: Using arrays and substr function for sequencing DNA

This may be a little obscure for you, but depending on the size of your data set, it can save you considerable time.

 

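data want;
set dna;
array d(60) $1;
/* Copy the 60 bytes of DNA directly into memory starting at the address of D1. */
/* This relies on D1-D60 being stored contiguously. */
call pokelong(dna, addrlong(d(1)),60);
drop dna;
run;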
Trusted Advisor
Posts: 1,115

Re: Using arrays and substr function for sequencing DNA

[ Edited ]

@Haikuo: Thanks for pointing out this interesting alternative. I had never used this call routine. The warnings in the documentation are quite intimidating, though ("devastating problems ... destroying a vital element ..."). So, maybe a bit too advanced for someone who had "never used the substr function."

Respected Advisor
Posts: 3,124

Re: Using arrays and substr function for sequencing DNA

@FreelanceReinhard, true and agreed. Direct memory writes carry inherent risk. We have used them in a few cases where a huge amount of data manipulation was required, and they do help. I posted this just to add something new to the discussion. After all, besides seeking answers to specific questions, people also want to learn.
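
Given the risk discussed above, one cautious option is to compare the direct-memory result with the plain CHAR loop before relying on it. The sketch below assumes the data set dna from the original question already exists; the second array chk is a hypothetical name introduced only for this check.

```sas
data _null_;
  set dna;
  array d(60) $1;      /* filled by the direct memory copy */
  array chk(60) $1;    /* hypothetical reference array, filled by CHAR */

  call pokelong(dna, addrlong(d(1)), 60);

  do i = 1 to 60;
    chk(i) = char(dna, i);
    /* report any position where the two methods disagree */
    if d(i) ne chk(i) then put 'Mismatch: ' _n_= i=;
  end;
run;
```

If the log shows no "Mismatch" lines, the two approaches agree on this data. That does not remove the general risk of writing to memory directly; it only confirms the result for the rows that were checked.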

Super User
Posts: 6,972

Re: Using arrays and substr function for sequencing DNA

You could read the data directly into the array like this:

data dna;
array dna {60} $1 dna1-dna60;
do i = 1 to 60;
input dna{i} $1.@;
end;
drop i;
datalines;
TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA
TGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTG
CTTCAAGTTAGTACCAGTTGAACCAGAGCAAGTAGAAGAGGCCAAATAAGGAGAGAAGAA
CAGCTTGTTACACCCTATGAGCCAGCATGGGATGGAGGACCCGGAGGGAGAAGTATTAGT
GTGGAAGTTTGACAGCCTCCTAGCATTTCGTCACATGGCCCGAGAGCTGCATCCGGAGTA
CTACAAAGACTGCTGACATCGAGCTTTCTACAAGGGACTTTCCGCTGGGGACTTTCCAGG
GAGGTGTGGCCTGGGCGGGACTGGGGAGTGGCGAGCCCTCAGATGCTACATATAAGCAGC
TGCTTTTTGCCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTG
GCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTCAAAGTAG
TGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAG
TGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGTAAAGCCAGA
GGAGATCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCG
GCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTG
CGAGAGCGTCGGTATTAAGCGGGGGAGAATTAGATAAATGGGAAAAAATTCGGTTAAGGC
CAGGGGGAAAGAAACAATATAAACTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAAC
;
run;
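
A more compact variant of the same idea, offered as a sketch rather than as the poster's method, uses a grouped variable list with a format list in the INPUT statement so that no explicit loop is needed. The data set name dna_alt is made up for illustration, the variables are named D1-D60 to match the original question, and only the first two sequences are repeated here.

```sas
data dna_alt;                 /* hypothetical data set name */
  /* read 60 consecutive one-character fields into D1-D60 */
  input (d1-d60) ($1.);
  datalines;
TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA
TGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTG
;
run;
```

The `$1.` informat both sets each variable to character length 1 and advances the input pointer one column at a time, so D1 receives column 1, D2 column 2, and so on.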