zjhansen30
Fluorite | Level 6
I am new to arrays, but I am trying to split the 15 DNA sequences below, each 60 characters long, into 60 variables, D1-D60, where D1 holds the first position, D2 the second position, and so on. I have never used the SUBSTR function, so I am probably using it incorrectly. Any help would be great.

data dna2 (drop=dna i);
set dna;
array d(60);
do i=1 to 60;
d(i)=d(4/(i));
dna=substr (dna, 1, 15);
end;
run;
data dna;
	length dna $ 60;
	input dna $;
datalines;
TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA
TGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTG
CTTCAAGTTAGTACCAGTTGAACCAGAGCAAGTAGAAGAGGCCAAATAAGGAGAGAAGAA
CAGCTTGTTACACCCTATGAGCCAGCATGGGATGGAGGACCCGGAGGGAGAAGTATTAGT
GTGGAAGTTTGACAGCCTCCTAGCATTTCGTCACATGGCCCGAGAGCTGCATCCGGAGTA
CTACAAAGACTGCTGACATCGAGCTTTCTACAAGGGACTTTCCGCTGGGGACTTTCCAGG
GAGGTGTGGCCTGGGCGGGACTGGGGAGTGGCGAGCCCTCAGATGCTACATATAAGCAGC
TGCTTTTTGCCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTG
GCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTCAAAGTAG
TGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAG
TGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGTAAAGCCAGA
GGAGATCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCG
GCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTG
CGAGAGCGTCGGTATTAAGCGGGGGAGAATTAGATAAATGGGAAAAAATTCGGTTAAGGC
CAGGGGGAAAGAAACAATATAAACTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAAC
;
8 REPLIES
FreelanceReinh
Jade | Level 19

How about this?

data dna2 (drop=dna i);
set dna;
array d(60) $1;
do i=1 to 60;
  d(i)=char(dna,i);
end;
run;

Edit: Alternatively, you could use the SUBSTR function:

d(i)=substr(dna,i,1);

zjhansen30
Fluorite | Level 6
That works. I almost had it, but do you know how to do it with the SUBSTR function as well?
FreelanceReinh
Jade | Level 19

Please see the edited post.

The CHAR function syntax is a bit shorter. To extract a single character with SUBSTR, you have to specify where to start the substring (second argument) and how many characters it should contain (third argument). With the CHAR function, the length is implicitly 1.
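To make the difference in arguments concrete, here is a minimal sketch that applies both functions to the first sequence from the question. The variable names c, s, and t are only illustrative.

```sas
data _null_;
  length dna $ 60;
  dna = 'TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA';
  c = char(dna, 5);       /* the single character at position 5 */
  s = substr(dna, 5, 1);  /* start at position 5, take 1 character */
  t = substr(dna, 5, 3);  /* start at position 5, take 3 characters */
  put c= s= t=;           /* writes the three values to the log */
run;
```

Here c and s should both hold the fifth base, while t holds positions 5 through 7, which is the kind of multi-character extraction that CHAR cannot do.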

zjhansen30
Fluorite | Level 6
Thank you, it makes sense to use CHAR over SUBSTR here. And thank you for your help.
Haikuo
Onyx | Level 15

This may be a little obscure for you, but depending on the size of your data set, it can save you considerable time.


data want;
set dna;
array d(60) $1;
call pokelong(dna, addrlong(d(1)),60);
drop dna;
run;
FreelanceReinh
Jade | Level 19

@Haikuo: Thanks for pointing out this interesting alternative. I had never used this call routine. The warnings in the documentation are quite intimidating, though ("devastating problems ... destroying a vital element ..."). So, maybe a bit too advanced for someone who had "never used the substr function."

Haikuo
Onyx | Level 15

@FreelanceReinh, true and agreed. Direct memory writes carry inherent risk. We have used a few in cases where a huge amount of data manipulation was required, and it does help. I posted it here just to add something new to the discussion. After all, besides seeking answers to specific questions, people also want to learn.

Kurt_Bremser
Super User

You could read the data directly into the array like this:

data dna;
array dna {60} $1 dna1-dna60;
do i = 1 to 60;
input dna{i} $1.@;
end;
drop i;
datalines;
TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA
TGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTG
CTTCAAGTTAGTACCAGTTGAACCAGAGCAAGTAGAAGAGGCCAAATAAGGAGAGAAGAA
CAGCTTGTTACACCCTATGAGCCAGCATGGGATGGAGGACCCGGAGGGAGAAGTATTAGT
GTGGAAGTTTGACAGCCTCCTAGCATTTCGTCACATGGCCCGAGAGCTGCATCCGGAGTA
CTACAAAGACTGCTGACATCGAGCTTTCTACAAGGGACTTTCCGCTGGGGACTTTCCAGG
GAGGTGTGGCCTGGGCGGGACTGGGGAGTGGCGAGCCCTCAGATGCTACATATAAGCAGC
TGCTTTTTGCCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTG
GCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTCAAAGTAG
TGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAG
TGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGTAAAGCCAGA
GGAGATCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCG
GCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTG
CGAGAGCGTCGGTATTAAGCGGGGGAGAATTAGATAAATGGGAAAAAATTCGGTTAAGGC
CAGGGGGAAAGAAACAATATAAACTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAAC
;
run;
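
As a variation on the loop above (not something posted in this thread), SAS formatted list input can read all 60 positions in a single INPUT statement. This is a minimal sketch that assumes the same 60-character lines. The data set name dna_wide is made up for illustration, and only the first two sequences are included to keep it short.

```sas
data dna_wide;
  /* (d1-d60) ($1.) applies the $1. informat to each variable in turn, */
  /* so every variable receives exactly one character of the line      */
  input (d1-d60) ($1.);
datalines;
TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA
TGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTG
;
```

Because the $1. informat appears in the INPUT statement, d1-d60 are created as character variables of length 1 without a separate ARRAY or LENGTH statement.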

