zjhansen30
Fluorite | Level 6
I am new to arrays, but I am trying to split the 15 DNA sequences below, each 60 characters long, into 60 variables, D1-D60, where D1 holds the first position, D2 the second position, and so on. I have never used the SUBSTR function, so I am probably doing it incorrectly. Any help would be great.

data dna2 (drop=dna i);
set dna;
array d(60);
do i=1 to 60;
d(i)=d(4/(i));
dna=substr (dna, 1, 15);
end;
run;

data dna;
	length dna $ 60;
	input dna $;
datalines;
TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA
TGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTG
CTTCAAGTTAGTACCAGTTGAACCAGAGCAAGTAGAAGAGGCCAAATAAGGAGAGAAGAA
CAGCTTGTTACACCCTATGAGCCAGCATGGGATGGAGGACCCGGAGGGAGAAGTATTAGT
GTGGAAGTTTGACAGCCTCCTAGCATTTCGTCACATGGCCCGAGAGCTGCATCCGGAGTA
CTACAAAGACTGCTGACATCGAGCTTTCTACAAGGGACTTTCCGCTGGGGACTTTCCAGG
GAGGTGTGGCCTGGGCGGGACTGGGGAGTGGCGAGCCCTCAGATGCTACATATAAGCAGC
TGCTTTTTGCCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTG
GCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTCAAAGTAG
TGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAG
TGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGTAAAGCCAGA
GGAGATCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCG
GCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTG
CGAGAGCGTCGGTATTAAGCGGGGGAGAATTAGATAAATGGGAAAAAATTCGGTTAAGGC
CAGGGGGAAAGAAACAATATAAACTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAAC
;
FreelanceReinh
Jade | Level 19

How about this?

data dna2 (drop=dna i);
set dna;
array d(60) $1;        /* d1-d60, character, length 1 each */
do i=1 to 60;
  d(i)=char(dna,i);    /* i-th character of dna */
end;
run;

Edit: Alternatively, you could use the SUBSTR function:

d(i)=substr(dna,i,1);
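For reference, the complete SUBSTR version of the step might look like the sketch below. To keep it self-contained, it builds a small input data set from the first two sequences in the question; the names dna_sample and dna2_substr are illustrative choices, not from the thread.

```sas
/* Illustrative input: the first two sequences from the question */
data dna_sample;
   length dna $ 60;
   input dna $;
datalines;
TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA
TGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTG
;
run;

/* Split each sequence into d1-d60, one character per variable */
data dna2_substr (drop=dna i);
   set dna_sample;
   array d(60) $1;               /* declares d1-d60 as character, length 1 */
   do i = 1 to 60;
      d(i) = substr(dna, i, 1);  /* start at position i, take 1 character */
   end;
run;
```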

zjhansen30
Fluorite | Level 6
That works. I almost had it, but do you know how to do it with the SUBSTR function at all?
FreelanceReinh
Jade | Level 19

Please see the edited post. 

The CHAR function syntax is a bit shorter. With SUBSTR you have to specify where to start the substring (second argument) and how many characters it should contain (third argument). The latter is implied to be 1 with the CHAR function.
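One more practical difference, taken from my reading of the SAS documentation rather than from this thread: when the result is assigned to a variable that has no length yet, CHAR gives that variable a length of 1, while SUBSTR gives it the length of its first argument (here 60). The sketch below makes this visible with VLENGTH; the data set name lens and the variables c, s, len_c and len_s are illustrative. With an explicit declaration such as array d(60) $1, as in the answer above, the difference does not matter.

```sas
data lens;
   length dna $ 60;
   dna = 'TGGAAGGGCTAATTTGGTCC';
   c = char(dna, 2);         /* c has no declared length: CHAR assigns length 1 */
   s = substr(dna, 2, 1);    /* s has no declared length: SUBSTR assigns the length of dna */
   len_c = vlength(c);       /* declared (compile-time) length of c */
   len_s = vlength(s);       /* declared (compile-time) length of s */
   put c= s= len_c= len_s=;
run;
```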

zjhansen30
Fluorite | Level 6
Thank you, it makes sense to use CHAR rather than SUBSTR. And thank you for your help!
Haikuo
Onyx | Level 15

This may be a little obscure for you, but depending on the size of your data set, it can save you considerable time.

data want;
set dna;
array d(60) $1;
call pokelong(dna, addrlong(d(1)),60); /* copy the 60 bytes of dna into memory starting at d1 */
drop dna;
run;
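Given the warnings in the documentation, one cautious way to try the POKELONG approach is to run it side by side with the CHAR loop on a small sample and let PROC COMPARE report any differences. This is a sketch, not part of the original answer; the data set names dna_sample, via_char and via_poke are illustrative. The POKELONG step appears to rely on d1-d60 occupying 60 consecutive bytes in memory, which is exactly what this comparison checks in practice.

```sas
/* Illustrative input: the first sequence from the question */
data dna_sample;
   length dna $ 60;
   input dna $;
datalines;
TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA
;
run;

/* Reference result: the CHAR loop */
data via_char (drop=dna i);
   set dna_sample;
   array d(60) $1;
   do i = 1 to 60;
      d(i) = char(dna, i);
   end;
run;

/* POKELONG result: writes the 60 bytes of dna starting at the address of d1 */
data via_poke;
   set dna_sample;
   array d(60) $1;
   call pokelong(dna, addrlong(d(1)), 60);
   drop dna;
run;

/* Reports any variable or observation where the two results differ */
proc compare base=via_char compare=via_poke;
run;
```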
FreelanceReinh
Jade | Level 19

@Haikuo: Thanks for pointing out this interesting alternative. I had never used this call routine. The warnings in the documentation are quite intimidating, though ("devastating problems ... destroying a vital element ..."). So, maybe a bit too advanced for someone who had "never used the substr function."

Haikuo
Onyx | Level 15

@FreelanceReinh, true and agreed. Direct memory write operations carry inherent risk. We have implemented a few in cases where a huge amount of data manipulation is required, and it does help. I just wanted to add some new elements to the discussion; after all, besides seeking answers to specific questions, people also want to learn.

Kurt_Bremser
Super User

You could read the data directly into the array like this:

data dna;
array dna {60} $1 dna1-dna60;  /* one variable per position */
do i = 1 to 60;
input dna{i} $1.@;             /* read one column; trailing @ holds the line */
end;
drop i;
datalines;
TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA
TGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTG
CTTCAAGTTAGTACCAGTTGAACCAGAGCAAGTAGAAGAGGCCAAATAAGGAGAGAAGAA
CAGCTTGTTACACCCTATGAGCCAGCATGGGATGGAGGACCCGGAGGGAGAAGTATTAGT
GTGGAAGTTTGACAGCCTCCTAGCATTTCGTCACATGGCCCGAGAGCTGCATCCGGAGTA
CTACAAAGACTGCTGACATCGAGCTTTCTACAAGGGACTTTCCGCTGGGGACTTTCCAGG
GAGGTGTGGCCTGGGCGGGACTGGGGAGTGGCGAGCCCTCAGATGCTACATATAAGCAGC
TGCTTTTTGCCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTG
GCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTCAAAGTAG
TGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAG
TGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGTAAAGCCAGA
GGAGATCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCG
GCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTG
CGAGAGCGTCGGTATTAAGCGGGGGAGAATTAGATAAATGGGAAAAAATTCGGTTAAGGC
CAGGGGGAAAGAAACAATATAAACTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAAC
;
run;
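A variation that avoids the explicit loop, not from the thread but a standard INPUT statement feature, is a variable list in parentheses followed by an informat list. SAS reuses the $1. informat for each variable in the list. The sketch below uses the d1-d60 names from the question and an illustrative data set name, dna_list; only the first two sequences are included for brevity, and the data lines must start in column 1 because formatted input reads exact columns.

```sas
data dna_list;
   length d1-d60 $ 1;        /* one character per variable */
   input (d1-d60) ($1.);     /* $1. is reused for each of the 60 variables */
datalines;
TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA
TGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTG
;
run;
```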
