zjhansen30
Fluorite | Level 6
I am new to arrays, but I am trying to split the 15 DNA sequences below, each 60 characters long, into 60 variables, D1-D60, where D1 holds the first position, D2 the second position, and so on. I have never used the SUBSTR function, so I am probably doing it incorrectly. Any help would be great.

data dna2 (drop=dna i); set dna; array d(60); do i=1 to 60; d(i)=d(4/(i)); dna=substr (dna, 1, 15); end; run;
data dna;
	length dna $ 60;
	input dna $;
datalines;
TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA
TGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTG
CTTCAAGTTAGTACCAGTTGAACCAGAGCAAGTAGAAGAGGCCAAATAAGGAGAGAAGAA
CAGCTTGTTACACCCTATGAGCCAGCATGGGATGGAGGACCCGGAGGGAGAAGTATTAGT
GTGGAAGTTTGACAGCCTCCTAGCATTTCGTCACATGGCCCGAGAGCTGCATCCGGAGTA
CTACAAAGACTGCTGACATCGAGCTTTCTACAAGGGACTTTCCGCTGGGGACTTTCCAGG
GAGGTGTGGCCTGGGCGGGACTGGGGAGTGGCGAGCCCTCAGATGCTACATATAAGCAGC
TGCTTTTTGCCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTG
GCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTCAAAGTAG
TGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAG
TGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGTAAAGCCAGA
GGAGATCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCG
GCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTG
CGAGAGCGTCGGTATTAAGCGGGGGAGAATTAGATAAATGGGAAAAAATTCGGTTAAGGC
CAGGGGGAAAGAAACAATATAAACTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAAC
;
FreelanceReinh
Jade | Level 19

How about this?

data dna2 (drop=dna i);
set dna;
array d(60) $1;
do i=1 to 60;
  d(i)=char(dna,i);
end;
run;

Edit: Alternatively, you could use the SUBSTR function:

d(i)=substr(dna,i,1);

 

zjhansen30
Fluorite | Level 6
That works, and I almost had it. Do you know how to do the same thing with the SUBSTR function?
FreelanceReinh
Jade | Level 19

Please see the edited post. 

 

The CHAR function syntax is a bit shorter. With SUBSTR you have to specify where to start the substring (second argument) and how many characters it should contain (third argument). The latter is implied to be 1 with the CHAR function.
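
To make the comparison concrete, here is a minimal sketch. The data step and the names c1 and c2 are only for illustration and are not part of the original question. Both calls pick out the third character of the string:

```sas
data _null_;
  dna = 'TGGAAGGGCT';       /* first 10 bases of the first sequence */
  c1 = char(dna, 3);        /* one character at position 3 */
  c2 = substr(dna, 3, 1);   /* start at position 3, length 1 */
  put c1= c2=;              /* writes c1=G c2=G to the log */
run;
```

One difference to keep in mind: when the result is assigned to a variable that has no length yet, CHAR creates it with length 1, while SUBSTR gives it the length of its first argument. With array d(60) $1 the length is already fixed at 1, so both versions behave the same in the loop above.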

zjhansen30
Fluorite | Level 6
Thank you, it makes sense to use CHAR over SUBSTR. And thank you for your help.
Haikuo
Onyx | Level 15

This may be a little obscure for you, but depending on the size of your data set, it could save you considerable time.

 

data want;
set dna;
array d(60) $1;
call pokelong(dna, addrlong(d(1)),60);
drop dna;
run;
FreelanceReinh
Jade | Level 19

@Haikuo: Thanks for pointing out this interesting alternative. I had never used this call routine. The warnings in the documentation are quite intimidating, though ("devastating problems ... destroying a vital element ..."). So, maybe a bit too advanced for someone who had "never used the substr function."

Haikuo
Onyx | Level 15

@FreelanceReinh, true and agreed. Direct memory writes carry inherent risk. We have used a few in cases where a huge amount of data manipulation is required, and it does help. I posted it here just to add something new to the discussion; after all, besides seeking answers to specific questions, people also want to learn.

Kurt_Bremser
Super User

You could read the data directly into the array like this:

data dna;
array dna {60} $1 dna1-dna60;
do i = 1 to 60;
input dna{i} $1.@;
end;
drop i;
datalines;
TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA
TGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTG
CTTCAAGTTAGTACCAGTTGAACCAGAGCAAGTAGAAGAGGCCAAATAAGGAGAGAAGAA
CAGCTTGTTACACCCTATGAGCCAGCATGGGATGGAGGACCCGGAGGGAGAAGTATTAGT
GTGGAAGTTTGACAGCCTCCTAGCATTTCGTCACATGGCCCGAGAGCTGCATCCGGAGTA
CTACAAAGACTGCTGACATCGAGCTTTCTACAAGGGACTTTCCGCTGGGGACTTTCCAGG
GAGGTGTGGCCTGGGCGGGACTGGGGAGTGGCGAGCCCTCAGATGCTACATATAAGCAGC
TGCTTTTTGCCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTG
GCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTCAAAGTAG
TGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAG
TGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGTAAAGCCAGA
GGAGATCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCG
GCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTG
CGAGAGCGTCGGTATTAAGCGGGGGAGAATTAGATAAATGGGAAAAAATTCGGTTAAGGC
CAGGGGGAAAGAAACAATATAAACTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAAC
;
run;
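
As a possible variation on the loop above, the same read can be sketched with a grouped variable list and a format list, which applies the $1. informat to each of dna1-dna60 in turn. This is only an illustrative alternative, shown with the first two sequences to keep it short:

```sas
data dna;
  /* read 60 consecutive one-character fields into dna1-dna60 */
  input (dna1-dna60) ($1.);
datalines;
TGGAAGGGCTAATTTGGTCCCAAAAAAGACAAGAGATCCTTGATCTGTGGATCTACCACA
TGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTG
;
```

Because the informat defines each variable as character with length 1, no array or DROP statement is needed in this form.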
