how to count and output specific number of strings in a variable

Contributor
Posts: 43

Hello,

I have a variable in a SAS dataset that holds a string of characters such as O-O-O-M-M-O-I-O. I would like to shorten the value by prefixing each character with the number of times it repeats in a row, so the example above would become 3O-2M-O-I-O. Any suggestions on how to do this?

Here is code that creates sample data:

data have;
  infile datalines expandtabs;  *treat any tabs in the data lines as blanks;
  input ID 1. DNA $30.;
  datalines;
1 O-O-O-O-O-O-O-O-O-O-O-O-O-O-O
2 M-M-M-M-M-M-O-I-I-I-I-O-O-M-M
3 M-M-M-M-O
4 O-I-O-I-M-O-O-O-I-I
5 O
;
run;

 

From the data above, I would like to produce the WANT DNA column:

 

ID  HAVE DNA                       WANT DNA
1   O-O-O-O-O-O-O-O-O-O-O-O-O-O-O  15O
2   M-M-M-M-M-M-O-I-I-I-I-O-O-M-M  6M-O-4I-2O-2M
3   M-M-M-M-O                      4M-O
4   O-I-O-I-M-O-O-O-I-I            O-I-O-I-M-3O-2I
5   O                              O

 

Thank you!


Accepted Solutions
Solution
‎01-31-2018 12:14 PM
Super Contributor
Posts: 320

Re: how to count and output specific number of strings in a variable

Posted in reply to sas_student1
data want;
  set have;
  length out_strand $30;
  length cur_strand cur_base $1;

  do _i = 1 to countw(dna, '-');
    cur_strand = scan(dna, _i, '-');  *identify the current base we are looking at;

    *iterate over the scans to find the next nonmatch, strand_count ends up as the run length;
    do strand_count = 0 by 1 until (cur_strand ne cur_base);
      cur_base = scan(dna, strand_count + _i, '-');
    end;

    *compose the output string, prefixing the count only when the run is longer than 1;
    out_strand = catx('-', out_strand, cats(ifc(strand_count > 1, cats(strand_count), ''), cur_strand));

    _i = strand_count + _i - 1;  *have to decrement one, since we go one past the match;
  end;
run;

This should work for what you need. For each position you scan ahead through the string until you reach a base that does not match, and the number of scans gives the length of the run.
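The same scan-until-mismatch idea can also be written as a single forward pass that does not modify the loop index. This is only a sketch of an alternative, not the poster's method; the names want_alt, tok, prev, run_len and n_tok are made up for illustration, and the sample rows are copied from the question so the step runs on its own.

```sas
/* Sample rows copied from the question */
data have;
  input ID DNA :$30.;
  datalines;
1 O-O-O-O-O-O-O-O-O-O-O-O-O-O-O
2 M-M-M-M-M-M-O-I-I-I-I-O-O-M-M
3 M-M-M-M-O
4 O-I-O-I-M-O-O-O-I-I
5 O
;
run;

data want_alt;  /* hypothetical output name */
  set have;
  length out_strand $30 prev tok $1;
  n_tok = countw(dna, '-');
  run_len = 0;
  do i = 1 to n_tok + 1;
    /* one extra pass with a blank token flushes the final run */
    if i <= n_tok then tok = scan(dna, i, '-');
    else tok = ' ';
    /* a change of base closes the current run */
    if i > 1 and tok ne prev then do;
      if run_len > 1 then out_strand = catx('-', out_strand, cats(run_len, prev));
      else out_strand = catx('-', out_strand, prev);
      run_len = 0;
    end;
    prev = tok;
    run_len = run_len + 1;
  end;
  drop i n_tok run_len prev tok;
run;
```

Each base is read once, and the helper variables are dropped so only ID, DNA and out_strand remain in the output.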

Frequent Contributor
Posts: 109

Re: how to count and output specific number of strings in a variable

[ Edited ]
Posted in reply to sas_student1
data finale(drop= i j k count);
set have;
length want_dna $200;  *This needs to be set to the maximum expected length;
do i=1 to n;
	count=0;
	key=substr(DNA,i,1);
	do j=i to n;
		if i=j and i ne 1 then do;
			do k=1 to k=i-1;
				if key=substr(DNA,k,1) then goto skip;
			end;
		end;
	if key=substr(DNA,j,1) then count=+1;
	end;

	if i=1 then do;
		if count gt 1 then 	WANT_DNA=key||'-'||put(count,best12.);
		else WANT_DNA=key;
	end;
	else do;
		if count gt 1 then 	WANT_DNA=WANT_DNA||'-'||key||'-'||put(count,best12.);
		else WANT_DNA=WANT_DNA||'-'||key;
	end;
	skip:
end;
run;

I have not tested this code; I will update it.

Contributor
Posts: 43

Re: how to count and output specific number of strings in a variable

Posted in reply to Satish_Parida

Thank you @Satish_Parida! I did get an error:

Variable n is uninitialized.
ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, or invalid.

However, the code by @snoopy369 worked!

Thank you both!
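
The error above is consistent with `n` never being assigned in the posted step: a missing TO value makes the DO loop invalid. Below is a minimal sketch of one way to define it, assuming `n` was meant to be the number of dash-separated bases (an assumption, not confirmed by the author) and reading whole tokens with SCAN instead of single characters with SUBSTR. The dataset name one_row is made up for illustration.

```sas
/* One sample row from the question, so the step runs on its own */
data one_row;
  input ID DNA :$30.;
  datalines;
4 O-I-O-I-M-O-O-O-I-I
;
run;

data _null_;
  set one_row;
  /* Assumption: n is the number of dash-separated bases */
  n = countw(DNA, '-');
  do i = 1 to n;
    key = scan(DNA, i, '-');  /* i-th base, not the i-th character */
    put ID= i= key=;          /* write each base to the log */
  end;
run;
```

Two other statements in the posted step would also need attention before it could produce the wanted output: `count=+1` assigns 1 rather than incrementing, and `do k=1 to k=i-1` uses the result of the comparison `k=i-1` as the upper bound.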

Super User
Posts: 10,611

Re: how to count and output specific number of strings in a variable

Posted in reply to sas_student1

I absolutely love this question.

data have;
  infile cards expandtabs;
  input ID 1. DNA $30.;
  cards;
1 O-O-O-O-O-O-O-O-O-O-O-O-O-O-O
2 M-M-M-M-M-M-O-I-I-I-I-O-O-M-M
3 M-M-M-M-O
4 O-I-O-I-M-O-O-O-I-I
5 O
;
run;

/* One row per base, kept in the original order */
data temp;
  set have;
  do i = 1 to countw(dna, '-');
    value = scan(dna, i, '-');
    output;
  end;
  drop i dna;
run;

/* NOTSORTED makes each consecutive run of the same value its own BY group,
   so _FREQ_ holds the run length */
proc summary data=temp;
  by id value notsorted;
  output out=temp1;
run;

/* Rebuild one string per ID, prefixing the count only for runs longer than 1 */
data want;
  length want $ 200;
  do until (last.id);
    set temp1;
    by id;
    if _freq_ = 1 then want = catx('-', want, value);
    else want = catx('-', want, cats(_freq_, value));
  end;
  drop _type_ _freq_ value;
run;

proc print noobs; run;