BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DBloom
Fluorite | Level 6

Hello!  I'm new to SAS (but not programming) and would really appreciate advice on an issue I'm working on.  Please note that the following is a simple test I've constructed for my own understanding before I apply it to a larger dataset with more criteria.

 

Objective: I'd like to declare and populate a dynamic array (word_array) with the words that precede each instance of the word "SAS."

Output: Using the example string below with 4 instances of "SAS", the desired output I'm after is...

 

word_array{1} = It looks like

word_array{2} = is the way to go I love

word_array{3} = oh yasss

word_array{4} = woohoo

 

Current Code: Is as follows...

 

 

%let string=It looks like SAS is the way to go I love SAS oh yasss SAS woohoo SAS;

data _null_; 
   string=symget('string');
   cnt=countc(string,'SAS')/3;
   array word_array{*} _CHARACTER_;
   j=1;
   do i = 1 by 1 to cnt;
      put i=;
	  put j=;
	  put string=;
   	  if length(word_array{j})>lengthc(word_array{j}) then j+1;
	  word_array{j}=trim(scan(string,i,'SAS'));
	  put word_array{j}=;
      end; 
   stop; 
   run; 

 

 

What I can see from my puts (put i=; etc) is that my dynamic array word_array is not advancing to the next instance of "SAS" - and I'm not sure whether it's my poor syntax around (1) declaring the macro variable string or (2) the dynamic array word_array or (3) that j is not incrementing forward.

 

Current Output: Is shown below...

 

i=1
j=1
string=It looks like SAS is the way to go I love SAS oh yasss SAS woohoo SAS
string=It looks like
i=2
j=1
string=It looks like
string= 
i=3
j=1
string= 
string= 
i=4
j=1
string= 
string= 

Many thanks in advance - I've been reading through a lot of online documentation and I'm sure there's got to be a simple solution that I'm just not seeing.  Let me know if I'm not following posting protocols and I will revise.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

An easy way to split strings is to use a special character as the marker. I chose 01x here.

It may or not be useful to you:

data _null_;
  STR1='It looks like SAS is the way to go I love SAS oh yasss SAS woohoo SAS';
  STR2=tranwrd(STR1,'SAS','01'x);
  do I=1 to countw(STR2,'01'x);
    STR3=scan(STR2,I,'01'x);
    if STR3 ne ' ' then putlog STR3=;
  end;
run;

STR3=It looks like
STR3=is the way to go I love
STR3=oh yasss
STR3=woohoo

 

 

View solution in original post

11 REPLIES 11
Reeza
Super User

cnt=countc(string,'SAS')/3;

Perhaps use COUNTW to find the number of times SAS is in the string? 

COUNTC counts character so I'm not sure what that's supposed to be doing here.

 


  array word_array{*} _CHARACTER_;


This declares an array of all the character variables. SAS does not support dynamic sized arrays. _CHARACTER_ is a shortcut reference to all character variables. 

 

 do i = 1 by 1 to cnt;

 A do statement is usually 

 

DO i=1 to cnt by 1 (wrong order in yours).

 

 if length(word_array{j})>lengthc(word_array{j}) then j+1;

Since you only have one variable this will work the first time and then not again, which is what you're seeing.

 

Since SAS does not support dynamic arrays, an option that I would suggest would be to output the strings as a new line each and then transpose if desired.  

 

%let string=It looks like SAS is the way to go I love SAS oh yasss SAS woohoo SAS;

data outStrings;
    string="&string";
    *Find the number of times SAS is in the string;
    count_sas=countw(string, 'SAS');
    start=1;

    do i=1 to count_sas;
        *Find the word in the string;
        loc=findw(string, 'SAS', start);
        *substring the component;
        outString=substr(string, start, loc-1-start);
        *output to a data set;
        output;
        *set the start location to after the last word, this can be dynamic if needed;
        start=loc+3;
    end;
    
    *keep only relevant variables;
    rename i=index;
    keep string outString i;
run;

*reshape to a wide format;
proc transpose data=outStrings out=Want prefix=String;
    by string;
    id index;
    var outString;
run;

@DBloom wrote:

Hello!  I'm new to SAS (but not programming) and would really appreciate advice on an issue I'm working on.  Please note that the following is a simple test I've constructed for my own understanding before I apply it to a larger dataset with more criteria.

 

Objective: I'd like to declare and populate a dynamic array (word_array) with the words that precede each instance of the word "SAS."

Output: Using the example string below with 4 instances of "SAS", the desired output I'm after is...

 

word_array{1} = It looks like

word_array{2} = is the way to go I love

word_array{3} = oh yasss

word_array{4} = woohoo

 

Current Code: Is as follows...

 

 

%let string=It looks like SAS is the way to go I love SAS oh yasss SAS woohoo SAS;

data _null_; 
   string=symget('string');
   cnt=countc(string,'SAS')/3;
   array word_array{*} _CHARACTER_;
   j=1;
   do i = 1 by 1 to cnt;
      put i=;
	  put j=;
	  put string=;
   	  if length(word_array{j})>lengthc(word_array{j}) then j+1;
	  word_array{j}=trim(scan(string,i,'SAS'));
	  put word_array{j}=;
      end; 
   stop; 
   run; 

 

 

What I can see from my puts (put i=; etc) is that my dynamic array word_array is not advancing to the next instance of "SAS" - and I'm not sure whether it's my poor syntax around (1) declaring the macro variable string or (2) the dynamic array word_array or (3) that j is not incrementing forward.

 

Current Output: Is shown below...

 

i=1
j=1
string=It looks like SAS is the way to go I love SAS oh yasss SAS woohoo SAS
string=It looks like
i=2
j=1
string=It looks like
string= 
i=3
j=1
string= 
string= 
i=4
j=1
string= 
string= 

Many thanks in advance - I've been reading through a lot of online documentation and I'm sure there's got to be a simple solution that I'm just not seeing.  Let me know if I'm not following posting protocols and I will revise.


 

DBloom
Fluorite | Level 6

Many thanks Reeza!  Quick question - my complete dataset will inevitably contain special characters...for instance, in place of "SAS" in the below example, the data may contain multiple semicolons ";"  that will be used as delimiters - how should I adjust the macro declaration for this?

Reeza
Super User

@DBloom wrote:

Many thanks Reeza!  Quick question - my complete dataset will inevitably contain special characters...for instance, in place of "SAS" in the below example, the data may contain multiple semicolons ";"  that will be used as delimiters - how should I adjust the macro declaration for this?


It shouldn't affect the code I provided, unless you need to account for them somehow. In that case you need to provide an example.

ChrisNZ
Tourmaline | Level 20

An easy way to split strings is to use a special character as the marker. I chose 01x here.

It may or not be useful to you:

data _null_;
  STR1='It looks like SAS is the way to go I love SAS oh yasss SAS woohoo SAS';
  STR2=tranwrd(STR1,'SAS','01'x);
  do I=1 to countw(STR2,'01'x);
    STR3=scan(STR2,I,'01'x);
    if STR3 ne ' ' then putlog STR3=;
  end;
run;

STR3=It looks like
STR3=is the way to go I love
STR3=oh yasss
STR3=woohoo

 

 

novinosrin
Tourmaline | Level 20

@ChrisNZ Sir, That's very smart clever. I am jealous!!!!!!!!!!!!!!!!!!!!!!

ChrisNZ
Tourmaline | Level 20

Flattery works. +1 for you 🙂

 

And thank you!

DBloom
Fluorite | Level 6

Hi ChrisNZ,

May I ask what does your hexadecimal marker "01X" do here?  For instance, if I had to apply this to a different string that was looking up numerical strings as delimiters, would this still work?

Many thanks!

Dale

ChrisNZ
Tourmaline | Level 20

'01'x is a non printable-character with hexadecimal code 01.

Like 09x is a tab and 20x is the space character.

Since 01x is extremely rare in strings, it is a good choice for such cases.  

DBloom
Fluorite | Level 6

Thank you!  I accepted this as an answer due to its brevity and updated it for my use case:

data w;
  str1='It looks like SAS is the way to go I love SAS oh yasss SAS woohoo SAS';
  str2=tranwrd(str1,'SAS','01'x);
  do i=1 to countw(str1,'SAS');
    str3=trim(scan(str2,i,'01'x));
    if str3 ne ' ' then putlog str3=;
	output;
  end;
  keep str1 i str3;
run;

proc transpose data=w label=i name=str1 prefix=str3;
run;
novinosrin
Tourmaline | Level 20
data w;
string='It looks like SAS is the way to go I love SAS oh yasss SAS woohoo SAS';
length word $50;
k=1;
do n=1 to countw(string);
if scan(string,n)='SAS' then  do;
call scan(string, n, position, length);
word=substr(string, k, position-k);
k=position+length;
output;
end;
end;
keep word;
run;

proc transpose data=w out=want prefix=word;
var word;
run;

 

DBloom
Fluorite | Level 6

Many thanks novinosrin!  Out of curiosity (since the full dataset I'm working with will inevitably have special characters)... if the string contained semicolons ";" as delimiters instead of the word "SAS" , how would I adjust this?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1278 views
  • 13 likes
  • 4 in conversation