lpy0521
Fluorite | Level 6

Hello, community.

I am fairly new to SAS programming and am stuck on this:

Suppose I have two string combos, combo1 (word1 word2 word3) and combo2 (word2, word3, word4). How can I output:

the union of these two combos as: (word1, word2, word3, word4)

the intersection of these two combos as: (word2, word3)

 

Any comment is highly appreciated!!!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
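data have;

 /* Row 1: var2 carries trailing blanks */
 var1="word1 word2 word3 word4";
 var2="word4 word3                  ";
 output;

 /* Row 2: no words in common (var2 is assigned before var1 here) */
 var2="word1 word2 word3 word4";
 var1="word5 word6";
 output;

 /* Row 3: var2 is a subset of var1 */
 var1="word1 word2 word3 word4";
 var2="word1 word3";
 output;

run;

data want;
 set have;
 length intersect union $ 200;
 array x{99} $ 32;  /* up to 99 distinct words; not retained, so blank again on each row */
 n=0;               /* number of distinct words stored so far */
 /* Pass 1: store each distinct word of var1 */
 do i=1 to countw(var1,' ');
  temp=scan(var1,i,' ');
  if temp not in x then do; n+1; x{n}=temp; end;
 end;
 /* Pass 2: a var2 word already in x is shared, otherwise it is new. */
 /* This assumes no word is repeated within var2.                    */
 do i=1 to countw(var2,' ');
  temp=scan(var2,i,' ');
  if temp not in x then do; n+1; x{n}=temp; end;
  else intersect=catx(' ',intersect,temp);
 end;
 union=catx(' ',of x{*});  /* CATX skips the unused blank elements */
 drop x: n i temp;
run;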


9 REPLIES
Reeza
Super User

Your best bet is to separate the strings into their individual components and then use PROC SORT to remove duplicates.

Depending on whether you know the length of the string, and whether it is fixed or dynamic, there may be other approaches.
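
A minimal sketch of the split-then-sort idea above, for the union only. The data set names (have, long, union_words), the id variable, and the sample values are assumptions made for illustration; they are not from the original post.

```sas
/* Hypothetical sample data: one row per pair of space-delimited strings */
data have;
   length var1 var2 $ 200;
   id = 1;
   var1 = "WORD1 WORD2 WORD3";
   var2 = "WORD2 WORD3 WORD4";
run;

/* Write one row per word, keeping the id of the source row */
data long;
   set have;
   length word $ 32;
   do i = 1 to countw(var1, ' ');
      word = scan(var1, i, ' ');
      output;
   end;
   do i = 1 to countw(var2, ' ');
      word = scan(var2, i, ' ');
      output;
   end;
   keep id word;
run;

/* NODUPKEY keeps a single row per (id, word), which is the union */
proc sort data=long out=union_words nodupkey;
   by id word;
run;
```

The result is in long form (one word per row), so no extra word variables are created no matter how many words each string holds.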

 



lpy0521
Fluorite | Level 6


My string combos actually contain more than 10 words each, so splitting them into individual components may create too many redundant variables for me.

 

Thanks.

Reeza
Super User

Why create multiple variables? You can have a single variable with multiple values. 
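
One way to read this suggestion, as a rough sketch: keep a single word variable and record which string each word came from, then take the intersection from that long table. The names have, long, intersect_words, id, and source are hypothetical, and so is the sample data.

```sas
/* Hypothetical sample data */
data have;
   length var1 var2 $ 200;
   id = 1;
   var1 = "WORD1 WORD2 WORD3";
   var2 = "WORD2 WORD3 WORD4";
run;

/* One variable (word) with many values, plus the name of its source string */
data long;
   set have;
   length word $ 32 source $ 4;
   array v{2} var1 var2;
   do j = 1 to 2;
      source = vname(v{j});            /* "var1" or "var2" */
      do i = 1 to countw(v{j}, ' ');
         word = scan(v{j}, i, ' ');
         output;
      end;
   end;
   keep id source word;
run;

/* A word is in the intersection if it was seen in both sources for that id */
proc sql;
   create table intersect_words as
   select id, word
   from long
   group by id, word
   having count(distinct source) = 2;
quit;
```

Counting distinct sources rather than rows means a word repeated inside one string is not mistaken for a shared word.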

 



Astounding
PROC Star

General approach:  create additional variables COMBO3 and COMBO4 holding the results you want.  

 

SAS has tools (FINDW, SCAN) that make this task possible.  But to start programming, you have to clarify:

 

  • Do the incoming variables actually contain parentheses?
  • Do the incoming variables actually contain commas?
  • Same questions for the results ... are parentheses and commas required?  Optional?
  • Are two words considered to be the same if they contain the same letters but different capitalization?
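
The questions above change the code only slightly. As a small sketch with made-up values: COMPRESS can strip parentheses and commas before the words are compared, and the i modifier of FINDW makes the word search ignore case.

```sas
data _null_;
   length raw clean $ 40;
   raw = "(Word1, WORD2, word3)";

   /* Remove parentheses and commas, leaving space-delimited words */
   clean = compress(raw, "(),");

   /* 'i' ignores case; a blank is the only delimiter */
   hit_ignore_case = findw(clean, "word2", ' ', 'i');

   /* Without 'i' the search is case sensitive */
   hit_exact_case = findw(clean, "word2", ' ');

   put clean= hit_ignore_case= hit_exact_case=;
run;
```

FINDW returns the position of the word when it is found and 0 otherwise, so either result can be used directly in an IF condition. If case should never matter, applying UPCASE to both strings up front is another option.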

 

lpy0521
Fluorite | Level 6

In the incoming variables, the words are space-delimited and all in capitals, with no parentheses or commas. The same criteria apply to the results: space-delimited, in capitals, and with no parentheses or commas.

 

I would be very grateful if you could show me some sample code on how to achieve this.

 

Thanks! 

Reeza
Super User

Can you post some sample data that accurately reflects your data? A quick example is below.

 

data have;

var1="word1, word2, word3, word4";
var2="word4, word3";
output;

var1="word1, word2, word3, word4";
var2="word4, word3";
output;

var1="word1, word2, word3, word4";
var2="word4, word3";
output;

run;
PGStats
Opal | Level 21

Building on @Reeza's example set.

 

data have;

var1="word1 word2 word3 word4";
var2="word4 word3";
output;

var1="word1 word2 word3 word4";
var2="word5 word6";
output;

var1="word1 word2 word3 word4";
var2="word1 word3";
output;

run;

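data want;
length word $16 var3 var4 $120;
set have;
var3 = var1;   /* union: start with every word of var1 */
var4 = " ";    /* intersection: start empty */
do i = 1 to countw(var2);
    word = scan(var2, i);
    /* FINDW modifiers: t trims trailing blanks, s adds space characters to the delimiters */
    /* Word not yet in the union: append it */
    if not findw(var3, word,, "ts") then var3 = catx(" ", var3, word);
    /* Word also present in var1: append it to the intersection */
    if findw(var1, word,, "ts") then var4 = catx(" ", var4, word);
    end;
drop word i;
run;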

proc print data=want noobs; run;
PG
Rick_SAS
SAS Super FREQ

For a discussion of how to use COUNTW and SCAN to split a string into words (with arbitrary delimiters), see the article "Break a sentence into words in SAS."

You can then use KSharp's approach to find the union and intersection.
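
As a rough DATA step sketch of the COUNTW and SCAN pattern mentioned above (the data set name words, the variable names, and the sample sentence are assumptions for illustration), the same delimiter list is passed to both functions so that they agree on where each word ends:

```sas
data words;
   length word $ 32;
   text   = "to be, or not to be!";
   delims = " ,.!";                    /* blank, comma, period, exclamation mark */
   do i = 1 to countw(text, delims);   /* number of words under these delimiters */
      word = scan(text, i, delims);    /* i-th word under the same delimiters */
      output;
   end;
   keep i word;
run;
```

Using one delims variable for both calls avoids the common mistake of counting words with one delimiter set and extracting them with another.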

 

For SAS/IML programmers, the solution is quite short:

 

proc iml;
a = "to be or not to be";
b = "2 b or not 2 b";
delims = ' ,.!';
a = scan(a, 1:countw(a, delims), delims); 
b = scan(b, 1:countw(b, delims), delims); 
intersect = xsect(a,b);
union = union(a,b);
print intersect, union;
