BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hhchenfx
Barite | Level 11

Hi Everyone,


My data has a list of Name column (name1-name3) corresponding to var column (var1-var3).
I want to compare name(i) with name(j). If they are the same, I will join var(i) and var(j) under a new variable name var_ij.

 

Based on my data want, the output should be like that 

for row 1, as name1=name2, note will be 12 and jointext_12=1,2
for row 2, as name1=name3, note will be 13 and jointext_13=10,30
for row 3, as name1=name2=name3, note will be 123 and jointext_123=100,200,300

for row4, as name2=name3, note will be 23 and jointext_23 =2000,3000

Row 5 has no name the same, so there is nothing to do.

 

Can you please help with my problem?

 

Thank you,

 

HHC

data have; input name1 $ name2 $ name3 $ var1 var2 var3;
datalines;
aa aa abc 1 2 3
aa dsd aa 10 20 30
aa aa aa 100 200 300
cc aa aa 1000 2000 3000
aa bb cc 1 2 3
;run;

data want; set have;
array name(*) name1-name3;
	do i=1 to dim(name)-1;
	do j=i+1 to dim(name);
		if name[i]=name[j] then do; 
			note=cat(i,j); 
			jointext_&i&j=catx(','var&i,var&j);
			end; 
	end;
	end;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Like this?

data WANT; 
  set HAVE;
  array NAME[*] NAME1-NAME3 ;
  array VAR[*] VAR1-VAR3;
  length NOTE JOIN $40;
	do I=1 to dim(NAME)-1;
  	do J=I+1 to dim(NAME);
      if NAME[I]=NAME[J] then do;     
        if ^index(NOTE,cats(I)) then do;
          NOTE=catx(',',NOTE,I); 
          JOIN=catx(',',JOIN,VAR[I]);
         end; 
  	     if ^index(NOTE,cats(J)) then do;
           NOTE=catx(',',NOTE,J); 
  	       JOIN=catx(',',JOIN,VAR[J]);
         end; 
      end; 
    end;                 
  end;
run;

I do not believe creating variable names with data in the name is a good idea, so I skipped that.


NOTE JOIN
1,2 1,2
1,3 10,30
1,2,3 100,200,300
2,3 2000,3000
   

 

View solution in original post

6 REPLIES 6
ballardw
Super User

Please try to describe the rules a little clearer. Listing examples does not provide a rule.

 

And where do the two macro variables get the values from?

 

Are you expecting to use this as a "generic" solution for an unknown number of Name and var variables?

ChrisNZ
Tourmaline | Level 20

Like this?

data WANT; 
  set HAVE;
  array NAME[*] NAME1-NAME3 ;
  array VAR[*] VAR1-VAR3;
  length NOTE JOIN $40;
	do I=1 to dim(NAME)-1;
  	do J=I+1 to dim(NAME);
      if NAME[I]=NAME[J] then do;     
        if ^index(NOTE,cats(I)) then do;
          NOTE=catx(',',NOTE,I); 
          JOIN=catx(',',JOIN,VAR[I]);
         end; 
  	     if ^index(NOTE,cats(J)) then do;
           NOTE=catx(',',NOTE,J); 
  	       JOIN=catx(',',JOIN,VAR[J]);
         end; 
      end; 
    end;                 
  end;
run;

I do not believe creating variable names with data in the name is a good idea, so I skipped that.


NOTE JOIN
1,2 1,2
1,3 10,30
1,2,3 100,200,300
2,3 2000,3000
   

 

hhchenfx
Barite | Level 11

WOW, that's amazing piece of code.

I wonder what this line do? 

if ^index(NOTE,cats(I)) then do;

Thank you so much.

HHCFX

ChrisNZ
Tourmaline | Level 20

this line 

if ^index(NOTE,cats(I)) then do;

is the same as 

if not index(NOTE,cats(I)) then do;

 

mkeintz
PROC Star

Is this supposed to generalize to more than 3 names? 

 

If so, you will need a rule for multiple values having matches  (i.e. 'aa','bb','aa','bb','aa'). And you won't be able to just instantly look for matches as you currently attepmt.  You'll have to determine whichever value has the most matches, assuming that is the goal.  Not to mention you'll need a rule for what to do if you have ties.

 

But if it's only 3 names then arrays are no particular help. You would just to direct comparisons using the variable names.

 

 

 

But if you do need an array, then you need an array over names, as you have specified, and also an array over vars, which you haven't (instead you are attempting to use macro value which won't help here).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Reeza
Super User

Can you have two multiples? Ie aa, aa, ab, ab => 2xaa and 2xab?

 

I would flip it long, sort and combine into the desired calculation. And then transpose  it back. I suspect you want to check a few different combinations  and a long form would work better as it's easier to control the join conditions. Depending on exactly what you're doing you may also want to consider dummy coding the values instead and then using PROC CORR to get a distance type matrix trick but you didn't explain what you're trying to do.

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 897 views
  • 3 likes
  • 5 in conversation