Hi Everyone,
My data has a list of Name column (name1-name3) corresponding to var column (var1-var3).
I want to compare name(i) with name(j). If they are the same, I will join var(i) and var(j) under a new variable name var_ij.
Based on my data want, the output should be like that
for row 1, as name1=name2, note will be 12 and jointext_12=1,2
for row 2, as name1=name3, note will be 13 and jointext_13=10,30
for row 3, as name1=name2=name3, note will be 123 and jointext_123=100,200,300
for row4, as name2=name3, note will be 23 and jointext_23 =2000,3000
Row 5 has no name the same, so there is nothing to do.
Can you please help with my problem?
Thank you,
HHC
data have; input name1 $ name2 $ name3 $ var1 var2 var3;
datalines;
aa aa abc 1 2 3
aa dsd aa 10 20 30
aa aa aa 100 200 300
cc aa aa 1000 2000 3000
aa bb cc 1 2 3
;run;
data want; set have;
array name(*) name1-name3;
do i=1 to dim(name)-1;
do j=i+1 to dim(name);
if name[i]=name[j] then do;
note=cat(i,j);
jointext_&i&j=catx(','var&i,var&j);
end;
end;
end;
run;
Like this?
data WANT;
set HAVE;
array NAME[*] NAME1-NAME3 ;
array VAR[*] VAR1-VAR3;
length NOTE JOIN $40;
do I=1 to dim(NAME)-1;
do J=I+1 to dim(NAME);
if NAME[I]=NAME[J] then do;
if ^index(NOTE,cats(I)) then do;
NOTE=catx(',',NOTE,I);
JOIN=catx(',',JOIN,VAR[I]);
end;
if ^index(NOTE,cats(J)) then do;
NOTE=catx(',',NOTE,J);
JOIN=catx(',',JOIN,VAR[J]);
end;
end;
end;
end;
run;
I do not believe creating variable names with data in the name is a good idea, so I skipped that.
NOTE | JOIN |
---|---|
1,2 | 1,2 |
1,3 | 10,30 |
1,2,3 | 100,200,300 |
2,3 | 2000,3000 |
Please try to describe the rules a little clearer. Listing examples does not provide a rule.
And where do the two macro variables get the values from?
Are you expecting to use this as a "generic" solution for an unknown number of Name and var variables?
Like this?
data WANT;
set HAVE;
array NAME[*] NAME1-NAME3 ;
array VAR[*] VAR1-VAR3;
length NOTE JOIN $40;
do I=1 to dim(NAME)-1;
do J=I+1 to dim(NAME);
if NAME[I]=NAME[J] then do;
if ^index(NOTE,cats(I)) then do;
NOTE=catx(',',NOTE,I);
JOIN=catx(',',JOIN,VAR[I]);
end;
if ^index(NOTE,cats(J)) then do;
NOTE=catx(',',NOTE,J);
JOIN=catx(',',JOIN,VAR[J]);
end;
end;
end;
end;
run;
I do not believe creating variable names with data in the name is a good idea, so I skipped that.
NOTE | JOIN |
---|---|
1,2 | 1,2 |
1,3 | 10,30 |
1,2,3 | 100,200,300 |
2,3 | 2000,3000 |
WOW, that's amazing piece of code.
I wonder what this line do?
if ^index(NOTE,cats(I)) then do;
Thank you so much.
HHCFX
this line
if ^index(NOTE,cats(I)) then do;
is the same as
if not index(NOTE,cats(I)) then do;
Is this supposed to generalize to more than 3 names?
If so, you will need a rule for multiple values having matches (i.e. 'aa','bb','aa','bb','aa'). And you won't be able to just instantly look for matches as you currently attepmt. You'll have to determine whichever value has the most matches, assuming that is the goal. Not to mention you'll need a rule for what to do if you have ties.
But if it's only 3 names then arrays are no particular help. You would just to direct comparisons using the variable names.
But if you do need an array, then you need an array over names, as you have specified, and also an array over vars, which you haven't (instead you are attempting to use macro value which won't help here).
Can you have two multiples? Ie aa, aa, ab, ab => 2xaa and 2xab?
I would flip it long, sort and combine into the desired calculation. And then transpose it back. I suspect you want to check a few different combinations and a long form would work better as it's easier to control the join conditions. Depending on exactly what you're doing you may also want to consider dummy coding the values instead and then using PROC CORR to get a distance type matrix trick but you didn't explain what you're trying to do.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.