Hi Everyone,
I posted a version of this problem before and then discovered another aspect of my data. Thus I add this new situation here. Please be patient with me. Many thanks.
My data has a list of Name column (name1-name4) corresponding to var column (var1-var4).
I want to compare name(i) with name(j). If they are the same, I will join var(i) and var(j) under a new variable name var_i.
Since I can have (name1=name3) and (name2=name4) (see row 3), I need to create a full range of jointext_1-jointext_4.
Based on my data want, the output should be like that
for row 1, as name1=name2, note will be 12 and jointext_1=1,2 and jointext_2=blank (as we don't touch var3 and var4, jointext_3=var3, jointext_4=var4)
for row 2, as name1=name3=name4, note will be 134 and jointext_1=10,30,40 and jointext_3=blank and jointext_4=blank (as we don't touch var2, jointext_2=var2)
for row 3 is the NEW situation where I have two pair of same name:
name1= name3, note will be 13 and jointext_1=100,300 and jointext_3=blank
name2=name4, note will be 24 and jointext_2=200,4000 and jointext_4=blank
for row4, as name2=name3=name4, note will be 234 and jointext_2 =2000,3000,3 and jointext_3=blank and and jointext_4=blank (as we don't touch var1, jointext_1=var1)
Row 5 has no name the same, so jointext[i]=var[i]
Can you please help me again with my problem?
Thank you so much.
HHC
data have; input name1 $ name2 $ name3 $ name4 $ var1 var2 var3 var4;
datalines;
aa aa abc ee 1 2 3 4
aa dsd aa aa 10 20 30 40
aa ee aa ee 100 200 300 4000
cc aa aa aa 1000 2000 3000 3
vv bb cc aa 1 2 3 4
;run;
data WANT;
set HAVE;
drop i j;
length new_name1-new_name4 JOINTEXT1-JOINTEXT4 $20;
array new_name[*] new_name1-new_name4;
array NAME[*] NAME1-NAME4 ;
array VAR[*] VAR1-VAR4;
array JOINTEXT[*] JOINTEXT1-JOINTEXT4;
do v=1 to dim(NAME);
new_name[v]=name[v];
JOINTEXT[v]='';
end;
length NOTE JOIN $40;
do I=1 to dim(NAME)-1;
do J=I+1 to dim(NAME);
if NAME[I]=NAME[J] then do;
if ^index(NOTE,cats(I)) then do;
NOTE=catx(',',NOTE,I);
JOIN=catx(',',JOIN,VAR[I]);
end;
if ^index(NOTE,cats(J)) then do;
NOTE=catx(',',NOTE,J);
JOIN=catx(',',JOIN,VAR[J]);
new_name[J]='';
end;
end;
end;
end;
run;
Hi Ksharp,
I wish I am able to modify your code to take care of the last section in the statement below
for row 1, as name1=name2, note will be 12 and jointext_1=1,2 and jointext_2=blank (as we don't touch var3 and var4, jointext_3=var3, jointext_4=var4)
However, I am able to add a step that finally get what I am looking for.
Quite lengthy!
data have; input name1 $ name2 $ name3 $ name4 $ name5 $ var1 var2 var3 var4 var5;
datalines;
aa aa abc ee ee 1 2 3 4 5
aa dsd aa aa dsd 10 20 30 40 50
aa ee aa ee ee 100 200 300 400 500
cc aa aa aa cc 1 2 3 4 5
vv vv cc cc aa 10 20 30 40 50
aa vv bb nn mm 1 2 3 4 5
;run;
data WANT;
set HAVE;
drop i j;
length new_name1-new_name5 $10;
length JOINTEXT1-JOINTEXT5 $20;
length new_note1-new_note5 $10;
array new_name[*] new_name:;
array NAME[*] NAME: ;
array VAR[*] VAR:;
array JOINTEXT[*] JOINTEXT:;
array new_note[*] new_note:;
first=0;
do v=1 to dim(NAME);
new_name[v]=name[v];
JOINTEXT[v]='';
end;
length NOTE $10;
length JOIN $20;
do I=1 to dim(NAME)-1;
Note='';
Join='';
do J=I+1 to dim(NAME);
if NAME[I]=NAME[J] then do;
if first=0 then first=i;
if ^index(NOTE,cats(I)) then do;
NOTE=catx(',',NOTE,I);
JOIN=catx(',',JOIN,VAR[I]);
end;
if ^index(NOTE,cats(J)) then do;
NOTE=catx(',',NOTE,J);
JOIN=catx(',',JOIN,VAR[J]);
new_name[J]='';
end;
end;
Jointext[i]=Join;
new_note[i]=note;
end;
end;
run;
*final modify;
data want_final2; set want;
drop new_note: first v note Join i j;
array new_note[*] new_note:;
array new_name[*] new_name:;
array jointext[*] jointext:;
array VAR[*] VAR:;
do i=1 to dim(new_note)-1;
do j=i+1 to dim(new_note);
if new_note[i]^='' and new_note[j]^='' and index(compress(new_note[i]),compress(new_note[j]))>0 then new_note[j]='';
end;
if new_note[i]='' then jointext[i]='';
end;
do v=1 to dim(new_name);
if jointext[v]='' and new_name[v]^='' then jointext[v]=var[v];
end;
run;
If you have more than 3 variables, the problem is more complex than you think.
You could start building two distinct groups only to discover later than they are actually one single group.
The best way to solve this is to reorganise your data vertically, in pairs.
@PGStats has a macro to handle this https://communities.sas.com/t5/SAS-Communities-Library/How-to-find-all-connected-components-in-a-gra...
data have; input name1 $ name2 $ name3 $ name4 $ var1 var2 var3 var4; datalines; aa aa abc ee 1 2 3 4 aa dsd aa aa 10 20 30 40 aa ee aa ee 100 200 300 4000 cc aa aa aa 1000 2000 3000 3 vv bb cc aa 1 2 3 4 ;run; data want; set have; array n{*} $ name: ; array v{*} $ var: ; array join{3} $ 80 jointext_1 - jointext_3; array find{4} $ 80 _temporary_; call missing(of find{*}); do i=1 to dim(n)-1; k=0; if n{i} not in find then do; do j=i+1 to dim(n); if n{i}=n{j} then do; k+1; if k=1 then join{i}=catx(',',v{i},v{j}); else join{i}=catx(',',join{i},v{j}); end; end; end; find{i}=n{i}; end; drop i j k; run;
Hi Ksharp,
I wish I am able to modify your code to take care of the last section in the statement below
for row 1, as name1=name2, note will be 12 and jointext_1=1,2 and jointext_2=blank (as we don't touch var3 and var4, jointext_3=var3, jointext_4=var4)
However, I am able to add a step that finally get what I am looking for.
Quite lengthy!
data have; input name1 $ name2 $ name3 $ name4 $ name5 $ var1 var2 var3 var4 var5;
datalines;
aa aa abc ee ee 1 2 3 4 5
aa dsd aa aa dsd 10 20 30 40 50
aa ee aa ee ee 100 200 300 400 500
cc aa aa aa cc 1 2 3 4 5
vv vv cc cc aa 10 20 30 40 50
aa vv bb nn mm 1 2 3 4 5
;run;
data WANT;
set HAVE;
drop i j;
length new_name1-new_name5 $10;
length JOINTEXT1-JOINTEXT5 $20;
length new_note1-new_note5 $10;
array new_name[*] new_name:;
array NAME[*] NAME: ;
array VAR[*] VAR:;
array JOINTEXT[*] JOINTEXT:;
array new_note[*] new_note:;
first=0;
do v=1 to dim(NAME);
new_name[v]=name[v];
JOINTEXT[v]='';
end;
length NOTE $10;
length JOIN $20;
do I=1 to dim(NAME)-1;
Note='';
Join='';
do J=I+1 to dim(NAME);
if NAME[I]=NAME[J] then do;
if first=0 then first=i;
if ^index(NOTE,cats(I)) then do;
NOTE=catx(',',NOTE,I);
JOIN=catx(',',JOIN,VAR[I]);
end;
if ^index(NOTE,cats(J)) then do;
NOTE=catx(',',NOTE,J);
JOIN=catx(',',JOIN,VAR[J]);
new_name[J]='';
end;
end;
Jointext[i]=Join;
new_note[i]=note;
end;
end;
run;
*final modify;
data want_final2; set want;
drop new_note: first v note Join i j;
array new_note[*] new_note:;
array new_name[*] new_name:;
array jointext[*] jointext:;
array VAR[*] VAR:;
do i=1 to dim(new_note)-1;
do j=i+1 to dim(new_note);
if new_note[i]^='' and new_note[j]^='' and index(compress(new_note[i]),compress(new_note[j]))>0 then new_note[j]='';
end;
if new_note[i]='' then jointext[i]='';
end;
do v=1 to dim(new_name);
if jointext[v]='' and new_name[v]^='' then jointext[v]=var[v];
end;
run;
I used this as a PROC TRANSPOSE exercise for myself. I was wondering if you need a new_name1 and a new_name_2 on a single row? If not, this maybe useful to you.
PROC TRANSPOSE DATA=HAVE
OUT=TRNSTRANSPOSED_name
PREFIX=_name
NAME=source1
;
BY n;
VAR name1-name5;
PROC TRANSPOSE DATA=HAVE
OUT=TRNSTransposed_var
PREFIX=_var
NAME=source2
;
BY n;
VAR var1-var5;
data mrg;
do m=1 by 1 until (last.n);
merge TRNSTRANSPOSED_name
TRNSTransposed_var
;
by n;
output;
end;
run;
proc sort;
by n _name1 m;
data want (keep=n new_name note jointext);
length new_name note jointext $20;
NOTE=""; jointext="";
do p=1 by 1 until (last._name1);
set mrg; by n _name1 ;
new_name=_name1;
Note=cats(Note,m);
jointext=catx(",",jointext,_var1);
end;
if p>1 then output;
run;
Please vote this up Proc Transpose : add flexibility to the BY and VAR... - SAS Ballot.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.