Solved: Array to join text - extension

hhchenfx · Posted 02-17-2021 10:45 PM

Hi Everyone,

I posted a version of this problem before and then discovered another aspect of my data. Thus I add this new situation here. Please be patient with me. Many thanks.

My data has a list of Name column (name1-name4) corresponding to var column (var1-var4).
I want to compare name(i) with name(j). If they are the same, I will join var(i) and var(j) under a new variable name var_i.

Since I can have (name1=name3) and (name2=name4) (see row 3), I need to create a full range of jointext_1-jointext_4.

Based on my data want, the output should be like that

for row 1, as name1=name2, note will be 12 and jointext_1=1,2 and jointext_2=blank (as we don't touch var3 and var4, jointext_3=var3, jointext_4=var4)
for row 2, as name1=name3=name4, note will be 134 and jointext_1=10,30,40 and jointext_3=blank and jointext_4=blank (as we don't touch var2, jointext_2=var2)

for row 3 is the NEW situation where I have two pair of same name:

name1= name3, note will be 13 and jointext_1=100,300 and jointext_3=blank

name2=name4, note will be 24 and jointext_2=200,4000 and jointext_4=blank

for row4, as name2=name3=name4, note will be 234 and jointext_2 =2000,3000,3 and jointext_3=blank and and jointext_4=blank (as we don't touch var1, jointext_1=var1)

Row 5 has no name the same, so jointext[i]=var[i]

Can you please help me again with my problem?

Thank you so much.

HHC

data have; input name1 $ name2 $ name3 $ name4 $ var1 var2 var3 var4;
datalines;
aa aa abc ee 1 2 3 4
aa dsd aa aa 10 20 30 40
aa ee aa ee 100 200 300 4000
cc aa aa aa 1000 2000 3000 3
vv bb cc aa 1 2 3 4
;run;

data WANT; 
  set HAVE;
  drop i j;
  length new_name1-new_name4 JOINTEXT1-JOINTEXT4 $20;
  array new_name[*] new_name1-new_name4;
  array NAME[*] NAME1-NAME4 ;
  array VAR[*] VAR1-VAR4;
  array JOINTEXT[*] JOINTEXT1-JOINTEXT4;

	do v=1 to dim(NAME);
		new_name[v]=name[v];
		JOINTEXT[v]='';
	end;

  length NOTE JOIN $40;
	do I=1 to dim(NAME)-1;
  	do J=I+1 to dim(NAME);
      if NAME[I]=NAME[J] then do;     
        if ^index(NOTE,cats(I)) then do;
          NOTE=catx(',',NOTE,I); 
          JOIN=catx(',',JOIN,VAR[I]);

         end; 
  	     if ^index(NOTE,cats(J)) then do;
           NOTE=catx(',',NOTE,J); 
  	       JOIN=catx(',',JOIN,VAR[J]);
		   new_name[J]=''; 
         end; 
      end; 
    end;                 
  end;
run;

hhchenfx · Posted 02-18-2021 01:05 PM

Hi Ksharp,

I wish I am able to modify your code to take care of the last section in the statement below

for row 1, as name1=name2, note will be 12 and jointext_1=1,2 and jointext_2=blank (as we don't touch var3 and var4, jointext_3=var3, jointext_4=var4)

However, I am able to add a step that finally get what I am looking for.

Quite lengthy!

data have; input name1 $ name2 $ name3 $ name4 $ name5 $ var1 var2 var3 var4 var5;
datalines;
aa aa abc ee ee 1 2 3 4 5
aa dsd aa aa dsd 10 20 30 40 50
aa ee aa ee ee 100 200 300 400 500
cc aa aa aa cc 1 2 3 4 5
vv vv cc cc aa 10 20 30 40 50
aa vv bb nn mm 1 2 3 4 5
;run;

data WANT; 
  set HAVE;
  drop i j;
  length new_name1-new_name5 $10;
  length JOINTEXT1-JOINTEXT5 $20;
  length new_note1-new_note5 $10;

  array new_name[*] new_name:;
  array NAME[*] NAME: ;
  array VAR[*] VAR:;
  array JOINTEXT[*] JOINTEXT:;
  array new_note[*] new_note:;

  first=0;

	do v=1 to dim(NAME);
		new_name[v]=name[v];
		JOINTEXT[v]='';
	end;

  length NOTE $10;
  length JOIN $20;
	do I=1 to dim(NAME)-1;
	Note='';
	Join='';
  	do J=I+1 to dim(NAME);
	      if NAME[I]=NAME[J] then do; 
				if first=0 then first=i; 
	        if ^index(NOTE,cats(I)) then do;
	          NOTE=catx(',',NOTE,I); 
	          JOIN=catx(',',JOIN,VAR[I]);

	         end; 
	  	     if ^index(NOTE,cats(J)) then do;
	           NOTE=catx(',',NOTE,J); 
	  	       JOIN=catx(',',JOIN,VAR[J]);
			   new_name[J]=''; 
	         end; 
      end;
	Jointext[i]=Join; 
	new_note[i]=note;
    end;                 
  end;
run;

*final modify;
data want_final2; set want;
drop new_note: first v note Join i j;
array new_note[*] new_note:;
array new_name[*] new_name:;
array jointext[*] jointext:;
array VAR[*] VAR:;

do i=1 to dim(new_note)-1;
do j=i+1 to dim(new_note);

if new_note[i]^='' and new_note[j]^='' and index(compress(new_note[i]),compress(new_note[j]))>0  then new_note[j]='';
end;
if new_note[i]='' then jointext[i]='';
end;

do v=1 to dim(new_name);
if jointext[v]='' and new_name[v]^='' then jointext[v]=var[v];
end;
run;

View solution in original post

ChrisNZ · Posted 02-18-2021 02:07 AM

If you have more than 3 variables, the problem is more complex than you think.

You could start building two distinct groups only to discover later than they are actually one single group.

The best way to solve this is to reorganise your data vertically, in pairs.

@PGStats has a macro to handle this https://communities.sas.com/t5/SAS-Communities-Library/How-to-find-all-connected-components-in-a-gra...

High-Performance SAS Coding - Third Edition

Ksharp · Posted 02-18-2021 06:42 AM

data have; input name1 $ name2 $ name3 $ name4 $ var1 var2 var3 var4;
datalines;
aa aa abc ee 1 2 3 4
aa dsd aa aa 10 20 30 40
aa ee aa ee 100 200 300 4000
cc aa aa aa 1000 2000 3000 3
vv bb cc aa 1 2 3 4
;run;

data want;
 set have;
 array n{*} $ name: ;
 array v{*} $ var: ;
 array join{3} $ 80 jointext_1 - jointext_3;
 array find{4} $ 80 _temporary_;
call missing(of find{*});
do i=1 to dim(n)-1;
  k=0;
  if n{i} not in find then do;
  do j=i+1 to dim(n);
   if n{i}=n{j}  then do;
    k+1;
    if k=1 then join{i}=catx(',',v{i},v{j});
	 else  join{i}=catx(',',join{i},v{j});
   end;
  end;
  end;
  find{i}=n{i};
end;
drop i j k;
run;

hhchenfx · Posted 02-18-2021 01:05 PM

Hi Ksharp,

I wish I am able to modify your code to take care of the last section in the statement below

for row 1, as name1=name2, note will be 12 and jointext_1=1,2 and jointext_2=blank (as we don't touch var3 and var4, jointext_3=var3, jointext_4=var4)

However, I am able to add a step that finally get what I am looking for.

Quite lengthy!

data have; input name1 $ name2 $ name3 $ name4 $ name5 $ var1 var2 var3 var4 var5;
datalines;
aa aa abc ee ee 1 2 3 4 5
aa dsd aa aa dsd 10 20 30 40 50
aa ee aa ee ee 100 200 300 400 500
cc aa aa aa cc 1 2 3 4 5
vv vv cc cc aa 10 20 30 40 50
aa vv bb nn mm 1 2 3 4 5
;run;

data WANT; 
  set HAVE;
  drop i j;
  length new_name1-new_name5 $10;
  length JOINTEXT1-JOINTEXT5 $20;
  length new_note1-new_note5 $10;

  array new_name[*] new_name:;
  array NAME[*] NAME: ;
  array VAR[*] VAR:;
  array JOINTEXT[*] JOINTEXT:;
  array new_note[*] new_note:;

  first=0;

	do v=1 to dim(NAME);
		new_name[v]=name[v];
		JOINTEXT[v]='';
	end;

  length NOTE $10;
  length JOIN $20;
	do I=1 to dim(NAME)-1;
	Note='';
	Join='';
  	do J=I+1 to dim(NAME);
	      if NAME[I]=NAME[J] then do; 
				if first=0 then first=i; 
	        if ^index(NOTE,cats(I)) then do;
	          NOTE=catx(',',NOTE,I); 
	          JOIN=catx(',',JOIN,VAR[I]);

	         end; 
	  	     if ^index(NOTE,cats(J)) then do;
	           NOTE=catx(',',NOTE,J); 
	  	       JOIN=catx(',',JOIN,VAR[J]);
			   new_name[J]=''; 
	         end; 
      end;
	Jointext[i]=Join; 
	new_note[i]=note;
    end;                 
  end;
run;

*final modify;
data want_final2; set want;
drop new_note: first v note Join i j;
array new_note[*] new_note:;
array new_name[*] new_name:;
array jointext[*] jointext:;
array VAR[*] VAR:;

do i=1 to dim(new_note)-1;
do j=i+1 to dim(new_note);

if new_note[i]^='' and new_note[j]^='' and index(compress(new_note[i]),compress(new_note[j]))>0  then new_note[j]='';
end;
if new_note[i]='' then jointext[i]='';
end;

do v=1 to dim(new_name);
if jointext[v]='' and new_name[v]^='' then jointext[v]=var[v];
end;
run;

PhilC · Posted 02-18-2021 02:16 PM

I used this as a PROC TRANSPOSE exercise for myself. I was wondering if you need a new_name1 and a new_name_2 on a single row? If not, this maybe useful to you.

PROC TRANSPOSE DATA=HAVE
	OUT=TRNSTRANSPOSED_name
	PREFIX=_name
	NAME=source1
;
	BY n;
	VAR name1-name5;
PROC TRANSPOSE DATA=HAVE
	OUT=TRNSTransposed_var
	PREFIX=_var
	NAME=source2
;
	BY n;
	VAR var1-var5;
data mrg;
  do m=1 by 1 until (last.n);
    merge TRNSTRANSPOSED_name
          TRNSTransposed_var 
          ;
      by n;
    output;
  end;  
run;
proc sort;
 by n _name1 m;
data want (keep=n new_name note jointext);
  length  new_name note jointext $20;
  NOTE="";  jointext="";
  do p=1 by 1 until (last._name1);
    set mrg; by n _name1 ;
    new_name=_name1;
    Note=cats(Note,m);
    jointext=catx(",",jointext,_var1);
  end;
  if p>1 then output;
run;

Please vote this up Proc Transpose : add flexibility to the BY and VAR... - SAS Ballot.

Array to join text - extension

Re: Array to join text - extension

Re: Array to join text - extension

Re: Array to join text - extension

Re: Array to join text - extension

Re: Array to join text - extension

Array to join text - extension

Re: Array to join text - extension

Re: Array to join text - extension

Re: Array to join text - extension

Re: Array to join text - extension

Re: Array to join text - extension

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away