Comparing values accross rows and columns

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Comparing values accross rows and columns

Hello everyone, I am new to SAS and data management in general so I would really appreciate any help! 

 

I have a dataset with multiple rows per subject and multiple columns of diagnoses codes for each row. For each subject i want to set any duplicate diagnoses as missing. The problem is that the duplicate diagnoses are in different rows. So I have:

 

Subject    D1    D2   D3   D4   D5 

1               a      b     c       d      e

1               a      d      .      .       .

1               f      a     b       .       .

2               g      c    h        e      d

2               e      d      j       .         .

 

and  want:

 

Subject    D1    D2   D3   D4   D5 

1               a      b     c       d      e

1               .      .      .      .       .

1               f      .      .      .       .

2               g      c    h        e      d

2               .      .      j       .         .

 

I can transpose from long to wide format and then use an array to compare among codes, set duplicates as missing and transpose back to long. But this is inefficient however as in the wide format i will end up with more than 2000 columns. Is there a more efficient way?  

 

Thank you in advance!


Accepted Solutions
Solution
3 weeks ago
Super User
Posts: 2,061

Re: Comparing values accross rows and columns

data have;
input (Subject    D1    D2   D3   D4   D5 ) ($);
cards;
1               a      b     c       d      e
1               a      d      .      .       .
1               f      a     b       .       .
2               g      c    h        e      d
2               e      d      j       .         .
;


data want;
 if _n_ = 1 then do ;                          
    dcl hash h ( multidata:"y") ;
    h.definekey ("subject",'d') ;                   
    h.definedata ("subject",'d') ; 
	h.definedone(); 
	end;
set have ;
by subject;
array t(*) d1-d5;
do n=1 to dim(t);
d=t(n);
rc=h.check();
h.add(); 
if rc=0  then call missing(t(n));
end;
drop rc n d;
run;

View solution in original post


All Replies
Solution
3 weeks ago
Super User
Posts: 2,061

Re: Comparing values accross rows and columns

data have;
input (Subject    D1    D2   D3   D4   D5 ) ($);
cards;
1               a      b     c       d      e
1               a      d      .      .       .
1               f      a     b       .       .
2               g      c    h        e      d
2               e      d      j       .         .
;


data want;
 if _n_ = 1 then do ;                          
    dcl hash h ( multidata:"y") ;
    h.definekey ("subject",'d') ;                   
    h.definedata ("subject",'d') ; 
	h.definedone(); 
	end;
set have ;
by subject;
array t(*) d1-d5;
do n=1 to dim(t);
d=t(n);
rc=h.check();
h.add(); 
if rc=0  then call missing(t(n));
end;
drop rc n d;
run;
Occasional Contributor
Posts: 5

Re: Comparing values accross rows and columns

Posted in reply to novinosrin

Thank you so much! Worked like a charm and it is so efficient! 

Trusted Advisor
Posts: 1,394

Re: Comparing values accross rows and columns

Hash can be useful, but for this nail, I don't think you need that big a hammer.    Just keep a retained space-separated list of d values (variables _d_list) for a given id.  Then check subsequent d values against that list, with the result of either expanding the list with a new d value, or setting the d value to missing.

 

data have;
input (Subject    D1    D2   D3   D4   D5 ) ($);
cards;
1               a      b     c       d      e
1               a      d      .      .       .
1               f      a     b       .       .
2               g      c    h        e      d
2               e      d      j       .         .
;

data want;
  set have;
  by subject;
  length _d_list $300;
  retain _d_list;
  array d {*} d1-d5;
  if first.subject then _d_list=catx(' ',of d{*});
  else do _i=1 to dim(d) while (d{_i}^=' ');
    if findw(trim(_d_list),trim(d{_i}))=0 then _d_list=catx(' ',_d_list,d{_i});
    else d{_i}=' ';
  end;
  drop _: ;
run;

 

This program assumes (1) the data are sorted by subject, (2) the first record for each subject has no duplicate d-values, (3) that obeservations with less the 5 non-missing d-values have all the non-missing at the left end of the d-variables.

 

Make sure the length of _d_list is sufficient to hold the maximum number of space-separated distinct d values.

Occasional Contributor
Posts: 7

Re: Comparing values accross rows and columns

Easiest would probably be to just transpose vertically and deduplicate with PROC SORT by subject and diagnosis.  But if you want to keep the original structure, maybe this:

 

proc sort data=have; by subject; run;

proc sql noprint;
select max(n)*5 into :maxD from	
	(select subject, count(*) as n 
	from have group by subject);
quit;

data have;
set have;
by subject;
array U {&maxD} $10 _temporary_;
array d {*} D1-D5;
if first.subject then do;
	do i=1 to dim(U);
		U[i]='';
	end;
	n=0;
end;
do i=1 to dim(d);
	if not missing(d[i]) then do;
		if d[i] in U then d[i]='';
		else do;
			n+1;
			U[n]=d[i];
		end;
	end;
end;
run;
Super User
Posts: 2,061

Re: Comparing values accross rows and columns

[ Edited ]
/*a little cleaner than previous*/
data want;
 if _n_ = 1 then do ;                          
    declare hash h ( ) ;*further edited;
    h.definekey ("subject",'d') ;                   
    h.definedata ("subject",'d') ; 
	h.definedone(); 
	end;
set have ;
by subject;
array t(*) d1-d5;
do n=1 to dim(t);
d=t(n);
rc=h.check();
h.replace();  *further edited;
if rc=0  then call missing(t(n));
end;
drop rc n d;
run;

 

Occasional Contributor
Posts: 5

Re: Comparing values accross rows and columns

Posted in reply to novinosrin

Thanks so much everyone for the help, I made it work clean and fast! 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 141 views
  • 1 like
  • 4 in conversation