Hello,
I have a dataset like below
ID Date Dx1 Dx2 Dx3 Dx4
1 100 H1 H2 H3 H4
1 100 H3 H2
1 100 H5 H1
1 200 H5
1 200 H3 H2
I want to remove the partial duplicate data. If ID and Date are the same, then keep all the values of Dx1-Dx4 but just in 1 row only. My scenario is more likely like the ID 1 person in the first 3 rows.
The final data set look like below
ID Date Dx1 Dx2 Dx3 Dx4
1 100 H1 H2 H3 H4
1 200 H5 H3 H2
Thank you!
why 1 500 H5 lost?
data temp;
set a;
array array_a Dx1-Dx4;
length newVal $2;
do over array_a;
if array_a^='' then do;
newVal=array_a;
output;
end;
end;
drop Dx1-Dx4;
run;
proc sort data=temp nodupkey;
by id date newVal;
run;
proc transpose data=temp out=result(drop=_name_) prefix=Dx;
by id date;
var newVal;
run;
Can you clarify my assumption
1 200 H5
1 200 H3 H2
does the first obs on the by group as in the example H5 will occupy Dx1 replacing H3 and displacing H3 to Dx2?
Thanks! I want to keep H5, H3 and H2 also
why 1 500 H5 lost?
data temp;
set a;
array array_a Dx1-Dx4;
length newVal $2;
do over array_a;
if array_a^='' then do;
newVal=array_a;
output;
end;
end;
drop Dx1-Dx4;
run;
proc sort data=temp nodupkey;
by id date newVal;
run;
proc transpose data=temp out=result(drop=_name_) prefix=Dx;
by id date;
var newVal;
run;
@learsaas wrote:
why 1 500 H5 lost?
data temp; set a; array array_a Dx1-Dx4; length newVal $2; do over array_a; if array_a^='' then do; newVal=array_a; output; end; end; drop Dx1-Dx4; run; proc sort data=temp nodupkey; by id date newVal; run; proc transpose data=temp out=result(drop=_name_) prefix=Dx; by id date; var newVal; run;
A tranpose to long and then flip to wide after removing duplicates is a great solution. Only modification I'd consider to @learsaas excellent solution, is to replace DO OVER since it's deprecated for an explicit loop. Otherwise it will handle all various situations such as multiple lines and duplicate codes as shown in your example.
Reeza, how would you change the code to replace DO OVER? I didn't understand the part about it being deprecated for an explicit loop
Hi @janus58 Take a look at this thread and Soren Lassen's reponse
09-28-2018 05:19 AM
Implicit array references (do over) are not encouraged, because SAS Institute intends to do away with them (it has been an undocumented feature for years, now). So if you use them, you risk that your program will have to be rewritten when the next version of SAS comes out. Performance-wise, DO OVER is just the same as explicit array processing (a temporary variable named _I_ is created in the background and used to loop through the array).
thanks a lot for the explanation
You're welcome. Actually we should rather thank @Reeza for the prompt and wise correction. I'm glad for your question. Have a good one!
Thank you! I want to keep H5 also. Sorry for the confusion!
ID Date Dx1 Dx2 Dx3 Dx4
1 100 H1 H2 H3 H4
2 200 A9 H3 H2 A1
3 340 B2 C1 H1
4 250 X1
5 350 Y1 C1
My result now looks like this. But I get stuck with the next step. I want to find if any Dx1-Dx4 has H* values, if there is any then create variable dx= value of the first Dx* which have H* appear (ex:id=2 the dx=value of dx2=H3) if not then take other value.
1. The result desires to look like this
ID Date Dx
1 100 H1
2 200 H3
3 340 H1
4 250 X1
5 200 Y1
2. Additionally I want to make a table (with one column) with all unique characters which are different from H*
Many thanks!!
data have;
infile cards truncover;
input ID Date (Dx1 Dx2 Dx3 Dx4 ) ($);
cards;
1 100 H1 H2 H3 H4
1 100 H3 H2
1 100 H5 H1
1 200 H5
1 200 H3 H2
;
data _null_;
if _n_=1 then do;
declare hash H (ordered: "A") ;
h.definekey ("id","date","dx") ;
h.definedone () ;
end;
set have end=lr;
by id date;
array t(*) dx1-dx4;
do _n_=1 to dim(t);
if not missing(t(_n_)) then do;
dx=t(_n_);
rc=h.add();
end;
end;
if lr then h.output(dataset:'w');
run;
proc transpose data=w out=want(drop=_:) prefix=Dx;
by id date;
var dx;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.