I have a dataset that has blanks and I would like to fill in the spaces based on other columns. HT, WT, and BI are filled in for patient and visit by variable. I would like to copy over the values of each patient and visit into the other variables. Can this be done by writing if patient and visit match, then copy value over? Or is there another way?
data have;
input Patient $ Visit Variable $ HT WT BI;
cards;
A 1 H 1 . .
A 2 H 3 . .
A 3 H 4 . .
A 1 W . 2 .
A 2 W . 5 .
A 3 W . 9 .
A 1 B . . 0
A 2 B . . 8
A 3 B . . 8
B 1 H 1 . .
B 2 H 3 . .
B 3 H 1 . .
B 1 W . 2 .
B 2 W . 4 .
B 3 W . 1 .
B 1 B . . 0
B 2 B . . 0
B 3 B . . 8
;
data want;
input Patient $ Visit Variable $ HT WT BI;
cards;
A 1 H 1 2 0
A 2 H 3 5 0
A 3 H 4 9 8
A 1 W 1 2 0
A 2 W 3 5 8
A 3 W 4 9 8
A 1 B 1 2 0
A 2 B 3 5 8
A 3 B 4 9 8
B 1 H 1 2 0
B 2 H 3 4 0
B 3 H 1 1 8
B 1 W 1 2 0
B 2 W 3 4 0
B 3 W 1 1 8
B 1 B 1 2 0
B 2 B 3 4 0
B 3 B 1 1 8
;
Initially I thought this could be fixed with an if-else statement but this may be more complex?
It's harder than it looks. Here's a solution that depends on VISIT actually taking on values of 1, 2, and 3 as indicated in your sample data:
data want;
array htvals {3} _temporary_;
array wtvals {3} _temporary_;
array bivals {3} _temporary_;
do until (last.patient);
set have;
by patient;
if ht > . then htvals{visit} = ht;
if wt > . then wtvals{visit} = wt;
if bi > . then bivals{visit} = bi;
end;
do until (last.patient);
set have;
by patient;
if ht = . then ht = htvals{visit};
if wt = . then wt = wtvals{visit};
if bi = . then bi = bivals{visit};
output;
end;
do _n_=1 to 3;
htvals{_n_} = .;
wtvals{_n_} = .;
bivals{_n_} = .;
end;
run;
Wish I found an easier way, but this is all I came up with.
It's harder than it looks. Here's a solution that depends on VISIT actually taking on values of 1, 2, and 3 as indicated in your sample data:
data want;
array htvals {3} _temporary_;
array wtvals {3} _temporary_;
array bivals {3} _temporary_;
do until (last.patient);
set have;
by patient;
if ht > . then htvals{visit} = ht;
if wt > . then wtvals{visit} = wt;
if bi > . then bivals{visit} = bi;
end;
do until (last.patient);
set have;
by patient;
if ht = . then ht = htvals{visit};
if wt = . then wt = wtvals{visit};
if bi = . then bi = bivals{visit};
output;
end;
do _n_=1 to 3;
htvals{_n_} = .;
wtvals{_n_} = .;
bivals{_n_} = .;
end;
run;
Wish I found an easier way, but this is all I came up with.
Without actually knowing what you want to do with the data it seems like you have duplication of the data.
Either you should remove the variable "Variable" or you should have the variables Patient,Visit, Variable and a new one called "Value" or similar. This is an example of the first one:
proc sort data=have out=have2;
*We need to make sure that the data is sorted on patient and visit for the next step.;
by patient visit ;
run;
data want (rename=(HT_=HT WT_=WT BI_=BI));
set have2;
by patient visit ;
retain HT_ WT_ BI_; *We want to keep the values from the variables, but can only do this if they are not from the input data set.;
*Put the right values in the right variable.;
if variable = "H" then HT_ = HT;
else if variable = "W" then WT_ = WT;
else if variable = "B" then BI_ = BI;
*Only output the last row of each Patient and visit. ;
if last.visit then output;
*Get rid of some variables. ;
drop variable HT WT BI;
run;
This will give you the following data:
Patient Visit HT WT BI
A 1 1 2 0
A 2 3 5 8
A 3 4 9 8
B 1 1 2 0
B 2 3 4 0
B 3 1 1 8
Please try the proc sql approach
data have;
input Patient $ Visit Variable $ HT WT BI;
cards;
A 1 H 1 . .
A 2 H 3 . .
A 3 H 4 . .
A 1 W . 2 .
A 2 W . 5 .
A 3 W . 9 .
A 1 B . . 0
A 2 B . . 8
A 3 B . . 8
B 1 H 1 . .
B 2 H 3 . .
B 3 H 1 . .
B 1 W . 2 .
B 2 W . 4 .
B 3 W . 1 .
B 1 B . . 0
B 2 B . . 0
B 3 B . . 8
;
proc sql;
create table want as select a.patient,a.visit,a.variable,b.ht,b.wt,b.bi from have as a inner join (select patient,visit,max(ht) as ht, max(wt) as wt, max(bi) as bi from have group by patient, visit) as b on a.patient=b.patient and a.visit=b.visit order by a.patient,a.variable,a.visit;
quit;
data have;
input Patient $ Visit Variable $ HT WT BI;
cards;
A 1 H 1 . .
A 2 H 3 . .
A 3 H 4 . .
A 1 W . 2 .
A 2 W . 5 .
A 3 W . 9 .
A 1 B . . 0
A 2 B . . 8
A 3 B . . 8
B 1 H 1 . .
B 2 H 3 . .
B 3 H 1 . .
B 1 W . 2 .
B 2 W . 4 .
B 3 W . 1 .
B 1 B . . 0
B 2 B . . 0
B 3 B . . 8
;
run;
data want;
input Patient $ Visit Variable $ HT WT BI;
cards;
A 1 H 1 2 0
A 2 H 3 5 0
A 3 H 4 9 8
A 1 W 1 2 0
A 2 W 3 5 8
A 3 W 4 9 8
A 1 B 1 2 0
A 2 B 3 5 8
A 3 B 4 9 8
B 1 H 1 2 0
B 2 H 3 4 0
B 3 H 1 1 8
B 1 W 1 2 0
B 2 W 3 4 0
B 3 W 1 1 8
B 1 B 1 2 0
B 2 B 3 4 0
B 3 B 1 1 8
;
run;
proc summary data=have nway;
class Patient Visit;
var HT WT BI;
output out=max max=;
run;
data test;
* set PDV order (optional) ;
if 0 then set have;
* initialize &_hashnum_ to zero then declare hash objects ;
%let _hashnum_=0;
%hash_define(data=max, keys=Patient Visit, vars=HT WT BI);
set have;
%hash_lookup;
drop _rc:;
run;
proc compare base=want comp=test;
run;
Do you have a typo in your want dataset?
Alternate approach:
data have;
input Patient $ Visit Variable $ HT WT BI;
cards;
A 1 H 1 . .
A 2 H 3 . .
A 3 H 4 . .
A 1 W . 2 .
A 2 W . 5 .
A 3 W . 9 .
A 1 B . . 0
A 2 B . . 8
A 3 B . . 8
B 1 H 1 . .
B 2 H 3 . .
B 3 H 1 . .
B 1 W . 2 .
B 2 W . 4 .
B 3 W . 1 .
B 1 B . . 0
B 2 B . . 0
B 3 B . . 8
;
run;
data want;
input Patient $ Visit Variable $ HT WT BI;
cards;
A 1 H 1 2 0
A 2 H 3 5 0
A 3 H 4 9 8
A 1 W 1 2 0
A 2 W 3 5 8
A 3 W 4 9 8
A 1 B 1 2 0
A 2 B 3 5 8
A 3 B 4 9 8
B 1 H 1 2 0
B 2 H 3 4 0
B 3 H 1 1 8
B 1 W 1 2 0
B 2 W 3 4 0
B 3 W 1 1 8
B 1 B 1 2 0
B 2 B 3 4 0
B 3 B 1 1 8
;
run;
proc summary data=have nway;
class Patient Visit;
var HT WT BI;
output out=max max= / autoname;
run;
data test;
* set PDV order (optional) ;
if 0 then set have;
* initialize &_hashnum_ to zero then declare hash objects ;
%let _hashnum_=0;
%hash_define(data=max, keys=Patient Visit, vars=HT_Max WT_Max BI_Max);
set have;
%hash_lookup;
array vars{*} HT WT BI;
array max{*} HT_Max WT_Max BI_Max;
do i=1 to dim(vars);
if missing(vars{i}) then vars{i}=max{i};
end;
drop _rc: i HT_Max WT_Max BI_Max;
run;
proc compare base=want comp=test;
run;
See
https://github.com/scottbass/SAS/blob/master/Macro/hash_define.sas and
https://github.com/scottbass/SAS/blob/master/Macro/hash_lookup.sas
If your data differs from what you posted then this approach may not work.
(Can you add "run;" statements to your have and want data steps as best practice?)
@Jagadishkatam 's approach is similar to mine, and I think is a clearer, cleaner approach.
data have; input Patient $ Visit Variable $ HT WT BI; cards; A 1 H 1 . . A 2 H 3 . . A 3 H 4 . . A 1 W . 2 . A 2 W . 5 . A 3 W . 9 . A 1 B . . 0 A 2 B . . 8 A 3 B . . 8 B 1 H 1 . . B 2 H 3 . . B 3 H 1 . . B 1 W . 2 . B 2 W . 4 . B 3 W . 1 . B 1 B . . 0 B 2 B . . 0 B 3 B . . 8 ; proc sort data=have(drop= Variable ) out=temp; by Patient Visit; run; data key; update temp(obs=0) temp; by Patient Visit; run; data want; if _n_=1 then do; if 0 then set key; declare hash h(dataset:'key'); h.definekey( 'Patient' ,'Visit'); h.definedata('HT', 'WT', 'BI'); h.definedone(); end; set have(drop=HT WT BI); call missing(HT ,WT ,BI); h.find(); run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.