I have the below data. I am interested in getting the dx_date as the start date and the last entry for snap date as the end date. The problem is for many patients the last couple of entries for dob and status are missing, and for eth Z is coded for unknown. How can I extract the last entry for snap date as the end date while keeping all other data?
data fake_data;
input pat_id $ dob:MMDDYY10. eth $ status $ dx_date:MMDDYY10. snap_date:MMDDYY10. ;
format dob MMDDYY10. dx_date MMDDYY10. snap_date MMDDYY10.;
datalines;
A45926 09/26/1975 C A 05/1/2020 08/1/2020
A45926 09/26/1975 C A 05/1/2020 09/1/2020
A45926 09/26/1975 C A 05/1/2020 10/1/2020
A45926 09/26/1975 C A 05/1/2020 11/1/2020
A45926 09/26/1975 C A 05/1/2020 12/1/2020
A45926 09/26/1975 C A 05/1/2020 01/1/2021
A45926 09/26/1975 C A 05/1/2020 02/1/2021
A45926 09/26/1975 C A 05/1/2020 03/1/2021
A45926 09/26/1975 C A 05/1/2020 04/1/2021
A45926 09/26/1975 C A 05/1/2020 05/1/2021
A45926 . Z A . 06/1/2021
A45926 . Z A . 07/1/2021
;
Here is a solution for filling out missing values of DOB and DX_DATE, and replacing ETH='Z' with the last non-Z value:
data want;
set fake_data;
by pat_id;
if first.pat_id then do;
_dx_date=dx_date;
_dob=dob;
_eth=eth;
end;
else do;
if not missing(dx_date) then
_dx_date=dx_date;
else
dx_date=_dx_date;
if not missing(dob) then
_dob=dob;
else
dob=_dob;
if eth ne 'Z' then
_eth=eth;
else
eth=_eth;
end;
retain _:;
drop _:;
run;
Thanks for posting data in usable form!
Is it sure, that the first obs (for each patient) contains all data, or is it necessary to find the most complete observation first?
data work.want;
if 0 then set work.fake_data;
set work.fake_data(rename=(
dob = _dob
eth = _eth
status = _status
dx_date = _dx_date
));
by pat_id;
retain dob eth status dx_date;
if first.pat_id then do;
dob = _dob;
eth = _eth;
status = _status;
dx_date = _dx_date;
end;
if last.pat_id then do;
output;
end;
drop _:;
run;
Thanks for both solutions. They both worked!
Here is a solution for filling out missing values of DOB and DX_DATE, and replacing ETH='Z' with the last non-Z value:
data want;
set fake_data;
by pat_id;
if first.pat_id then do;
_dx_date=dx_date;
_dob=dob;
_eth=eth;
end;
else do;
if not missing(dx_date) then
_dx_date=dx_date;
else
dx_date=_dx_date;
if not missing(dob) then
_dob=dob;
else
dob=_dob;
if eth ne 'Z' then
_eth=eth;
else
eth=_eth;
end;
retain _:;
drop _:;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.