Hello,
I'm working with a wide dataset. Each row represents a person and includes treatment records as well as corresponding dates in which treatments are performed. My data essentially looks like this:
ID treatment1 treatment2 treatment3 treatment4 treatment5 date1 date2 date3 date4 date5
John A B B B A Jan1 Jan15 Jan16 Jan17 Feb1
David A A B B A Jan1 March15 Jan5 Jan5 Feb15
What I need to do is for each person is to select the latest date in which he received treatment A (date5 for John and date2 for David in this case). I thought a simple array with a do-loop would do the trick but I realized that the dates1-5 are not necessarily recorded in chronological order. Any help would be greatly appreciated!
Thanks!
Further improvised:
data have;
infile cards truncover;
input (ID treatment1 treatment2 treatment3 treatment4 treatment5) ($) (date1 date2 date3 date4 date5) (:date9.);
format date: date9.;
cards;
John A B B B A 01Jan2018 15Jan2018 16Jan2018 17Jan2018 01Feb2018
David A A B B A 01Jan2018 15Mar2018 05Jan2018 05Jan2018 15Feb2018
;
data want;
set have;
array j(*) treatment:;
array t(*) date:;
do _n_=dim(j) to 1 by -1;
if j(_n_)= 'A' then latest_date=max(latest_date,t(_n_));
end;
format latest_date date9.;
keep id latest_Date;
run;
@Jhan1 wrote:
Hello,
I'm working with a wide dataset. Each row represents a person and includes treatment records as well as corresponding dates in which treatments are performed. My data essentially looks like this:
perhaps-->
data want;
set have;
latest_date=max(of date: );
format latest_date your_format;
run;
@novinosrin only for treatment A though.
It's probably easiest to transpose it to a long format. Then you can sort and get it pretty quickly.
Thank you @Reeza let me go back to starb and ask for refund
data have;
infile cards truncover;
input (ID treatment1 treatment2 treatment3 treatment4 treatment5) ($) (date1 date2 date3 date4 date5) (:date9.);
format date: date9.;
cards;
John A B B B A 01Jan2018 15Jan2018 16Jan2018 17Jan2018 01Feb2018
David A A B B A 01Jan2018 15Mar2018 05Jan2018 05Jan2018 15Feb2018
;
data want;
set have;
array j(*) treatment:;
array t(*) date:;
do _n_=dim(j) to 1 by -1;
if j(_n_) ne 'A' then call missing(j(_n_),t(_n_));
end;
latest_date=max(of t(*));
format latest_date date9.;
keep id latest_Date;
run;
Great solution. Super fast performance!!!
Further improvised:
data have;
infile cards truncover;
input (ID treatment1 treatment2 treatment3 treatment4 treatment5) ($) (date1 date2 date3 date4 date5) (:date9.);
format date: date9.;
cards;
John A B B B A 01Jan2018 15Jan2018 16Jan2018 17Jan2018 01Feb2018
David A A B B A 01Jan2018 15Mar2018 05Jan2018 05Jan2018 15Feb2018
;
data want;
set have;
array j(*) treatment:;
array t(*) date:;
do _n_=dim(j) to 1 by -1;
if j(_n_)= 'A' then latest_date=max(latest_date,t(_n_));
end;
format latest_date date9.;
keep id latest_Date;
run;
Just wanted say thank-you for your input!
I've tried different ways you've suggested and others (including transposing the data) - this suggestion fit my data the best...I had other criteria I had to consider and working with a very large dataset and wanted to avoid transposing data if I could although that was an excellent suggestion as well.
Thanks to others who responded - it was first time posting on this community and I'm glad I chose to!
Its easy to solve this problem for specific hard coded treatment using array. But for a generic solution I will suggest to use proc transpose. Something like this should work:
Data Have;
infile cards truncover;
input (ID Treatment1 Treatment2 Treatment3 Treatment4 Treatment5) ($) (Date1 Date2 Date3 Date4 Date5) (:date9.);
format date: date9.;
cards;
John A B B B A '01Jan2018'd '15Jan2018'd '16Jan2018'd '17Jan2018'd '01Feb2018'd
David A A B B A '01Jan2018'd '15Mar2018'd '05Jan2018'd '05Jan2018'd '05Feb2018'd
;
run;
proc sort data=Have;
by ID;
run;
proc transpose data=Have out=Treatments(Drop=_Name_ Rename=(Treatment1=Treatment)) Prefix=Treatment;
by ID;
Var Treatment:;
run;
proc transpose data=Have out=Dates(Drop=_Name_ rename=(Date1=Date)) Prefix=Date;
by ID;
Var Date:;
run;
Data Have_Transposed;
merge Treatments Dates;
by ID;
run;
proc sql;
Select ID, Treatment,max(Date) as Max_Date format date9. from Have_Transposed
group by ID, Treatment;
quit;
Some generic fun for finding latest_Date for all treatments:
data have;
infile cards truncover;
input (ID treatment1 treatment2 treatment3 treatment4 treatment5) ($) (date1 date2 date3 date4 date5) (:date9.);
format date: date9.;
cards;
John A B B B A 01Jan2018 15Jan2018 16Jan2018 17Jan2018 01Feb2018
David A A B B A 01Jan2018 15Mar2018 08Jan2018 05Jan2018 15Feb2018
;
data want;
if _n_=1 then do;
if 0 then set have;
length treat $8 dt 8;
format dt latest_date date9.;
dcl hash H (ordered: "A",multidata:'y') ;
h.definekey ("treat") ;
h.definedata ("treat","dt") ;
h.definedone () ;
call missing(treat,dt);
end;
set have ;
array j(*) treatment1-treatment5;
array t(*) date1-date5;
do i=1 to dim(t);
rc=h.add(key:j(i),data:j(i),data:t(i));
end;
do i=1 to dim(j);
rc=h.check(key:j(i));
if rc=0 then do;
do _iorc_=h.find(key:j(i)) by 0 while(_iorc_=0);
latest_date=max(latest_date,dt);
_iorc_=h.find_next();
end;
output;
h.remove(key:j(i));
call missing(latest_date,dt);
end;
end;
keep id treat latest_date;
run;
data have;
infile cards truncover;
input (ID treatment1 treatment2 treatment3 treatment4 treatment5) ($) (date1 date2 date3 date4 date5) (:date9.);
format date: date9.;
cards;
John A B B B A 01Jan2018 15Jan2018 16Jan2018 17Jan2018 01Feb2018
David A A B B A 01Jan2018 15Mar2018 08Jan2018 05Jan2018 15Feb2018
;
data temp;
set have;
array j(*) treatment1-treatment5;
array t(*) date1-date5;
do i=1 to dim(t);
treatment=j(i);
dt=t(i);
output;
end;
keep id treatment dt;
format dt date9.;
run;
proc sql;
create table want as
select id,treatment,max(dt) as latest_date format=date9.
from temp
group by id,treatment;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.