Hello,
I'm working with a wide dataset. Each row represents a person and includes treatment records as well as corresponding dates in which treatments are performed. My data essentially looks like this:
ID treatment1 treatment2 treatment3 treatment4 treatment5 date1 date2 date3 date4 date5
John A B B B A Jan1 Jan15 Jan16 Jan17 Feb1
David A A B B A Jan1 March15 Jan5 Jan5 Feb15
What I need to do is for each person is to select the latest date in which he received treatment A (date5 for John and date2 for David in this case). I thought a simple array with a do-loop would do the trick but I realized that the dates1-5 are not necessarily recorded in chronological order. Any help would be greatly appreciated!
Thanks!
Further improvised:
data have;
infile cards truncover;
input (ID treatment1 treatment2 treatment3 treatment4 treatment5) ($) (date1 date2 date3 date4 date5) (:date9.);
format date: date9.;
cards;
John A B B B A 01Jan2018 15Jan2018 16Jan2018 17Jan2018 01Feb2018
David A A B B A 01Jan2018 15Mar2018 05Jan2018 05Jan2018 15Feb2018
;
data want;
set have;
array j(*) treatment:;
array t(*) date:;
do _n_=dim(j) to 1 by -1;
if j(_n_)= 'A' then latest_date=max(latest_date,t(_n_));
end;
format latest_date date9.;
keep id latest_Date;
run;
@Jhan1 wrote:
Hello,
I'm working with a wide dataset. Each row represents a person and includes treatment records as well as corresponding dates in which treatments are performed. My data essentially looks like this:
perhaps-->
data want;
set have;
latest_date=max(of date: );
format latest_date your_format;
run;
@novinosrin only for treatment A though.
It's probably easiest to transpose it to a long format. Then you can sort and get it pretty quickly.
Thank you @Reeza let me go back to starb and ask for refund
data have;
infile cards truncover;
input (ID treatment1 treatment2 treatment3 treatment4 treatment5) ($) (date1 date2 date3 date4 date5) (:date9.);
format date: date9.;
cards;
John A B B B A 01Jan2018 15Jan2018 16Jan2018 17Jan2018 01Feb2018
David A A B B A 01Jan2018 15Mar2018 05Jan2018 05Jan2018 15Feb2018
;
data want;
set have;
array j(*) treatment:;
array t(*) date:;
do _n_=dim(j) to 1 by -1;
if j(_n_) ne 'A' then call missing(j(_n_),t(_n_));
end;
latest_date=max(of t(*));
format latest_date date9.;
keep id latest_Date;
run;
Great solution. Super fast performance!!!
Further improvised:
data have;
infile cards truncover;
input (ID treatment1 treatment2 treatment3 treatment4 treatment5) ($) (date1 date2 date3 date4 date5) (:date9.);
format date: date9.;
cards;
John A B B B A 01Jan2018 15Jan2018 16Jan2018 17Jan2018 01Feb2018
David A A B B A 01Jan2018 15Mar2018 05Jan2018 05Jan2018 15Feb2018
;
data want;
set have;
array j(*) treatment:;
array t(*) date:;
do _n_=dim(j) to 1 by -1;
if j(_n_)= 'A' then latest_date=max(latest_date,t(_n_));
end;
format latest_date date9.;
keep id latest_Date;
run;
Just wanted say thank-you for your input!
I've tried different ways you've suggested and others (including transposing the data) - this suggestion fit my data the best...I had other criteria I had to consider and working with a very large dataset and wanted to avoid transposing data if I could although that was an excellent suggestion as well.
Thanks to others who responded - it was first time posting on this community and I'm glad I chose to!
Its easy to solve this problem for specific hard coded treatment using array. But for a generic solution I will suggest to use proc transpose. Something like this should work:
Data Have;
infile cards truncover;
input (ID Treatment1 Treatment2 Treatment3 Treatment4 Treatment5) ($) (Date1 Date2 Date3 Date4 Date5) (:date9.);
format date: date9.;
cards;
John A B B B A '01Jan2018'd '15Jan2018'd '16Jan2018'd '17Jan2018'd '01Feb2018'd
David A A B B A '01Jan2018'd '15Mar2018'd '05Jan2018'd '05Jan2018'd '05Feb2018'd
;
run;
proc sort data=Have;
by ID;
run;
proc transpose data=Have out=Treatments(Drop=_Name_ Rename=(Treatment1=Treatment)) Prefix=Treatment;
by ID;
Var Treatment:;
run;
proc transpose data=Have out=Dates(Drop=_Name_ rename=(Date1=Date)) Prefix=Date;
by ID;
Var Date:;
run;
Data Have_Transposed;
merge Treatments Dates;
by ID;
run;
proc sql;
Select ID, Treatment,max(Date) as Max_Date format date9. from Have_Transposed
group by ID, Treatment;
quit;
Some generic fun for finding latest_Date for all treatments:
data have;
infile cards truncover;
input (ID treatment1 treatment2 treatment3 treatment4 treatment5) ($) (date1 date2 date3 date4 date5) (:date9.);
format date: date9.;
cards;
John A B B B A 01Jan2018 15Jan2018 16Jan2018 17Jan2018 01Feb2018
David A A B B A 01Jan2018 15Mar2018 08Jan2018 05Jan2018 15Feb2018
;
data want;
if _n_=1 then do;
if 0 then set have;
length treat $8 dt 8;
format dt latest_date date9.;
dcl hash H (ordered: "A",multidata:'y') ;
h.definekey ("treat") ;
h.definedata ("treat","dt") ;
h.definedone () ;
call missing(treat,dt);
end;
set have ;
array j(*) treatment1-treatment5;
array t(*) date1-date5;
do i=1 to dim(t);
rc=h.add(key:j(i),data:j(i),data:t(i));
end;
do i=1 to dim(j);
rc=h.check(key:j(i));
if rc=0 then do;
do _iorc_=h.find(key:j(i)) by 0 while(_iorc_=0);
latest_date=max(latest_date,dt);
_iorc_=h.find_next();
end;
output;
h.remove(key:j(i));
call missing(latest_date,dt);
end;
end;
keep id treat latest_date;
run;
data have;
infile cards truncover;
input (ID treatment1 treatment2 treatment3 treatment4 treatment5) ($) (date1 date2 date3 date4 date5) (:date9.);
format date: date9.;
cards;
John A B B B A 01Jan2018 15Jan2018 16Jan2018 17Jan2018 01Feb2018
David A A B B A 01Jan2018 15Mar2018 08Jan2018 05Jan2018 15Feb2018
;
data temp;
set have;
array j(*) treatment1-treatment5;
array t(*) date1-date5;
do i=1 to dim(t);
treatment=j(i);
dt=t(i);
output;
end;
keep id treatment dt;
format dt date9.;
run;
proc sql;
create table want as
select id,treatment,max(dt) as latest_date format=date9.
from temp
group by id,treatment;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.