Hi All,
As per title, I need help in selecting the observation with the latest date (from 5 date variables, no hierarchy) and from there selecting the one with the highest VarId by group.
Example Data:
Group,VarID,VarDate1,VarDate2,VarDate3,VarDate4,VarDate5
A,15,Oct2018,Oct2018,.,.,Oct2018
A,2,.,.,.,.,Jan2019
A,3,.,.,Sept2018,.,.,
A,14,.,.,.,.,Dec2018
B,45,July2018,.,.,.,July2018
B,46,.,.,.,.,.July2018
Result Data should be:
A,2,.,.,.,.,Jan2019
B,46,.,.,.,.,.July2018
Thank you!
Assuming that you have actual dates in those variables and not the strings you are showing then you can just make new variable with the maximum of the individual date variables. If you just have strings then you will need to convert them to actual dates first.
You can then use the new variable to order the data and find the last value per group.
data step1;
set have ;
max_date=max(of vardate1-vardate5);
format max_date yymmdd10.;
run;
proc sort data=step1 out=want ;
by group max_date varid ;
run;
data want;
set want;
by group;
if last.group;
run;
Hi @ohemgee Welcome to SAS communities. Assuming those are all SAS numeric dates, the following may work for your requirement though untested
data want;
do _n_=1 by 1 until(last.group);
set have;
by group;
array t(*) VarDate1--VarDate5;
max=max(max,max(of t(*)));
end;
do _n_=1 to _n_;
set have;
if max in t then output;
end;
run;
Assuming that you have actual dates in those variables and not the strings you are showing then you can just make new variable with the maximum of the individual date variables. If you just have strings then you will need to convert them to actual dates first.
You can then use the new variable to order the data and find the last value per group.
data step1;
set have ;
max_date=max(of vardate1-vardate5);
format max_date yymmdd10.;
run;
proc sort data=step1 out=want ;
by group max_date varid ;
run;
data want;
set want;
by group;
if last.group;
run;
A data step works here as well:
data want;
set have;
by group;
retain _temp;
if first.group then call missing(_temp);
_temp = max(of vardate1-vardate5);
_current_max = max(_temp, _current_max);
*keep right varID;
if _current_max = _temp then newVarID = varID;
if last.group then output;
keep group newvarID _current_max;
run;
@ohemgee wrote:
Hi All,
As per title, I need help in selecting the observation with the latest date (from 5 date variables, no hierarchy) and from there selecting the one with the highest VarId by group.
Example Data:
Group,VarID,VarDate1,VarDate2,VarDate3,VarDate4,VarDate5
A,15,Oct2018,Oct2018,.,.,Oct2018
A,2,.,.,.,.,Jan2019
A,3,.,.,Sept2018,.,.,
A,14,.,.,.,.,Dec2018
B,45,July2018,.,.,.,July2018
B,46,.,.,.,.,.July2018
Result Data should be:
A,2,.,.,.,.,Jan2019
B,46,.,.,.,.,.July2018
Thank you!
Welcome to the SAS Community 🙂
data have(drop=i);
array VarDate{5};
do ID="A", "B", "C";
do varID=1 to 5;
do i=1 to 5;
VarDate[i]=rand('integer', '01jan2019'd, today());
end;
output;
end;
end;
format VarDate: date9.;
run;
data want;
do until (last.ID);
set have;
by ID;
if max(of VarDate:) ge _iorc_ then _iorc_=max(of VarDate:);
end;
do until (last.ID);
set have;
by ID;
if max(of VarDate:)=_iorc_ then output;
end;
_iorc_=.;
run;
Thanks a lot for all your help. I still can't make it work though. Modifying a bit so that it sort of represents the data wherein Group is actually 3 variables. And yes, those are SAS numeric dates. Also looking for the most efficient way as this will run through millions of records.
Example Data:
Group1, Group 2, Group3, VarID,VarDate1,VarDate2,VarDate3,VarDate4,VarDate5
A,AA,AAA,15,Oct2018,Oct2018,.,.,Oct2018
A,AA,BBB,2,.,.,.,.,Jan2019
A,AA,CCC,3,.,.,Sept2018,.,.,
A,AA,DDD,14,.,.,.,.,Dec2018
B,AA,AAA,45,July2018,.,.,.,July2018
B,AA,BBB,46,.,.,.,.,.July2018
Result Data should be:
Group1, Group 2, Group3, VarID,VarDate1,VarDate2,VarDate3,VarDate4,VarDate5
A,AA,BBB,2,.,.,.,.,Jan2019
B,AA,BBB,46,.,.,.,.,.July2018
If your data is representable of your actual data, which means that your data is sorted by Group1, Group2, Group3, then you can do something like this
data have(drop=i);
array VarDate{5};
do Group1="A", "B", "C", "D";
do Group2="AA", "BB", "CC", "DD";
do Group3="AAA", "BBB", "CCC", "DDD";
VarID=ceil(rand('Uniform')*100);
do i=1 to 5;
VarDate[i]=ceil(rand('Uniform', '01jan2019'd, today()));
end;
output;
end;
end;
end;
format VarDate: date9.;
run;
data want(drop=max);
do until (last.Group1);
set have;
by Group1;
max=max(of VarDate:);
if max gt _iorc_ then _iorc_=max;
end;
do until (last.Group1);
set have;
by Group1;
if max(of VarDate:)=_iorc_ then output;
end;
_iorc_=.;
run;
Went with the ff. code: Let me know what you think and a more efficient way is much appreciated. 🙂
data want;
set have;
max_dte = max(of vardate1-vardate5);
run;
proc sort data=want out=want2;
by group1 group2 group3 descending max_dte descending varid;
run;
data want3;
set want2;
by group1 group2 group3 descending max_dte descending varid;
if first.max_dte;
run;
proc sort nodupkey data=want3
out=want4 (index=(indx = (group 1 group2 group3)/unique));
by group1 group2 group3;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.