hi all:
How may I find the last date from many data sets?
eg. from Visit (visdat), Test A (testadat), Test C(testcdat) , Vs(vsdat)...
I can use proc sort and but if there are more than 30 datasets, maybe there is a betterway.
thank you,
Purple
What is the expected OUTPUT from those four input datasets?
If I wanted to code it by hand for just those four datasets I would do:
data want ;
set d1-d4 ;
by name;
if first.name then max=.;
retain max;
max=max(of _numeric_);
if last.name;
keep name max;
format max yymmdd10.;
run;
Results:
Obs name max 1 CAMPBELL 2015-09-12 2 COOK 2019-03-20 3 EDWARDS 2018-02-21 4 MORRIS 2019-12-04 5 PEREZ 2021-08-01 6 ROGERS 2012-09-01 7 SANCHEZ 2021-07-25 8 STEWART 2022-03-17 9 TURNER 2018-03-31
Are those the values you are looking for?
You can use a loop to cycle through all 30 data sets and find the maximum of each and then the max of the max
Or you can combine all the datasets into one very big data set (and only on variable, the date) and then run PROC SUMMARY to find the maximum.
The hard part is to get the list of dataset and variables.
But let's do the easy part first.
Just set all of the observations from all of the dataset, only keeping the date variables.
data want;
retain _max_date ;
set Visit (keep=visdat) TestA (keep=testadat) TestC(keep=testcdat) Vs(keep=vsdat) end=eof;
_max_date = max(of _numeric_);
if eof ;
put _max_date=;
keep _max_date;
format _max_date date9.;
run;
Now if you had a dataset with a structure like:
data dates ;
input libname :$8. memname :$32. name :$32. ;
cards;
WORK VISIT VISDAT
WORK TESTA TESTADAT
WORK TESTB TESTBDAT
WORK VS VSDAT
;
You could generate that SET statement with a simple data step.
filename code temp;
data _null_;
set dates end=eof;
by libname memname ;
file code;
if _n_=1 then put 'set ' @;
if first.memname then put libname +(-1) '.' memname '(keep=' @;
put name @;
if last.memname then put ') ' @ ;
if eof then put 'end=eof;' ;
run;
So then just replace the SET statement in the previous code with :
%include code / source2;
Now how can you get the list of variables?
Use PROC CONTENTS to get a list of ALL of the variables in all of the datasets in your library. Then just filter to the ones that have date type formats attached to them.
So your whole program might look like this:
libname mylib 'directory with SAS datasets';
proc contents data=mylib._all_ noprint out=contents; run;
filename code temp;
data _null_;
set contents end=eof;
where 'date' = fmtinfo('cat',format);
by libname memname ;
file code;
if _n_=1 then put 'set ' @;
if first.memname then put libname +(-1) '.' memname '(keep=' @;
put name @;
if last.memname then put ') ' @ ;
if eof then put 'end=eof;' ;
run;
data want;
retain _max_date ;
%include code / source2;
_max_date = max(of _numeric_);
if eof ;
put _max_date=;
keep _max_date;
format _max_date date9.;
run;
You might need to define exactly what "last date" means.
Is the value of the variable on the last record of a data set or the latest date, regardless of data order, in a set.
Are you looking at more than one date variable in a given data set?
Also, are your "dates" actual SAS date values? If they are character values they likely need to be converted to actual date values for proper comparison. Or if the values are numbers that look like a date such a 10272021 then you may need to convert to dates to get a proper comparison.
IF the values are actually dates you get the largest value (latest date) using the MAX statistic. Such as
Proc summary data=somedatasetname; var datevariable; output out=summary1 max=maxdate; run;
Could create a bunch of these summary data sets, append together and then find the maximum of all the individual maxdate values.
If you are dealing with multiple variables you may need to do something like provide a few examples as data step code and describe explicitly which would be the "last" value and any not obvious rules.
Hi Paigemiller, Tom, Ballardw:
Thank you so much for such a quick response with solution.
I need to figure out how to code in detail and get back to you asap, in case i need more help, will ask for help again. Appreciated.
Best wish and have a great day 👍
purple
Hi all:
Here is a sample code I made, I used sashelp.vcolomn to get 40 data sets name and the variables I need , here is a simplified version.
Thanks all,
/*CREATE DATA SETS*/
data d4;
infile datalines dlm='' dsd;
input name $ v1dt :mmddyy10. v2dt :mmddyy10. v3dt :mmddyy10. v4dt :mmddyy10.;
format v1dt mmddyy10. v2dt mmddyy10. v3dt mmddyy10. v4dt mmddyy10.;
datalines;
CAMPBELL 12/15/2013 09/12/2015 03/15/2015
COOK 09/05/2011 03/20/2014
EDWARDS 02/21/2018 09/01/2015 03/21/2017
MORRIS 02/04/2014
PEREZ 06/03/2011 03/05/2011 10/12/2011 08/01/2021
ROGERS 07/08/2012
SANCHEZ 09/01/2019 09/18/2017 03/25/2011 03/01/2011
STEWART 1/19/2013 03/21/2016 03/17/2022
;
data d3;
infile datalines dlm='' dsd;
input name $ v1dt :mmddyy10. v2dt :mmddyy10. v3dt :mmddyy10.;
format v1dt mmddyy10. v2dt mmddyy10. v3dt mmddyy10. ;
datalines;
CAMPBELL 12/01/2012 09/11/2015
COOK 03/01/2011 03/20/2011
PEREZ 06/01/2011 03/05/2011 10/12/2011
ROGERS 07/01/2012
SANCHEZ 09/01/2019 09/11/2017 03/25/2011
STEWART 1/19/2013 03/21/2016
TURNER 03/31/2018 03/20/2014
;
data d2;
infile datalines dlm='' dsd;
input name $ v1dt :mmddyy10. v2dt :mmddyy10.;
format v1dt mmddyy10. v2dt mmddyy10.;
datalines;
CAMPBELL 12/21/2012
COOK 03/21/2011 03/20/2019
SANCHEZ 05/21/2019 09/11/2017
STEWART 03/19/2013
TURNER 03/31/2018 03/20/2014
;
data d1;
infile datalines dlm='' dsd;
input name $ v1dt :mmddyy10. ;
format v1dt mmddyy10. ;
datalines;
CAMPBELL 12/01/2012
COOK 03/19/2011
EDWARDS 09/01/2015
MORRIS 12/04/2019
PEREZ 11/09/2014
ROGERS 09/01/2012
SANCHEZ 07/25/2021
STEWART 03/21/2016
TURNER 03/20/2014
;
/*1.GET THE LARGEST VALUE BY ROW BY EACH DATA SET*/
options mprint spool;
%macro pd2(ord,ds,v1,v2,v3,v4);
data &ds.a ;
length ds $10.;
set &ds(keep= name v1dt %if &ord=1 %then %do; &v1. &v2. %end;
%else %if &ord=2 %then %do; &v1. &v2. &v3. %end;
%else %if &ord=3 %then %do; &v1. &v2. &v3. &v4.%end;);
ds="&ds.";
_lastf=v1dt;
%if &ord>=1 %then %do;
array dt &v1. &v2.;
_lastf=max(of dt[*]);
%end;
%if &ord>=2 %then %do;
array dt2 &v1. &v2. &v3. ;
_lastf=max(of dt2[*]);
%end;
%if &ord=3 %then %do;
array dt3 &v1. &v2. &v3. &v4.;
_lastf=max(of dt3[*]);
%end;
format _lastf yymmdd10.;
run;
*1).GET THE LAST DATE;
proc sort;by name decending _lastf;run;
proc sort data=&ds.a out=&ds.f nodupkey;by name;run;
run;
*2). APPEND ALL DS;
%if &ord=-1 %then %do;
proc append base=d1f Data=&ds.f;
%end;
%else %if &ord=1 %then %do;
proc append base=d2f Data=&ds.f;
%end;
%else %if &ord=2 %then %do;
proc append base=d3f Data=&ds.f;
%end;
%else %if &ord=3 %then %do;
proc append base=d4f Data=&ds.f;
%end;
run;
%mend;
%pd2(3,D4,v1dt,v2dt,v3dt,v4dt)
%pd2(2,D3,v1dt,v2dt,v3dt,)
%pd2(1,D2,v1dt,v2dt, ,)
%pd2(-1,D1,v1dt, , ,)
/* 2. GET THE LARGEST DATE FROM DIFFERENT DATA SETS */
data all;
set d1f d2f d3f d4f;
run;
proc sort;by name _lastf;run;
data allf;
retain name _lastf;
set all;
by name _lastf;
if last.name;
run;
If anyone has other way of coding for this purpose, please feel free to share. thanks again
What is the expected OUTPUT from those four input datasets?
If I wanted to code it by hand for just those four datasets I would do:
data want ;
set d1-d4 ;
by name;
if first.name then max=.;
retain max;
max=max(of _numeric_);
if last.name;
keep name max;
format max yymmdd10.;
run;
Results:
Obs name max 1 CAMPBELL 2015-09-12 2 COOK 2019-03-20 3 EDWARDS 2018-02-21 4 MORRIS 2019-12-04 5 PEREZ 2021-08-01 6 ROGERS 2012-09-01 7 SANCHEZ 2021-07-25 8 STEWART 2022-03-17 9 TURNER 2018-03-31
Are those the values you are looking for?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.