BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

hi all:

How may I find the last date from many data sets? 

eg. from Visit (visdat), Test A (testadat), Test C(testcdat) , Vs(vsdat)...

I can use proc sort and but if there are more than 30 datasets, maybe there is a betterway.

 

thank you,

Purple  

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

What is the expected OUTPUT from those four input datasets?

If I wanted to code it by hand for just those four datasets I would do:

data want ;
  set d1-d4 ;
  by name;
  if first.name then max=.;
  retain max;
  max=max(of _numeric_);
  if last.name;
  keep name max;
  format max yymmdd10.;
run;

Results:

 Obs    name               max
  1     CAMPBELL    2015-09-12
  2     COOK        2019-03-20
  3     EDWARDS     2018-02-21
  4     MORRIS      2019-12-04
  5     PEREZ       2021-08-01
  6     ROGERS      2012-09-01
  7     SANCHEZ     2021-07-25
  8     STEWART     2022-03-17
  9     TURNER      2018-03-31

Are those the values you are looking for?

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

You can use a loop to cycle through all 30 data sets and find the maximum of each and then the max of the max

 

Or you can combine all the datasets into one very big data set (and only on variable, the date) and then run PROC SUMMARY to find the maximum.

--
Paige Miller
Tom
Super User Tom
Super User

The hard part is to get the list of dataset and variables.

But let's do the easy part first.

Just set all of the observations from  all of the dataset, only keeping the date variables.

data want;
    retain _max_date ;
    set Visit (keep=visdat) TestA (keep=testadat)  TestC(keep=testcdat) Vs(keep=vsdat) end=eof;
     _max_date = max(of _numeric_);
    if eof ;
    put _max_date=;
    keep _max_date;
    format _max_date date9.;
run;

Now if you had a dataset with a structure like:

data dates ;
   input libname :$8. memname :$32. name :$32. ;
cards;
WORK VISIT VISDAT
WORK TESTA TESTADAT
WORK TESTB TESTBDAT
WORK VS VSDAT
;

You could generate that SET statement with a simple data step.

filename code temp;
data _null_;
  set dates end=eof;
  by libname memname ;
  file code;
  if _n_=1 then put 'set ' @;
  if first.memname then put libname +(-1) '.' memname '(keep=' @;
  put name @;
  if last.memname then put ') ' @ ;
  if eof then put 'end=eof;' ;
run;

So then just replace the SET statement in the previous code with :

%include code / source2;

Now how can you get the list of variables?

Use PROC CONTENTS to get a list of ALL of the variables in all of the datasets in your library.  Then just filter to the ones that have date type formats attached to them.

So your whole program might look like this:

libname mylib 'directory with SAS datasets';
proc contents data=mylib._all_ noprint out=contents; run;
filename code temp;
data _null_;
  set contents end=eof;
  where 'date' = fmtinfo('cat',format);
  by libname memname ;
  file code;
  if _n_=1 then put 'set ' @;
  if first.memname then put libname +(-1) '.' memname '(keep=' @;
  put name @;
  if last.memname then put ') ' @ ;
  if eof then put 'end=eof;' ;
run;

data want;
  retain _max_date ;
%include code / source2;
  _max_date = max(of _numeric_);
  if eof ;
  put _max_date=;
  keep _max_date;
  format _max_date date9.;
run;

 

ballardw
Super User

You might need to define exactly what "last date" means.

Is the value of the variable on the last record of a data set or the latest date, regardless of data order, in a set.

Are you looking at more than one date variable in a given data set?

Also, are your "dates" actual SAS date values? If they are character values they likely need to be converted to actual date values for proper comparison. Or if the values are numbers that look like a date such a 10272021 then you may need to convert to dates to get a proper comparison.

 

IF the values are actually dates you get the largest value (latest date) using the MAX statistic. Such as

Proc summary data=somedatasetname;
    var datevariable;
    output out=summary1 max=maxdate;
run;

Could create a bunch of these summary data sets, append together and then find the maximum of all the individual maxdate values.

 

If you are dealing with multiple variables you may need to do something like provide a few examples as data step code and describe explicitly which would be the "last" value and any not obvious rules.

purpleclothlady
Pyrite | Level 9

Hi Paigemiller, Tom, Ballardw:

Thank you so much for such a quick response with solution.

I need to figure out how to code in detail and get back to you asap, in case i need more help, will ask for help again. Appreciated.

Best wish and have a great day 👍

purple

purpleclothlady
Pyrite | Level 9

Hi all:

Here is a sample code I made, I used sashelp.vcolomn to get 40 data sets name and the variables I need , here is a simplified version.

Thanks all,

 



/*CREATE DATA SETS*/
data d4;
infile datalines dlm='' dsd;	
input name $ v1dt :mmddyy10. v2dt :mmddyy10.  v3dt :mmddyy10. v4dt :mmddyy10.;
format v1dt mmddyy10. v2dt  mmddyy10. v3dt mmddyy10. v4dt mmddyy10.;
datalines;
CAMPBELL 12/15/2013  09/12/2015 03/15/2015
COOK 09/05/2011 03/20/2014   
EDWARDS  02/21/2018 09/01/2015 03/21/2017
MORRIS 02/04/2014      
PEREZ 06/03/2011 03/05/2011 10/12/2011 08/01/2021
ROGERS  07/08/2012    
SANCHEZ 09/01/2019 09/18/2017 03/25/2011 03/01/2011
STEWART  1/19/2013 03/21/2016 03/17/2022
;

data d3;
infile datalines dlm='' dsd;	
input name $ v1dt :mmddyy10. v2dt :mmddyy10.  v3dt :mmddyy10.;
format v1dt mmddyy10. v2dt  mmddyy10. v3dt mmddyy10. ;
datalines;
CAMPBELL 12/01/2012  09/11/2015
COOK 03/01/2011 03/20/2011  
PEREZ 06/01/2011 03/05/2011 10/12/2011
ROGERS  07/01/2012  
SANCHEZ 09/01/2019 09/11/2017 03/25/2011
STEWART  1/19/2013 03/21/2016
TURNER  03/31/2018 03/20/2014
;

data d2;
infile datalines dlm='' dsd;	
input name $ v1dt :mmddyy10. v2dt :mmddyy10.;
format v1dt mmddyy10. v2dt  mmddyy10.;
datalines;
CAMPBELL 12/21/2012  
COOK 03/21/2011 03/20/2019
SANCHEZ 05/21/2019 09/11/2017
STEWART  03/19/2013
TURNER 03/31/2018 03/20/2014
;

data d1;
infile datalines dlm='' dsd;	
input name $ v1dt :mmddyy10. ;
format v1dt mmddyy10.  ;
datalines;
CAMPBELL 12/01/2012
COOK 03/19/2011
EDWARDS 09/01/2015
MORRIS 12/04/2019
PEREZ 11/09/2014
ROGERS 09/01/2012
SANCHEZ 07/25/2021
STEWART 03/21/2016
TURNER 03/20/2014
;


 /*1.GET THE LARGEST VALUE BY ROW BY EACH DATA SET*/

options mprint  spool;
%macro pd2(ord,ds,v1,v2,v3,v4);

data &ds.a ;
length ds $10.;
   set &ds(keep= name v1dt  %if &ord=1 %then %do; &v1. &v2. %end;
                   %else %if &ord=2 %then %do; &v1. &v2. &v3. %end;
                   %else %if &ord=3 %then %do; &v1. &v2. &v3. &v4.%end;);		
    	
		  ds="&ds.";
			_lastf=v1dt;

		 %if &ord>=1 %then %do;		 	 
			 array dt &v1. &v2.;
			 _lastf=max(of dt[*]);
		%end;
		 %if &ord>=2 %then %do;
			 array dt2 &v1. &v2. &v3. ;
			 _lastf=max(of dt2[*]);
		%end;
		%if &ord=3 %then %do;
       array dt3 &v1. &v2. &v3. &v4.;
			 _lastf=max(of dt3[*]);
		%end;	
    format _lastf yymmdd10.; 
run;

*1).GET THE LAST DATE;
proc sort;by name decending _lastf;run;
proc sort data=&ds.a out=&ds.f nodupkey;by name;run;
run;

*2). APPEND ALL DS;
%if &ord=-1 %then %do;
proc append base=d1f  Data=&ds.f;
%end;
%else %if &ord=1 %then %do;
proc append base=d2f  Data=&ds.f;
%end;
%else %if &ord=2 %then %do;
proc append base=d3f  Data=&ds.f;
%end;
 
%else %if &ord=3 %then %do;
proc append base=d4f  Data=&ds.f;
%end;
run;
%mend;
%pd2(3,D4,v1dt,v2dt,v3dt,v4dt)
%pd2(2,D3,v1dt,v2dt,v3dt,)
%pd2(1,D2,v1dt,v2dt, ,)
%pd2(-1,D1,v1dt, , ,)
 
/* 2. GET THE LARGEST DATE FROM DIFFERENT DATA SETS	*/
data all;
  set d1f d2f d3f d4f;
run;

proc sort;by name _lastf;run;

data allf;
retain name _lastf;
set all;
by name _lastf;
if last.name;
run;
purpleclothlady
Pyrite | Level 9

If anyone has other way of coding for this purpose, please feel free to share. thanks again

Tom
Super User Tom
Super User

What is the expected OUTPUT from those four input datasets?

If I wanted to code it by hand for just those four datasets I would do:

data want ;
  set d1-d4 ;
  by name;
  if first.name then max=.;
  retain max;
  max=max(of _numeric_);
  if last.name;
  keep name max;
  format max yymmdd10.;
run;

Results:

 Obs    name               max
  1     CAMPBELL    2015-09-12
  2     COOK        2019-03-20
  3     EDWARDS     2018-02-21
  4     MORRIS      2019-12-04
  5     PEREZ       2021-08-01
  6     ROGERS      2012-09-01
  7     SANCHEZ     2021-07-25
  8     STEWART     2022-03-17
  9     TURNER      2018-03-31

Are those the values you are looking for?

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1739 views
  • 1 like
  • 4 in conversation