BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SAS-PD
Obsidian | Level 7

Hello,

 

I am trying to find min and max of the date per subject from all datasets in a library 

 

I have created I have created DateVariables dataset using proc content to list all date variables from table1 and table2.

*Want1;

*I would like to list all dates available for the subject in entire library;
/*subjID date11 date12 date13 date21 ...(all dates available)*/

subjID date11 date12 date13 date21
---------------------------------------------------------------------------
subj1 01/01/21 01/01/22 01/01/23 02/02/02
subj2 02/01/21 02/01/22 02/01/23

*want2;
*find min and max of the date by subjID ;

subjID date_min date_max
------------------------------------------------------------
subj1 02/02/02 01/01/23
subj2 02/01/21 02/01/23

 

data d_test.table1;
infile datalines ;
input subjID $ ar5 $ var6 $ date11: ddmmyy10. date12: ddmmyy10. date13: ddmmyy10.;
format date11 mmddyy10. date12 mmddyy10. date13 mmddyy10.;
datalines;
subj1 xxx sy 01/01/21 01/01/22 01/01/23
subj2 xttx ry 02/01/21 02/01/22 02/01/23
;
run;

data d_test.table2;
infile datalines ;
input subjID $ var3 $ date21: ddmmyy10.;
format date21 mmddyy10.;
datalines;
subj1 xx 02/02/02
subj2 z .
;
run;


*list all date variables from table1 and table2;
data d_test.DateVariables;
infile datalines ;
input table $ varname $ ;
datalines;
table1 date11
table1 date12
table1 date13
table2 date21
table3 date31
table3 date32
table3 date33
table3 date34
;
run;

 

Thanks for your guidance!

1 ACCEPTED SOLUTION

Accepted Solutions
dxiao2017
Pyrite | Level 9

Hi @SAS-PD , have you figured out and solved your question? I write a macro to convert all tables with all columns of character date values to numeric (I assume that in your raw datasets, all date variables are character type, and I write the macro base on this condition). But some major problems need to be solved before use this macro: 1) because I used a nested do loop, it produced 2*3=6 tables (in fact I do not know why it is 6, I thought it should be 2*4=8), i.e., for each of my raw data table, it produced two more duplicates. Also, each time I run this macro again, it produced new variables and tables according to the statement and make more duplicates of the datasets and variables. Without solve this problem the macro cannot be used because it causes mess by the duplications, 2) I am not sure if I should use any %local statement inside the macro to prevent it overwrite any macro variable that already has that macro name, @Tom , could you please take a look on my code and offer some suggestions and modification, thanks very much! And @SAS-PD , could you please let me know if this solve your questions, thanks a lot!

data vdraw1;
   input subjid $ var5 $ var6 $ v11dat $ 
         v12dat $ v13dat $;
   datalines;
subj1 xxx sy 01/01/21 01/01/22 01/01/23
subj2 xttx ry 02/01/21 02/01/22 02/01/23
;
run;
proc print data=vdraw1;run;
data vdraw2;
   input subjid $ var3 $ v21dat $;
   datalines;
subj1 xx 02/02/02
subj2 z .
;
run;
proc print data=vdraw2;run;
proc sql;
select distinct memname
   into :table1-
   from dictionary.columns
   where libname='WORK' and
         memname like 'VDRAW%' and
         name like '%dat';
quit;
%let tableobs=&sqlobs;
%put &tableobs;
proc sql;
select distinct name
   into :var1-
   from dictionary.columns
   where libname='WORK' and
         memname like 'VDRAW%' and
         name like '%dat';
quit;
%let varobs=&sqlobs;
%put &varobs;
%macro char2num;
%do i=1 %to &tableobs;
data &&table&i.._num;
   set &&table&i;
   %do j=1 %to &varobs;
   &&var&j.._num=input(&&var&j,mmddyy10.);
   %end;
run;
proc print data=&&table&i.._num;
proc contents data=&&table&i.._num;
%end;
%mend char2num;
%char2num;

dxiao2017_0-1745679893951.png

dxiao2017_1-1745680278757.png

dxiao2017_2-1745680469747.png

 

dxiao2017_3-1745680562540.png

dxiao2017_4-1745680676173.png

Another thing I would like to mention is that, @SAS-PD , in your raw datasets, the informat and format of the date variables are not consistent, I marked it out (see picture), the informat is ddmmyy., the format is mmddyy., i.e., the date and month are reversed and confusing without knowing which is month and which is date.

Untitled.png

SAS help cars; we are cars; that is why my default image;

View solution in original post

28 REPLIES 28
Tom
Super User Tom
Super User

Getting the min/max is pretty simple.

Let's make your example datasets first.

Spoiler
data table1;
  input subjID $ ar5 $ var6 $ (date11 - date13) (:ddmmyy.);
format date11-date13 date9. ;
datalines;
subj1 xxx sy 01/01/21 01/01/22 01/01/23
subj2 xttx ry 02/01/21 02/01/22 02/01/23
;

data table2;
  input subjID $ var3 $ date21 :ddmmyy.;
  format date21 date9.;
datalines;
subj1 xx 02/02/02
subj2 z .
;

data DateVariables;
  input (table varname) (:$32.) ;
datalines;
table1 date11
table1 date12
table1 date13
table2 date21
;

Now we just need to get the dataset list and variable list from your metadata table.

proc sql noprint;
select distinct varname into :varlist separated by ' ' from DateVariables;
select distinct table into :dslist separated by ' ' from DateVariables;
quit;

To be able to generate a dataset to find the min/max over all date variables in all datasets in one step.

data want;
  set &dslist ;
  by subjid;
  array _dates &varlist ;
  if first.subjid then do;
    min=.;max=.;
  end;
  retain min max;
  min=min(of min &varlist);
  max=max(of max &varlist);
  if last.subjid;
  keep subjid min max;
  format min max date9.;
run;

Results

       subj
Obs     ID            min          max

 1     subj1    02FEB2002    01JAN2023
 2     subj2    02JAN2021    02JAN2023

 

 

SAS-PD
Obsidian | Level 7

Thank you, Tom!

 

I apologize...I think I gave wrong sample datasets as date variables name is not static.

 

I am trying to find min and max of the date per subject from all datasets in a library where variable ends with 'DAT'

in one library, I have 50+ datasets that has 120+ date variables named with 'DAT' 

Among all those 120+ dates I need to find out min and max of the date per subject

Tom
Super User Tom
Super User

You said you had already solved the issue of finding the list of variables to use.

 

Do you need help with that? 

 

If so the FMTINFO() function and the 'CAT' option will help.  I would recommend using PROC CONTENTS to get the list of variables since it will provide the bare format name required to use that function.

proc contents data=mylib._all_ noprint out=contents; run;
proc sql noprint;
create table DateVariables as 
  select memname,name 
  from contents
  where 'date'=fmtinfo(format,'cat')
  group by memname
  having 1=max(upcase(name)='SUBJID')
  order by memname,name
;
select distinct memname into :dslist separated by ' ' from DateVariables;
select distinct nliteral(name) into :varlist separated by ' ' from DateVariables;
quit;
SAS-PD
Obsidian | Level 7

Thanks again!

 

Your code works with test data, however while using my datasets it is giving ERROR: BY variables are not properly sorted on data set

 

Could you please guide?

 

 

Tom
Super User Tom
Super User

The code assumes the individual datasets were already sorted by SUBJID since that was your KEY variable.

You could add code to sort the datasets (or index them) before the final data step.

You could change the code to collect the individual observation level min/max values and then use another step like PROC SUMMARY to produce the overall min/max per SUBJID.  You could even make the first step be a VIEW instead of a physical dataset to reduce the amount of diskspace you use.

data details/view=details;
  set &dslist ;
  min=min(of min &varlist);
  max=max(of max &varlist);
  keep subjid min max;
  format min max date9.;
run;
proc summary data=details nway;
  class subjid;
  var min max;
  output out=want min(min)=min max(max)=max;
run;
SAS-PD
Obsidian | Level 7

Thank you for all the explanation.

 

now, I am getting 

Invalid numeric data, testDAT='6/6/2023' error

 

All my dates are char $24. 

Tom
Super User Tom
Super User

@SAS-PD wrote:

Thank you for all the explanation.

 

now, I am getting 

Invalid numeric data, testDAT='6/6/2023' error

 

All my dates are char $24. 


Then they are NOT dates.  They are character strings. 

 

The error message is probably from the SAS data step trying to autoconvert that string into a number using the normal numeric informat.

1    data test;
2      string='6/6/2025';
3      min = min(of min string);
4    run;

NOTE: Character values have been converted to numeric
      values at the places given by: (Line):(Column).
      3:20
NOTE: Invalid numeric data, string='6/6/2025' , at line 3 column 20.
string=6/6/2025 min=. _ERROR_=1 _N_=1
NOTE: Missing values were generated as a result of performing an operation on
      missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 3:9
NOTE: The data set WORK.TEST has 1 observations and 2 variables.

Did you use the code I posted to find the variable based on the type of FORMAT that was attached them?  It would not have found that variable since it should not have had a DATE type format attached.   Or if it did then you would get an error message when trying to use that dataset that character format named $DATE (or $MMDDYY) did not exist because SAS will automatically prefix the $ to any format attached to a character variable.

 

Is it possible that the same variable name is character in one dataset and numeric in another?  If so then how were the dataset in the libref created?  Do you perhaps let PROC IMPORT automatically guess what type of variable to create?

 

If you do want to treat the character string like '6/6/2025' as a DATE you will need to add pre-processing step to make a numeric variable from it.  And you will need to decide if the first 6 represent the month of the year of the day of the month to decide which INFORMAT to use to convert that string into a date.

dxiao2017
Pyrite | Level 9

Hi @Tom , base on your code, I made minor changes, this is my version: (1) I use a proc sql; select into :varlist step with dictionary.columns instead of the original proc contents step and your proc sql step (and this is for solve @SAS-PD 's question that the columns' name are end with 'DAT'), (2) add a proc sort step to solve the possible error msg of by var are not properly sorted. I tried to avoid using array (because I never used array the way in your code and it is a bit advanced for me, so I was going to use the simple SAS/base min() and max() functions instead) in the last step but was not able to find a simpler solution, and thus I think array in your code is the best (or necessary or only) solution to find the min and max value for several date columns in the &varlist macro, (3) create an intermediate dataset (that serves as an analysis dataset base on which to produce the final reports or tables) to be used in your data want; set &dslist step.

data vdraw1;
   input subjid $ var5 $ var6 $ v11dat v12dat v13dat;
   informat v11dat v12dat v13dat ddmmyy.;
   format v11dat v12dat v13dat date9.;
   datalines;
subj1 xxx sy 01/01/21 01/01/22 01/01/23
subj2 xttx ry 02/01/21 02/01/22 02/01/23
;
run;
proc print data=vdraw1;run;
data vdraw2;
   input subjid $ var3 $ v21dat ddmmyy8.;
   format v21dat date9.;
   datalines;
subj1 xx 02/02/02
subj2 z .
;
run;
proc print data=vdraw2;run;
proc sql;
select memname,
       name
   into :dslist separated by ' ',
        :varlist separated by ' '
   from dictionary.columns
   where libname='WORK' and
         memname like 'VDRAW%' and
         name like '%dat';
quit;
data vdate;
   set &dslist;
   keep subjid &varlist;
run;
proc print data=vdate;run;
proc sort data=vdate out=vdsort nodup;
   by subjid;
run;
proc print data=vdsort;run;
data vdfinal;
   set vdsort;
   by subjid;
   array vds &varlist;
   if first.subjid then do;
      min=.;
      max=.;
   end;
   retain min max;
   min=min(of min &varlist);
   max=max(of max &varlist);
   if last.subjid;
   keep subjid min max;
   format min max date9.;
run;
proc print data=vdfinal;run;

dxiao2017_0-1745362880962.png

dxiao2017_2-1745362978474.png

dxiao2017_3-1745363062918.png

 

SAS help cars; we are cars; that is why my default image;
Tom
Super User Tom
Super User

Your code is not even trying to find the actual DATE variables.  Instead it is selecting based on the NAME of the variable.  That is probably why you ended up selecting a character variable.

 

Using DICTIONARY.COLUMNS will make it harder to find the actual DATE variable.  That is because in DICTIONARy.COLUMNS the value of FORMAT for your date variables will be 'DATE9.' which is NOT compatible for passing to the FMTINFO() function.  You can add some extra code to strip the digits and period from the end of FORMAT value.

fmtinfo(prxchange('s/\d*\.*\d*$//',-1,trim(format)),'cat')

 

The ARRAY statemen tis not needed, as your code shows.

dxiao2017
Pyrite | Level 9

Thank you very much for telling me the standard and most accurate way to find the actual DATE variables, which uses fmtinfo()='date' as the only criteria. I'll learn to use this function later on. However, in this case, I think the rule should not be so rigid, because it is only a small library that contains 50+ datesets with 120+ date variables, and the variable names are clearly defined (end with 'DAT'). Use dictionary.columns should be sufficient and accurate enough to find out the date variables. 

 

You are right about that the array was not needed, in fact I myself wondered why the array was used that way (in the statements followed there was even no anything referred to the array name, and I thought that was an advanced technique that I should learn it later), meanwhile I was thinking to apply same calculation on a set of variables, array is the right choice, and thus I did not find that the array was not necessary and was not used in the proper way. I deleted the array statement and the result is the same:

data vdfinal;
   set vdsort;
   by subjid;
   if first.subjid then do;
      min=.;
      max=.;
   end;
   retain min max;
   min=min(of min &varlist);
   max=max(of max &varlist);
   if last.subjid;
   keep subjid min max;
   format min max date9.;
run;
proc print data=vdfinal;run;

dxiao2017_0-1745371143660.png

 

SAS help cars; we are cars; that is why my default image;
dxiao2017
Pyrite | Level 9

And since array is not needed here, an amendment to my previous thread: the best solution here to find out each subject's min and max values for a set of date variables in a &varlist macro is the by var and if.first/if.last method you used in your answer. I thought to use array or sql but did not come up with an optimal solution. Proc means works but need to produce one more dataset and need one more step to apply date9. format to the date values.

SAS help cars; we are cars; that is why my default image;
SAS-PD
Obsidian | Level 7

Thank you, Tom!

 

I changed the code to take dates that are ending with DAT as per requirement and those happens to be character string dates (example 4/26/2023).  what changes to be made in code to convert all those character dates to resolve the error?  

 

Tom
Super User Tom
Super User

The main thing is you would need to add code to convert the string to a date.  Once you have a date you could then check if for min/max values.

 

Say you have a variable named CHAR_DAT in a dataset name HAVE.

You could make a new dataset named WANT that adds a variable named NUM_DAT by using code something like this:

data want;
  set have;
  num_dat = input(char_dat,mmddyy10.);
  format num_dat date9.;
run;

The INPUT() function uses an INFORMAT to convert text strings into values. When you use a numeric informat, like MMDDYY, then the value is numeric.  If your strings are actually in DMY order instead of the MDY order that MMDDYY informat understands then use the DDMMYY informat instead.

 

If you never have cases where the same variable name, day VISIT_DAT, exists as character in one dataset and numeric in another then you might be able to extend your current code by making two macro variables with the variable lists.  One for the numeric date variables and other for the character ones.  Then you could use that build the step that gathers the individual observations.  Note for this an array is required.

So if you created the macro variables DSLIST and CHARLIST and NUMLIST then the step to combine all of the dataset could look like this.  

data vdate;
  set &dslist;
  array _char_ &charlist;
  do over _char_ ;
    min=min(min,input(_char_,mmddyy10.));
    max=max(max,input(_char_,mmddyy10.));
  end;
  min=min(of min &numlist);
  max=max(of max &numlist);
  format min max date9.;
  keep subjid min max /* &charlist &numlist */ ;
run;

If you do have the same variable exist as different types then you need to treat each dataset separately.  At that point I would probably just use the dataset with the list of tables and variables to drive the writing of code.  So you might have it write a series of steps like this that will gather the min/max observations for a dataset and then append them into a summary dataset.

data min_max;
  set TABLE1 ;
  min=min(NUMVAR1,NUMVAR2,input(CHARVAR1,mmddyy10.));
  max=max(NUMVAR1,NUMVAR2,input(CHARVAR1,mmddyy10.));
  format min max date9.;
  if n(min,max);
  keep subjid min max;
run;
proc append base=VDATE data=min_max force;
run;

So the next one might reference different variables.

data min_max;
  set TABLE2 ;
  min=min(NUMVAR2,input(CHARVAR2,mmddyy10.));
  max=max(NUMVAR2,input(CHARVAR2,mmddyy10.));
  format min max date9.;
  if n(min,max);
  keep subjid min max;
run;
proc append base=VDATE data=min_max force;
run;

Once you have done that for all of your dataset with DAT variables you will have a similar VDATE dataset you can then collapse to one observation per SUBJID.

SAS-PD
Obsidian | Level 7

Thanks for explaining all the scenarios.  

 

Is there a way to check those DAT variables only? That way I will only have all character dates to find min and max? Also, how do I create CHARLIST and NUMLIST macro variable? 

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 28 replies
  • 1196 views
  • 8 likes
  • 3 in conversation