BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Kurt_Bremser
Super User

So you can first merge all datasets by subjid. Then, retrieve the names of all numeric variables in this new dataset which are not your start and end dates (DICTIONARY.COLUMNS) into a macro variable. Use this macro variable to create an array, so you can loop through it and flag unwanted values.

SAS-PD
Fluorite | Level 6

I created a work dataset with all datasets with all date variables in library using Proc Content not sure how to proceed from here...kindly help!

data All_Dates;
infile datalines;
input libname $ memname $ name $ n;
datalines;
D_TEST C lab_date 1
D_TEST D Test_date 2
D_TEST E n_AEENDAT 3
D_TEST E n_AESITEAW 4
D_TEST F n_AESTDAT 5
D_TEST G VISIT_DATE 6
D_TEST G n_FTDAT 7
D_TEST G VISIT_DATE 8
;
run;

 

I need to check all dates to see if they are between D_TEST.A.STart_date and D_TEST.B.End_date

Kurt_Bremser
Super User

Another method to create code from data, by using a temporary file:

filename incfile temp;

data _null_;
length line $200;
file incfile;
set all_datasets;
by libname memname;
if _n_ = 1
then do;
  line = "data " !! catx(".",libname,"all_combined") !! "; merge a b";
  put line;
end;
if first.memname
then do;
  line = catx(".",libname,memname) !! "(keep=subjid ";
  put line;
end;
put name;
if last.memname then put ")";
if done then put "; by subjid; run;";
run;

data _null_;
infile incfile truncover;
input line $200.;
put line;
run;
/* this step is for control; if the code in the log is OK, submit the following include */

%include incfile;

filename incfile clear;

With the combined dataset, you can define an array over the date variables and run a check in a loop. 

SAS-PD
Fluorite | Level 6

Hi, could you please help me with define an array over the date variables and run a check in a loop ?

Kurt_Bremser
Super User

First, we need to get the list of variables in a macro variable:

proc sql noprint;
select name int :variables separated by " "
from all_datasets;
quit;

Next, we use that to create the array:

data want;
set all_combined;
array vars {*} &variables.;
do i = 1 to dim(vars);
  if vars{i} lt start_date or vars{i} gt end_date then flag = 1;
end;
drop i;
run;
mkeintz
PROC Star

52 files?  Really?   With (up to) 52 different date variable names?   You should probably merge all the datasets by subjid, which would make a program data vector with START_DATE, END_DATE and 52 other date varibles.

 

You don't need to run a macro 52 times, but you can benefit from making 2 macrovars, each a list of 52 names, as I do with only 2 names below.

 

data A;
infile datalines;
input Subjid $ Start_date ddmmyy10.;
format start_date e8601da.;
datalines;
Subj1 01/01/2001
Subj2 02/02/2002
Subj3 03/03/2003
Subj4 04/04/2004
;
run;

data B;
infile datalines;
input Subjid $ end_date ddmmyy10.;
format end_date e8601da.;
datalines;
Subj1 01/01/2021
Subj2 02/02/2022
Subj3 03/03/2023
Subj4 04/04/2024
;
run;

Data C;
infile datalines;
input Subjid $ lab_date ddmmyy10.;
format lab_date e8601da.;
datalines;
Subj1 01/01/1999
Subj2 02/02/2001
Subj3 03/03/2001
Subj4 04/04/2025
;
run;

data D;
infile datalines;
input Subjid $ Test_date ddmmyy10.;
format test_date e8601da.;
datalines;
Subj1 01/01/2011
Subj2 02/02/2011
Subj3 03/03/2025
Subj4 04/04/2014
;
run;


%let dslist = c             d         ;
%let varlst = lab_date      test_date ;

data _null_;
  merge a b &dslist  end=end_of_merge;
  by subjid;

  array dates {*}  &varlst;

  if _n_=1 then do;
    declare hash h ;   
    declare hash hoh ();
      hoh.definekey('i');
      hoh.definedata('h');
      hoh.definedone();
    do i=1 to dim(dates);
      h=_new_ hash(ordered:'a');
      h.definekey('subjid');
      h.definedata('subjid','start_date','end_date',vname(dates{i}));
      h.definedone();
      hoh.add();
    end;
  end;
  do i=1 to dim(dates);
    if dates{i}^=.  and (dates{i}<start_date or dates{i}>end_date) then do;
      hoh.find();
      h.add();
    end;
  end;
  if end_of_merge then do i=1 to dim(dates);
    hoh.find();
    if h.num_items>0 then h.output(dataset:cats('exclude_',scan("&dslist",i)));
  end;
run;

To expand from 2 datasets (C and D) to 52, just expand the two macrovars DSLIST and VARLST.  None of the remaining code has to be changed.

 

BTW, instead of 52 datasets, I think you would be far better off with 1 dataset, containing 52 dummy variables indicating whether the corresponding data variable is out of range.  Then you could just do filters on those dummy variables (or any combination of those dummy variables)

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SAS-PD
Fluorite | Level 6

Thank you for your response! 

 

The actual library has more than 52 datasets and about 115 dates variables to be checked against start and end dates. I agree with you to create 1 dataset with all 115+ dates variables from 52+ datasets (all dates variable in a librabry), subjid being the common variable among all datasets in the library. I was able to create a work dataset that has all datasets with only date variables in library using 


proc contents data = d_test._all_ out=content; run;

data content1;
set content;
where format = 'MMDDYY' ;
keep libname memname name n;
n+1;
run;

 

not sure how to proceed from here...kindly help!

Tom
Super User Tom
Super User

First question is do you really have non-standard variable names like that? With spaces in the name?  That will make the coding harder.  You will have to use name literals to reference the variables.

 

You can use the FMTINFO() function to find all of the variables that have a DATE type format attached to them (DATE, YYMMDD, etc.).  So if all of the datasets are in the library pointed to by the libref MYLIB the code would look like this.

proc contents data=MYLIB._all_ noprint out=contents; run;
data datevars;
  set contents;
  where 'date'=fmtinfo(format,'cat');
  keep libname memname name;
run;

Now you can use this list to generate the code you want.  Do you want to process each dataset separately?  Let's assume you already have made the dataset with the three key variables (let's call them SUBJECT, START and END).  Let's call this dataset SUBJECTS.  Now for each dataset that has date variables you just want to do something like:

data bad_ds1 ;
  merge subjects(in=in1) mylib.ds1(in=in2);
  by subject;
  if in2;
  array dates datevar1 datevar2 ;
  anybad=0;
  do i=1 to dim(dates) until (anybad);
    anybad = (.Z < dates[i] < START) or (dates[i] > END);
  end;
  if anybad;
run;

Which will make a subset of the data with just the observations where at least one of the DATE variables is populated but not in the range from START to END (where START and END are for that particular subject).  Note this assumes that all of the datasets are already sorted by the SUBJECT variable.

 

So the only thing that changes in that as you work you way through the list of data variables is dataset names and the list of variables in the ARRAY statement.  So you could easily generate it from the DATEVARS dataset.

filename code temp;
data _null_;
  file code;
  set datevars;
  by memname ;
  if first.memname then put 
  'data bad_' memname ';'
/ '  merge subjects(in=in1) mylib.' memname '(in=in2);'
/ '  by subject;'
/ '  if in2;'
/ '  array dates ' @
  ;
  put name @;
  if last.memname then put 
  ';'
/ '  anybad=0;'
/ '  do i=1 to dim(dates) until (anybad);'
/ '    anybad = (.Z < dates[i] < START) or (dates[i] > END);'
/ '  end;'
/ '  if anybad;'
/ 'run;'
  ;
run;
SAS-PD
Fluorite | Level 6

Hello, thank you for your response!

 

I apologize as I am new to SAS... How do I sort all datasets in a library to subjid? 

Kurt_Bremser
Super User

@SAS-PD wrote:

Hello, thank you for your response!

 

I apologize as I am new to SAS... How do I sort all datasets in a library to subjid? 


Given the dataset you already have, like this:

data _null_;
set all_dates;
call execute("
  proc sort data="
!! catx(".",libname,memname) !!
  ";
  by subjid;
  run;
");
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 24 replies
  • 970 views
  • 2 likes
  • 6 in conversation