DATA Step, Macro, Functions and more

Filtering out later dates and keeping earliest date entries

Reply
Frequent Contributor
Posts: 77

Filtering out later dates and keeping earliest date entries

I have a dataset that are inpatient hospitalizations which include a unique identifier as well as the patient’s discharge date. There are patients in this dataset that have been admitted more than once. How do I write a sas code that take only the earliest possible admission date of the patient and delete all subsequent entries with later discharge dates if that particular patient happens to have multiple admissions? This will ultimately lead to the modified dataset to have all unique entries per ID. I have included a sample dataset on excel. Thank you so much for your help!

Super User
Posts: 7,844

Re: Filtering out later dates and keeping earliest date entries

See my previous post https://communities.sas.com/t5/Base-SAS-Programming/Filtering-out-entries-that-do-not-have-a-certain... about presenting example data.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 77

Re: Filtering out later dates and keeping earliest date entries

sorry, attached is the sas dataset file

Attachment
Super User
Super User
Posts: 7,979

Re: Filtering out later dates and keeping earliest date entries

Post test data, in the form of a datastep using the code window ({i} above where you post).  Use this post to help you:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

Frequent Contributor
Posts: 77

Re: Filtering out later dates and keeping earliest date entries

%macro data2datastep(dsn,lib,file,obs);
%local varlist msgtype;

%if %superq(obs)= %then %let obs=MAX;

%let msgtype=NOTE;
%if %superq(dsn)= %then %do;
   %let msgtype=ERROR;
   %put &msgtype: You must specify a data set name;
   %put;
   %goto syntax;
%end;
%let dsn=%qupcase(%superq(dsn));
%if %superq(dsn)=!HELP %then %do;
%syntax:
   %put &msgtype: &SYSMACRONAME macro help document:;
   %put &msgtype- Purpose: Converts a data set to a SAS DATA step.;
   %put &msgtype- Syntax: %nrstr(%%)&SYSMACRONAME(dsn<,lib,file,obs>);
   %put &msgtype- dsn:  Name of the dataset to be converted. Required.;
   %put &msgtype- lib:  LIBREF where the dataset resides. Optional.;
   %put &msgtype- file: Fully qulaified filename for the DATA step produced. Optional.;
   %put &msgtype-       Default is %nrstr(create_&lib._&dsn._data.sas) in the SAS default directory.;
   %put &msgtype- obs:  Max observations to include the created dataset. Optional.;
   %put &msgtype-       Default is MAX (all observations);
   %put;
   %put NOTE:   &SYSMACRONAME cannot be used in-line - it generates code.;
   %put NOTE-   Use !HELP to print these notes.;
   %return;
%end; 

%if %superq(lib)= %then %do;
    %let lib=%qscan(%superq(dsn),1,.);
    %if %superq(lib) = %superq(dsn) %then %let lib=WORK;
    %else %let dsn=%qscan(&dsn,2,.);
%end;
%let lib=%qupcase(%superq(lib));
%let dsn=%qupcase(%superq(dsn));

%if %sysfunc(exist(&lib..&dsn)) ne 1 %then %do;
   %put ERROR: (&SYSMACRONAME) - Dataset &lib..&dsn does not exist.;
   %let msgtype=NOTE;
   %GoTo syntax;
%end;

%if %superq(file)= %then %do;
   %let file=create_&lib._&dsn._data.sas;
   %if %symexist(USERDIR) %then %let file=&userdir/&file;
%end;

%if %symexist(USERDIR) %then %do;
   %if %qscan(%superq(file),-1,/\)=%superq(file) %then
      %let file=&userdir/&file;
%end;

proc sql noprint;
select Name
      into :varlist separated by ' '
   from dictionary.columns
   where libname="&lib"
     and memname="&dsn"
;
select case type
          when 'num' then 
             case 
                when missing(format) then cats(Name,':32.')
                else cats(Name,':',format)
             end 
          else cats(Name,':$',length,'.')
       end
      into :inputlist separated by ' '
   from dictionary.columns
   where libname="&lib"
     and memname="&dsn"
;
quit;

data _null_;
   file "&file" dsd;
   if _n_ =1 then do;
      put "data &lib..&dsn;";
      put @3 "infile datalines dsd truncover;";
      put @3 "input %superq(inputlist);";
      put "datalines4;";
   end;
   set &lib..&dsn(obs=&obs) end=last; 
   put &varlist @;
   if last then do;
      put;
      put ';;;;';
   end;
   else put;
run;
%mend;

Hi, 

 

I'm having trouble trying to find where exactly are the lines where I set up my dataset into this macro. Thanks for your help!

Super User
Posts: 7,844

Re: Filtering out later dates and keeping earliest date entries

You simply call the macro with

%data2datastep(dataset,library,filename,nobs);

as described loud and clear in https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Quote:

 

 

The macro call is:

%data2datastep(dsn, lib, file, obs);

 

dsn = input data set name, required

 

lib = library where input data set resides, optional

 

file = name of output file, optional

 

obs = number of obs to include in the data step, optional
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 77

Re: Filtering out later dates and keeping earliest date entries

Posted in reply to KurtBremser

Sorry guys, finally got it to work. Here it is:

 

data MYDATA.SAMPLEEARLIESTDATE;
  infile datalines dsd truncover;
  input Date:MMDDYY10. ID:BEST. Screened:$3.;
datalines4;
11/01/2017,14522778554,Yes
12/01/2017,14522778556,No
01/01/2018,14522778556,Yes
02/01/2018,14522778557,No
03/01/2018,14522778558,Yes
04/01/2018,14522778559,No
05/01/2018,14522778560,Yes
06/01/2018,14522778560,No
07/01/2018,14522778562,Yes
08/01/2018,14522778563,No
09/01/2018,14522778564,Yes
10/01/2018,14522778566,No
11/01/2018,14522778566,Yes
;;;;
Super User
Posts: 7,844

Re: Filtering out later dates and keeping earliest date entries


byeh2017 wrote:

Sorry guys, finally got it to work. Here it is:

 

data MYDATA.SAMPLEEARLIESTDATE;
  infile datalines dsd truncover;
  input Date:MMDDYY10. ID:BEST. Screened:$3.;
datalines4;
11/01/2017,14522778554,Yes
12/01/2017,14522778556,No
01/01/2018,14522778556,Yes
02/01/2018,14522778557,No
03/01/2018,14522778558,Yes
04/01/2018,14522778559,No
05/01/2018,14522778560,Yes
06/01/2018,14522778560,No
07/01/2018,14522778562,Yes
08/01/2018,14522778563,No
09/01/2018,14522778564,Yes
10/01/2018,14522778566,No
11/01/2018,14522778566,Yes
;;;;

Nice, as it should be.

Use @rogerjdeangelis solution.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Trusted Advisor
Posts: 1,022

Re: Filtering out later dates and keeping earliest date entries

Is your data sorted already?  This makes a big difference if the data set is large:

 

  1. Is your data already sorted by id and date?  If so, the optimal solution takes one form.
  2. If your data is sorted by id only, then the optimal solution takes another form
  3. If you data is not sorted, .... you guessed it - a third form is optimal

 

Valued Guide
Posts: 505

Re: Filtering out later dates and keeping earliest date entries

How do I write a sas code that take only the earliest possible admission date of
the patient and delete all subsequent entries with later discharge dates if that
particular patient happens to have multiple admissions?


HAVE
====

Obs          DATE         ID        SCREENED

  1    11/01/2017    14522778554      Yes     Keep

  2    12/01/2017    14522778556      No      Keep
  3    12/02/2017    14522778556      No
  4    12/03/2017    14522778556      No

  5    01/01/2018     4522778556      Yes     keep
  6    01/02/2018     4522778556      Yes

  7    02/01/2018     4522778557      No      keep
  8    02/02/2018     4522778557      No
  9    02/03/2018     4522778557      No

 10    03/01/2018     4522778558      Yes     keep
 11    03/02/2018     4522778558      Yes
 12    03/03/2018     4522778558      Yes



WANT
====

Obs         ID              DATE    SCREENED

  1    14522778554    11/01/2017      Yes

  2    14522778556    12/01/2017      No

  3     4522778556    01/01/2018      Yes

  4     4522778557    02/01/2018      No

  5     4522778558    03/01/2018      Yes


FULL SOLUTION
==============


* CREATE SOME DATA;
data have(drop=dte);
retain id date;
input Dte mmddyy10. ID Screened $;
do date=dte to dte+int(4*uniform(5731));
  output;
end;
cards4;
11/1/2017 14522778554 Yes
12/1/2017 14522778556 No
1/1/2018 14522778556 Yes
2/1/2018 14522778557 No
3/1/2018 14522778558 Yes
4/1/2018 14522778559 No
5/1/2018 14522778560 Yes
6/1/2018 14522778560 No
7/1/2018 14522778562 Yes
8/1/2018 14522778563 No
9/1/2018 14522778564 Yes
10/1/2018 14522778566 No
11/1/2018 14522778566 Yes
;;;;
run;quit;

data want;
  set have;
  by id notsorted;
  if first.id;
run;quit;


proc print data=want;
format date mmddyy10.;
run;quit;




Frequent Contributor
Posts: 77

Re: Filtering out later dates and keeping earliest date entries

Posted in reply to rogerjdeangelis

Thank you. From my understanding, I'll need to sort it first by ID and then by date before I filter, right?

 

I'm also having a problem with using my existing dataset "mydata.sampleearliestdate" and setting a new dataset with "mydata.sampleearliestdateCLEAN" Not sure why it is not letting me create a new dataset to house to clean data

 

 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 61         
 62         data mydata.sampleearliestdate;
 63           set mydata.sampleearliestdateCLEAN;
 ERROR: File MYDATA.SAMPLEEARLIESTDATECLEAN.DATA does not exist.
 64           by id notsorted;
 65           if first.id;
 66         run;
 
 NOTE: The SAS System stopped processing this step because of errors.
Frequent Contributor
Posts: 77

Re: Filtering out later dates and keeping earliest date entries

Actually nevermind about that question about the error. I realized I had use the set funciton incorrectly to reference a new dataset I wanted to create rather than an old one.

Frequent Contributor
Posts: 77

Re: Filtering out later dates and keeping earliest date entries

Thanks, I haven't sorted it yet. Would this be the right code to do this? Also, does this automatically save it onto the dataset or will I need to save it manually? 

 

proc sort data=mydata.sampleearliestdate;
by ID;
by date;
run;
Ask a Question
Discussion stats
  • 12 replies
  • 156 views
  • 0 likes
  • 5 in conversation