BookmarkSubscribeRSS Feed
byeh2017
Quartz | Level 8

I have a dataset that are inpatient hospitalizations which include a unique identifier as well as the patient’s discharge date. There are patients in this dataset that have been admitted more than once. How do I write a sas code that take only the earliest possible admission date of the patient and delete all subsequent entries with later discharge dates if that particular patient happens to have multiple admissions? This will ultimately lead to the modified dataset to have all unique entries per ID. I have included a sample dataset on excel. Thank you so much for your help!

12 REPLIES 12
byeh2017
Quartz | Level 8

sorry, attached is the sas dataset file

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data, in the form of a datastep using the code window ({i} above where you post).  Use this post to help you:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

byeh2017
Quartz | Level 8
%macro data2datastep(dsn,lib,file,obs);
%local varlist msgtype;

%if %superq(obs)= %then %let obs=MAX;

%let msgtype=NOTE;
%if %superq(dsn)= %then %do;
   %let msgtype=ERROR;
   %put &msgtype: You must specify a data set name;
   %put;
   %goto syntax;
%end;
%let dsn=%qupcase(%superq(dsn));
%if %superq(dsn)=!HELP %then %do;
%syntax:
   %put &msgtype: &SYSMACRONAME macro help document:;
   %put &msgtype- Purpose: Converts a data set to a SAS DATA step.;
   %put &msgtype- Syntax: %nrstr(%%)&SYSMACRONAME(dsn<,lib,file,obs>);
   %put &msgtype- dsn:  Name of the dataset to be converted. Required.;
   %put &msgtype- lib:  LIBREF where the dataset resides. Optional.;
   %put &msgtype- file: Fully qulaified filename for the DATA step produced. Optional.;
   %put &msgtype-       Default is %nrstr(create_&lib._&dsn._data.sas) in the SAS default directory.;
   %put &msgtype- obs:  Max observations to include the created dataset. Optional.;
   %put &msgtype-       Default is MAX (all observations);
   %put;
   %put NOTE:   &SYSMACRONAME cannot be used in-line - it generates code.;
   %put NOTE-   Use !HELP to print these notes.;
   %return;
%end; 

%if %superq(lib)= %then %do;
    %let lib=%qscan(%superq(dsn),1,.);
    %if %superq(lib) = %superq(dsn) %then %let lib=WORK;
    %else %let dsn=%qscan(&dsn,2,.);
%end;
%let lib=%qupcase(%superq(lib));
%let dsn=%qupcase(%superq(dsn));

%if %sysfunc(exist(&lib..&dsn)) ne 1 %then %do;
   %put ERROR: (&SYSMACRONAME) - Dataset &lib..&dsn does not exist.;
   %let msgtype=NOTE;
   %GoTo syntax;
%end;

%if %superq(file)= %then %do;
   %let file=create_&lib._&dsn._data.sas;
   %if %symexist(USERDIR) %then %let file=&userdir/&file;
%end;

%if %symexist(USERDIR) %then %do;
   %if %qscan(%superq(file),-1,/\)=%superq(file) %then
      %let file=&userdir/&file;
%end;

proc sql noprint;
select Name
      into :varlist separated by ' '
   from dictionary.columns
   where libname="&lib"
     and memname="&dsn"
;
select case type
          when 'num' then 
             case 
                when missing(format) then cats(Name,':32.')
                else cats(Name,':',format)
             end 
          else cats(Name,':$',length,'.')
       end
      into :inputlist separated by ' '
   from dictionary.columns
   where libname="&lib"
     and memname="&dsn"
;
quit;

data _null_;
   file "&file" dsd;
   if _n_ =1 then do;
      put "data &lib..&dsn;";
      put @3 "infile datalines dsd truncover;";
      put @3 "input %superq(inputlist);";
      put "datalines4;";
   end;
   set &lib..&dsn(obs=&obs) end=last; 
   put &varlist @;
   if last then do;
      put;
      put ';;;;';
   end;
   else put;
run;
%mend;

Hi, 

 

I'm having trouble trying to find where exactly are the lines where I set up my dataset into this macro. Thanks for your help!

Kurt_Bremser
Super User

You simply call the macro with

%data2datastep(dataset,library,filename,nobs);

as described loud and clear in https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Quote:

 

 

The macro call is:

%data2datastep(dsn, lib, file, obs);

 

dsn = input data set name, required

 

lib = library where input data set resides, optional

 

file = name of output file, optional

 

obs = number of obs to include in the data step, optional
byeh2017
Quartz | Level 8

Sorry guys, finally got it to work. Here it is:

 

data MYDATA.SAMPLEEARLIESTDATE;
  infile datalines dsd truncover;
  input Date:MMDDYY10. ID:BEST. Screened:$3.;
datalines4;
11/01/2017,14522778554,Yes
12/01/2017,14522778556,No
01/01/2018,14522778556,Yes
02/01/2018,14522778557,No
03/01/2018,14522778558,Yes
04/01/2018,14522778559,No
05/01/2018,14522778560,Yes
06/01/2018,14522778560,No
07/01/2018,14522778562,Yes
08/01/2018,14522778563,No
09/01/2018,14522778564,Yes
10/01/2018,14522778566,No
11/01/2018,14522778566,Yes
;;;;
Kurt_Bremser
Super User

@byeh2017 wrote:

Sorry guys, finally got it to work. Here it is:

 

data MYDATA.SAMPLEEARLIESTDATE;
  infile datalines dsd truncover;
  input Date:MMDDYY10. ID:BEST. Screened:$3.;
datalines4;
11/01/2017,14522778554,Yes
12/01/2017,14522778556,No
01/01/2018,14522778556,Yes
02/01/2018,14522778557,No
03/01/2018,14522778558,Yes
04/01/2018,14522778559,No
05/01/2018,14522778560,Yes
06/01/2018,14522778560,No
07/01/2018,14522778562,Yes
08/01/2018,14522778563,No
09/01/2018,14522778564,Yes
10/01/2018,14522778566,No
11/01/2018,14522778566,Yes
;;;;

Nice, as it should be.

Use @rogerjdeangelis solution.

mkeintz
PROC Star

Is your data sorted already?  This makes a big difference if the data set is large:

 

  1. Is your data already sorted by id and date?  If so, the optimal solution takes one form.
  2. If your data is sorted by id only, then the optimal solution takes another form
  3. If you data is not sorted, .... you guessed it - a third form is optimal

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
rogerjdeangelis
Barite | Level 11
How do I write a sas code that take only the earliest possible admission date of
the patient and delete all subsequent entries with later discharge dates if that
particular patient happens to have multiple admissions?


HAVE
====

Obs          DATE         ID        SCREENED

  1    11/01/2017    14522778554      Yes     Keep

  2    12/01/2017    14522778556      No      Keep
  3    12/02/2017    14522778556      No
  4    12/03/2017    14522778556      No

  5    01/01/2018     4522778556      Yes     keep
  6    01/02/2018     4522778556      Yes

  7    02/01/2018     4522778557      No      keep
  8    02/02/2018     4522778557      No
  9    02/03/2018     4522778557      No

 10    03/01/2018     4522778558      Yes     keep
 11    03/02/2018     4522778558      Yes
 12    03/03/2018     4522778558      Yes



WANT
====

Obs         ID              DATE    SCREENED

  1    14522778554    11/01/2017      Yes

  2    14522778556    12/01/2017      No

  3     4522778556    01/01/2018      Yes

  4     4522778557    02/01/2018      No

  5     4522778558    03/01/2018      Yes


FULL SOLUTION
==============


* CREATE SOME DATA;
data have(drop=dte);
retain id date;
input Dte mmddyy10. ID Screened $;
do date=dte to dte+int(4*uniform(5731));
  output;
end;
cards4;
11/1/2017 14522778554 Yes
12/1/2017 14522778556 No
1/1/2018 14522778556 Yes
2/1/2018 14522778557 No
3/1/2018 14522778558 Yes
4/1/2018 14522778559 No
5/1/2018 14522778560 Yes
6/1/2018 14522778560 No
7/1/2018 14522778562 Yes
8/1/2018 14522778563 No
9/1/2018 14522778564 Yes
10/1/2018 14522778566 No
11/1/2018 14522778566 Yes
;;;;
run;quit;

data want;
  set have;
  by id notsorted;
  if first.id;
run;quit;


proc print data=want;
format date mmddyy10.;
run;quit;




byeh2017
Quartz | Level 8

Thank you. From my understanding, I'll need to sort it first by ID and then by date before I filter, right?

 

I'm also having a problem with using my existing dataset "mydata.sampleearliestdate" and setting a new dataset with "mydata.sampleearliestdateCLEAN" Not sure why it is not letting me create a new dataset to house to clean data

 

 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 61         
 62         data mydata.sampleearliestdate;
 63           set mydata.sampleearliestdateCLEAN;
 ERROR: File MYDATA.SAMPLEEARLIESTDATECLEAN.DATA does not exist.
 64           by id notsorted;
 65           if first.id;
 66         run;
 
 NOTE: The SAS System stopped processing this step because of errors.
byeh2017
Quartz | Level 8

Actually nevermind about that question about the error. I realized I had use the set funciton incorrectly to reference a new dataset I wanted to create rather than an old one.

byeh2017
Quartz | Level 8

Thanks, I haven't sorted it yet. Would this be the right code to do this? Also, does this automatically save it onto the dataset or will I need to save it manually? 

 

proc sort data=mydata.sampleearliestdate;
by ID;
by date;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 3197 views
  • 0 likes
  • 5 in conversation