BookmarkSubscribeRSS Feed
sas_student1
Quartz | Level 8

Hello SAS community,

 

I have a question regarding how to capture certain rows that fall under a certain criteria.

For example I have a database that has clinical information on clients. (Client 1 and 2 in this example). I also have which provider they saw and what their admission (Admit Date) and Dishcarge (dsch date) dates are. I want to only keep the first rowe where the difference between the discharge date and the next admit date (for the same client) is less than 15.

As such I would keep only row 1 (for client1) and rows 4 and 5 for Client 2.

 

Suggestions on SAS code that can get this output?

 

Row #ClientProviderAdmit DateDsch Date
1132/8/20162/10/2016
2132/26/20162/22/2016
3142/23/20162/24/2016
4211/2/20161/3/2016
5233/20/20163/23/2016
6244/25/2016

4/26/2016

 

 

 

Thank you in advance!

12 REPLIES 12
MikeZdeb
Rhodochrosite | Level 12

Hi. Did you mean "greater than or equal to 15 days" since that waht your dsired output indicates, yes/no? Also, row #2 in your data has a discharge date prior to the admit date. Nionetheless (and presuming GE 15 days, not LT 15 days) ...

 

data x;
input client provider (admit disch) (:mmddyy.);
format admit disch mmddyy10.;
datalines;
1 3 2/8/2016 2/10/2016
1 3 2/26/2016 2/22/2016
1 4 2/23/2016 2/24/2016
2 1 1/2/2016 1/3/2016
2 3 3/20/2016 3/23/2016
2 4 4/25/2016 4/26/2016
;

data y;
retain rec 1;
do j=1 by 1;
  set x;
  by client;
  rec + 1;
  if last.client then leave;
  set x (keep=admit rename=(admit=next_admit)) point=rec;
  if next_admit - disch ge 15 then output;
end;
drop j;
run;

 

DATA SET: y

client provider   admit     disch    next_admit

1        3     02/08/2016 02/10/2016 02/26/2016
2        1     01/02/2016 01/03/2016 03/20/2016
2        3     03/20/2016 03/23/2016 04/25/2016

mkeintz
PROC Star

This is a good program for the "set/by plus merge-with-offset" technique.  Assuming your data are sorted by client_id/disch_date:

 

data want;
  set have (keep=client);
  by client;
  merge have
        have (firstobs=2 keep=admit_date rename=(admit_date=next_admit));
  if last.client=0 and (next_admit-dsch_date)>=15;  /*Changed admit_date to next_admit*/
run;

 

Correction made above  thanks to @MikeZdeb

Notes:

  1. "SET/BY": The SET  and BY statement are there generate the automatirc variables first.client and last.client, indicating whether the record-in-hand is at the beginning and/or end of data for a given client
  2. "Merge with offset": the MERGE statement merges dataset HAVE with itself, but the 2nd reference to have is offset by one record ("firstobs=2").  And the 2nd reference keeps only one variable - admit_date, which is renamed to next_admit.
  3. The result is that it is easy to test for the conditions you require., using the subsetting if statement above.

 

In general this is a good technique for looking ahead for a condition to control the status of the record-in-hand.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
MikeZdeb
Rhodochrosite | Level 12

Nice.

 

ps I think that this ... if last.client=0 and (admit_date - dsch_date)>=15;

 

was supoosed to be ... if last.client=0 and (next_admit - dsch_date)>=15;

 

Yes/no?

sas_student1
Quartz | Level 8

I have follow-up and clarifcaiton to my question,

Apologies for being confusing.

 

To clarify, what I want to do is if a client has multiple visits to a provider any visits that are within 15 days of each other to only take the first visit. So if a client is discharged on Jan 1 and then seen again within 15 days (so between Jan 2 and Jan 15) then only to include the Jan 1 visit. Then if a client is seen on Jan 16 then to include the Jan 16 visit as well.

 

My example table was not the best. Actaully according to the above description then from my example table I should be expecting row 1 for client 1 and all three rows for client 2.

 

I did try both ways that were suggested and it kind of worked.

So when I have the data set as follows:

(note that row 2 the admit date is before the discharge date which is wrong as zdeb had correctly pointed out)

 

data x;

input client provider (admit disch) (:mmddyy.);

format admit disch mmddyy10.;

datalines;

1 3 2/8/2016 2/10/2016

1 3 2/26/2016 2/22/2016

1 4 2/23/2016 2/24/2016

2 1 1/2/2016 1/3/2016

2 3 3/20/2016 3/23/2016

2 4 4/25/2016 4/26/2016

;

run;

 

I get the output where it gives me back rows 1, 4 and 5. NOTE: I also need row 6 for client 2 to also output.

 

However, if I change the admit date on row two from '2/26/16' to '2/11/16' then I only get rows 4 and 5 from client 2 and I dont get any rows from client 1 back. I would want to get row 1 from client 1 still.

 

To clarify, there were mistakes in my example. I would want row 1 from client 1 and all three rows from client 2, since what I am trying to do is that if there are multiple visits of a cilent and those multiple visits are all within 15 days of each other then to only give me the first vist record, however if the visits are greater than 15 days from each other then to give me each visit that is greater than 15 days. Hope this makes more sense and apologies for any confusion.

 

Suggestions to the code that you had shared would be greatly appreciated!

Thank you very very much! Appreciate it!

art297
Opal | Level 21

Does the following do what you want?

data have;
  input client provider (admit disch) (:mmddyy.);
  format admit disch mmddyy10.;
  datalines;
1 3 2/8/2016 2/10/2016
1 3 2/11/2016 2/22/2016
1 4 2/23/2016 2/24/2016
2 1 1/2/2016 1/3/2016
2 3 3/20/2016 3/23/2016
2 4 4/25/2016 4/26/2016
;
run;

data want (drop=last: next:);
  set have (keep=client);
  by client;
  retain last_disch;
  merge have
        have (firstobs=2 keep=admit rename=(admit=next_admit));
  last_disch=lag(disch);
  if first.client then output;
  else if last.client=0 then do;
    if (next_admit - disch)>=15 then output;
  end;
  else if last.client then do;
    if admit-last_disch >=15 then output;
  end;
run;

Art, CEO, AnalystFinder.com

mkeintz
PROC Star

OK, for all pair of consecutive records, you want only the first if it is 15 days or less prior to the latter record.  And you also want all records that are more the 15 days prior.

 

 

 

data want;
  set have (keep=client);
  by client;
  merge have
        have (firstobs=2 keep=admit rename=(admit_date=next_admit));

  if admit_date>lag(discharge_date)+15 and first.client=0 then output;
  else if (last.client=0 and next_admit<=dsch_date+15) then output;
run;

 

 

But let's rephrase your request  - basically you want all records EXCEPT those that are within 15 days of the prior record, right?  If so, then the program can be much more compact:

 

data want;
  set have;
  by client;
  if admit_date-15 < lag(dsch_date) and first.client=0 then delete;
run;

 

Editted addition, after @art297's endorsement: 

I should add an important caveat to the 2nd option above.  Make sure the "admit_date-15<lag(dsch_date)" is always the FIRST condition in the "if ... then delete" statement.  I.e. don't use "if first.client=0 and admit_date-15 < lag(dsch_date) then delete".

 

Why?  Because, regardless of the result of the IF test you always want to update the (rather short) queue maintained by the lag function (i.e. you always wnat to replace the retrieved dsch_date value with the new value of dsch_date).   And since there is an "AND" connecting two conditions in the if test, SAS is smart enough not to waste resources testing the 2nd condition when the first is not true.  Putting the "lag()" part as the 2nd condition would mean the queue is not always updated.  Once that happens erroneous results are guaranteed.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data have;
input client provider (admit disch) (:mmddyy.);
format admit disch mmddyy10.;
datalines;
1 3 2/8/2016 2/10/2016
1 3 2/26/2016 2/22/2016
1 4 2/23/2016 2/24/2016
2 1 1/2/2016 1/3/2016
2 3 3/20/2016 3/23/2016
2 4 4/25/2016 4/26/2016
;
data want;
merge have
      have (firstobs=2 keep=client admit 
      rename=(admit=next_admit client=_client));
  if client=_client and (next_admit-disch)>=15; 
run;
sas_student1
Quartz | Level 8

Thank you Ksharp for your reply.


I tried this and it didnt get the result I needed.

I ran the following (Note I fixed row 2 admit date as it was after the disch date originally which was incorrect on my part)

data have;

input client provider (admit disch) (:mmddyy.);

format admit disch mmddyy10.;

datalines;

1 3 2/8/2016 2/10/2016

1 3 2/16/2016 2/22/2016

1 4 2/23/2016 2/24/2016

2 1 1/2/2016 1/3/2016

2 3 3/20/2016 3/23/2016

2 4 4/25/2016 4/26/2016

;

run;

data want;

merge have

have (firstobs=2 keep=client admit

rename=(admit=next_admit client=_client));

if client=_client and (next_admit-disch)>=15;

run;

I get the following:

 

clientprovideradmitdisch_clientnext_admit
211/2/20161/3/201623/20/2016
233/20/20163/23/201624/25/2016

 

which is not what I need. I need rows 1 for client 1 and all three rows for client 3.

But a big thank you for replying!


I think mkeintz and possibly art297s suggestions may have worked!


Testing it on my actual and bigger database!

 

Thank you ALL for your help, I love asking questions here and getting many useful and helpful responses!!!


I really appreciate it! You are all fantastic!!

art297
Opal | Level 21

FWIW: I would go with @mkeintz's 2nd suggested solution:

data want;
  set have;
  by client;
  if admit-15 < lag(disch) and first.client=0 then delete;
run;

It does the same thing as my suggested code, but will run a lot faster than the code I suggested.

 

Art, CEO, AnalystFinder.com

 

sas_student1
Quartz | Level 8

Hello @art297,

 

I was just sending you a note when I noticed your reply.

You are correct @mkeintz 2nd suggestion actually yielded the result I needed.

 

Actually I was trying your code too (which was cool and i wanted to try it) but it didnt quite give me the results I needed.

 

For example: if I have a table of admit and disch of the client below

 

clientadmitdisch
13/5/20163/8/2016
13/10/20163/12/2016
14/3/20164/8/2016
14/19/20164/21/2016
14/23/20164/24/2016
15/23/20165/24/2016
16/1/20166/6/2016
112/23/201612/29/2016

 

I need the following rows to be output:

clientadmitdisch
13/5/20163/8/2016
14/3/20164/8/2016
15/23/20165/24/2016
112/23/201612/29/2016

 

@mkeintz 2nd suggestion does that however when I run the code you provided I dont get the exact rows I get:

clientadmitdisch
13/5/20163/8/2016
13/10/20163/12/2016
14/23/20164/24/2016
16/1/20166/6/2016
112/23/201612/29/2016

 

here is the full tables with the coloumns "needed", "mkeintz" and "art", the x under "mkeintz" and "art" indicate rows that were outputed using the respective codes and the x under the "needed" coloumn are as you may have guessed the rows that were needed:

 

clientadmitdischneededmkeintzart
13/5/20163/8/2016xxx
13/10/20163/12/2016  x
14/3/20164/8/2016xx 
14/19/20164/21/2016   
14/23/20164/24/2016  x
15/23/20165/24/2016xx 
16/1/20166/6/2016  x
112/23/201612/29/2016xxx

 

trying to figure out where the code needs to be tweaked. maybe in the first. last.

 

Thanks again to all

 

art297
Opal | Level 21

My code wouldn't work because the logic wasn't correct. Unlike @mkeintz's code, mine was designed to keep rather than delete records. However, it's design was to compare current with next and sometimes previous values. The latter part wasn't in line with your specs.

The corrected version to keep rather than delete would be:

data want (drop=last:);
  set have;
  by client;
  retain last_disch;
  last_disch=lag(disch);
  if first.client then output;
  else if (admit - last_disch)>=15 then output;
run;

That should make the same selections as @mkeintz's code.

 

Art, CEO, AnalystFinder.com

mkeintz
PROC Star

Or, in the interest of ever-more-compact code:

 

data want;

  set have;

  by client;

  if (admit-15)>lag(disch) or first.client;

run;

 

Like my other note, make sure the test on lag(disch) is the first condition in the subsetting IF.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 6020 views
  • 5 likes
  • 5 in conversation