Hello SAS community,
I have a question regarding how to capture certain rows that fall under a certain criteria.
For example I have a database that has clinical information on clients. (Client 1 and 2 in this example). I also have which provider they saw and what their admission (Admit Date) and Dishcarge (dsch date) dates are. I want to only keep the first rowe where the difference between the discharge date and the next admit date (for the same client) is less than 15.
As such I would keep only row 1 (for client1) and rows 4 and 5 for Client 2.
Suggestions on SAS code that can get this output?
Row # | Client | Provider | Admit Date | Dsch Date |
1 | 1 | 3 | 2/8/2016 | 2/10/2016 |
2 | 1 | 3 | 2/26/2016 | 2/22/2016 |
3 | 1 | 4 | 2/23/2016 | 2/24/2016 |
4 | 2 | 1 | 1/2/2016 | 1/3/2016 |
5 | 2 | 3 | 3/20/2016 | 3/23/2016 |
6 | 2 | 4 | 4/25/2016 | 4/26/2016
|
Thank you in advance!
Hi. Did you mean "greater than or equal to 15 days" since that waht your dsired output indicates, yes/no? Also, row #2 in your data has a discharge date prior to the admit date. Nionetheless (and presuming GE 15 days, not LT 15 days) ...
data x;
input client provider (admit disch) (:mmddyy.);
format admit disch mmddyy10.;
datalines;
1 3 2/8/2016 2/10/2016
1 3 2/26/2016 2/22/2016
1 4 2/23/2016 2/24/2016
2 1 1/2/2016 1/3/2016
2 3 3/20/2016 3/23/2016
2 4 4/25/2016 4/26/2016
;
data y;
retain rec 1;
do j=1 by 1;
set x;
by client;
rec + 1;
if last.client then leave;
set x (keep=admit rename=(admit=next_admit)) point=rec;
if next_admit - disch ge 15 then output;
end;
drop j;
run;
DATA SET: y
client provider admit disch next_admit
1 3 02/08/2016 02/10/2016 02/26/2016
2 1 01/02/2016 01/03/2016 03/20/2016
2 3 03/20/2016 03/23/2016 04/25/2016
This is a good program for the "set/by plus merge-with-offset" technique. Assuming your data are sorted by client_id/disch_date:
data want;
set have (keep=client);
by client;
merge have
have (firstobs=2 keep=admit_date rename=(admit_date=next_admit));
if last.client=0 and (next_admit-dsch_date)>=15; /*Changed admit_date to next_admit*/
run;
Correction made above thanks to @MikeZdeb
Notes:
In general this is a good technique for looking ahead for a condition to control the status of the record-in-hand.
Nice.
ps I think that this ... if last.client=0 and (admit_date - dsch_date)>=15;
was supoosed to be ... if last.client=0 and (next_admit - dsch_date)>=15;
Yes/no?
I have follow-up and clarifcaiton to my question,
Apologies for being confusing.
To clarify, what I want to do is if a client has multiple visits to a provider any visits that are within 15 days of each other to only take the first visit. So if a client is discharged on Jan 1 and then seen again within 15 days (so between Jan 2 and Jan 15) then only to include the Jan 1 visit. Then if a client is seen on Jan 16 then to include the Jan 16 visit as well.
My example table was not the best. Actaully according to the above description then from my example table I should be expecting row 1 for client 1 and all three rows for client 2.
I did try both ways that were suggested and it kind of worked.
So when I have the data set as follows:
(note that row 2 the admit date is before the discharge date which is wrong as zdeb had correctly pointed out)
data x;
input client provider (admit disch) (:mmddyy.);
format admit disch mmddyy10.;
datalines;
1 3 2/8/2016 2/10/2016
1 3 2/26/2016 2/22/2016
1 4 2/23/2016 2/24/2016
2 1 1/2/2016 1/3/2016
2 3 3/20/2016 3/23/2016
2 4 4/25/2016 4/26/2016
;
run;
I get the output where it gives me back rows 1, 4 and 5. NOTE: I also need row 6 for client 2 to also output.
However, if I change the admit date on row two from '2/26/16' to '2/11/16' then I only get rows 4 and 5 from client 2 and I dont get any rows from client 1 back. I would want to get row 1 from client 1 still.
To clarify, there were mistakes in my example. I would want row 1 from client 1 and all three rows from client 2, since what I am trying to do is that if there are multiple visits of a cilent and those multiple visits are all within 15 days of each other then to only give me the first vist record, however if the visits are greater than 15 days from each other then to give me each visit that is greater than 15 days. Hope this makes more sense and apologies for any confusion.
Suggestions to the code that you had shared would be greatly appreciated!
Thank you very very much! Appreciate it!
Does the following do what you want?
data have; input client provider (admit disch) (:mmddyy.); format admit disch mmddyy10.; datalines; 1 3 2/8/2016 2/10/2016 1 3 2/11/2016 2/22/2016 1 4 2/23/2016 2/24/2016 2 1 1/2/2016 1/3/2016 2 3 3/20/2016 3/23/2016 2 4 4/25/2016 4/26/2016 ; run; data want (drop=last: next:); set have (keep=client); by client; retain last_disch; merge have have (firstobs=2 keep=admit rename=(admit=next_admit)); last_disch=lag(disch); if first.client then output; else if last.client=0 then do; if (next_admit - disch)>=15 then output; end; else if last.client then do; if admit-last_disch >=15 then output; end; run;
Art, CEO, AnalystFinder.com
OK, for all pair of consecutive records, you want only the first if it is 15 days or less prior to the latter record. And you also want all records that are more the 15 days prior.
data want;
set have (keep=client);
by client;
merge have
have (firstobs=2 keep=admit rename=(admit_date=next_admit));
if admit_date>lag(discharge_date)+15 and first.client=0 then output;
else if (last.client=0 and next_admit<=dsch_date+15) then output;
run;
But let's rephrase your request - basically you want all records EXCEPT those that are within 15 days of the prior record, right? If so, then the program can be much more compact:
data want;
set have;
by client;
if admit_date-15 < lag(dsch_date) and first.client=0 then delete;
run;
Editted addition, after @art297's endorsement:
I should add an important caveat to the 2nd option above. Make sure the "admit_date-15<lag(dsch_date)" is always the FIRST condition in the "if ... then delete" statement. I.e. don't use "if first.client=0 and admit_date-15 < lag(dsch_date) then delete".
Why? Because, regardless of the result of the IF test you always want to update the (rather short) queue maintained by the lag function (i.e. you always wnat to replace the retrieved dsch_date value with the new value of dsch_date). And since there is an "AND" connecting two conditions in the if test, SAS is smart enough not to waste resources testing the 2nd condition when the first is not true. Putting the "lag()" part as the 2nd condition would mean the queue is not always updated. Once that happens erroneous results are guaranteed.
data have;
input client provider (admit disch) (:mmddyy.);
format admit disch mmddyy10.;
datalines;
1 3 2/8/2016 2/10/2016
1 3 2/26/2016 2/22/2016
1 4 2/23/2016 2/24/2016
2 1 1/2/2016 1/3/2016
2 3 3/20/2016 3/23/2016
2 4 4/25/2016 4/26/2016
;
data want;
merge have
have (firstobs=2 keep=client admit
rename=(admit=next_admit client=_client));
if client=_client and (next_admit-disch)>=15;
run;
Thank you Ksharp for your reply.
I tried this and it didnt get the result I needed.
I ran the following (Note I fixed row 2 admit date as it was after the disch date originally which was incorrect on my part)
data have;
input client provider (admit disch) (:mmddyy.);
format admit disch mmddyy10.;
datalines;
1 3 2/8/2016 2/10/2016
1 3 2/16/2016 2/22/2016
1 4 2/23/2016 2/24/2016
2 1 1/2/2016 1/3/2016
2 3 3/20/2016 3/23/2016
2 4 4/25/2016 4/26/2016
;
run;
data want;
merge have
have (firstobs=2 keep=client admit
rename=(admit=next_admit client=_client));
if client=_client and (next_admit-disch)>=15;
run;
I get the following:
client | provider | admit | disch | _client | next_admit |
2 | 1 | 1/2/2016 | 1/3/2016 | 2 | 3/20/2016 |
2 | 3 | 3/20/2016 | 3/23/2016 | 2 | 4/25/2016 |
which is not what I need. I need rows 1 for client 1 and all three rows for client 3.
But a big thank you for replying!
I think mkeintz and possibly art297s suggestions may have worked!
Testing it on my actual and bigger database!
Thank you ALL for your help, I love asking questions here and getting many useful and helpful responses!!!
I really appreciate it! You are all fantastic!!
FWIW: I would go with @mkeintz's 2nd suggested solution:
data want; set have; by client; if admit-15 < lag(disch) and first.client=0 then delete; run;
It does the same thing as my suggested code, but will run a lot faster than the code I suggested.
Art, CEO, AnalystFinder.com
Hello @art297,
I was just sending you a note when I noticed your reply.
You are correct @mkeintz 2nd suggestion actually yielded the result I needed.
Actually I was trying your code too (which was cool and i wanted to try it) but it didnt quite give me the results I needed.
For example: if I have a table of admit and disch of the client below
client | admit | disch |
1 | 3/5/2016 | 3/8/2016 |
1 | 3/10/2016 | 3/12/2016 |
1 | 4/3/2016 | 4/8/2016 |
1 | 4/19/2016 | 4/21/2016 |
1 | 4/23/2016 | 4/24/2016 |
1 | 5/23/2016 | 5/24/2016 |
1 | 6/1/2016 | 6/6/2016 |
1 | 12/23/2016 | 12/29/2016 |
I need the following rows to be output:
client | admit | disch |
1 | 3/5/2016 | 3/8/2016 |
1 | 4/3/2016 | 4/8/2016 |
1 | 5/23/2016 | 5/24/2016 |
1 | 12/23/2016 | 12/29/2016 |
@mkeintz 2nd suggestion does that however when I run the code you provided I dont get the exact rows I get:
client | admit | disch |
1 | 3/5/2016 | 3/8/2016 |
1 | 3/10/2016 | 3/12/2016 |
1 | 4/23/2016 | 4/24/2016 |
1 | 6/1/2016 | 6/6/2016 |
1 | 12/23/2016 | 12/29/2016 |
here is the full tables with the coloumns "needed", "mkeintz" and "art", the x under "mkeintz" and "art" indicate rows that were outputed using the respective codes and the x under the "needed" coloumn are as you may have guessed the rows that were needed:
client | admit | disch | needed | mkeintz | art |
1 | 3/5/2016 | 3/8/2016 | x | x | x |
1 | 3/10/2016 | 3/12/2016 | x | ||
1 | 4/3/2016 | 4/8/2016 | x | x | |
1 | 4/19/2016 | 4/21/2016 | |||
1 | 4/23/2016 | 4/24/2016 | x | ||
1 | 5/23/2016 | 5/24/2016 | x | x | |
1 | 6/1/2016 | 6/6/2016 | x | ||
1 | 12/23/2016 | 12/29/2016 | x | x | x |
trying to figure out where the code needs to be tweaked. maybe in the first. last.
Thanks again to all
My code wouldn't work because the logic wasn't correct. Unlike @mkeintz's code, mine was designed to keep rather than delete records. However, it's design was to compare current with next and sometimes previous values. The latter part wasn't in line with your specs.
The corrected version to keep rather than delete would be:
data want (drop=last:); set have; by client; retain last_disch; last_disch=lag(disch); if first.client then output; else if (admit - last_disch)>=15 then output; run;
That should make the same selections as @mkeintz's code.
Art, CEO, AnalystFinder.com
Or, in the interest of ever-more-compact code:
data want;
set have;
by client;
if (admit-15)>lag(disch) or first.client;
run;
Like my other note, make sure the test on lag(disch) is the first condition in the subsetting IF.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.