turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- selecting unique cases

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-23-2017 04:48 PM

Hi all,

I have a data set with the following variables:

ID Procedure Date_of_Procedure

1 X 10Mar2016

1 X 10Apr2016

2 X 12Jan2016

2 X 13Jan2016

2 X 24Feb2016

2 X 10Mar2016

2 Y 10Mar2016

3 Y 11Dec2016

3 Y 12Dec2016

4 X 24Jul2016

4 Y 25Jul2016

My goal is to identify only those IDs who have both procedures and then to find out the difference in days between the two procedures. I want to select only those patients (IDs) with both procedures within 2days only (highlighted in RED), which is my subset of interest.

Each ID can have only one procedure (either X or Y) or have multiple same procedures (an ID can have multiple X or multiple Y or a combination of multiple X with one Y or one X with multiple Ys). I am interested in only those patients (IDs) who have both procedures. If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

Any help with the SAS code would be greatly appreciated!

Thanks!

Sat

Accepted Solutions

Solution

03-24-2017
09:22 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-23-2017 04:58 PM - edited 03-23-2017 05:07 PM

smunigala wrote:

Hi all,

I have a data set with the following variables:

ID Procedure Date_of_Procedure

1 X 10Mar2016

1 X 10Apr2016

2 X 12Jan2016

2 X 13Jan2016

2 X 24Feb2016

2 X 10Mar2016

2 Y 10Mar2016

3 Y 11Dec2016

3 Y 12Dec2016

4 X 24Jul2016

4 Y 25Jul2016

My goal is to identify only those IDs who have both procedures and then to find out the difference in days between the two procedures. I want to select only those patients (IDs) with both procedures within 2days only (highlighted in RED), which is my subset of interest.

Each ID can have only one procedure (either X or Y) or have multiple same procedures (an ID can have multiple X or multiple Y or a combination of multiple X with one Y or one X with multiple Ys). I am interested in only those patients (IDs) who have both procedures. If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

Any help with the SAS code would be greatly appreciated!

Thanks!

Sat

Here's one way...there are probably several others. In the future please post your data as a data step - like the first step in my program to create the HAVE data set.

```
data have;
informat id $1. procedure $1. date_of_procedure date9.;
format id $1. procedure $1. date_of_procedure date9.;
input ID $ Procedure $ Date_of_Procedure ;
cards;
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
;
run;
proc sql;
create table both_procedures as
select *
from have
group by id
having count(distinct procedure)>1;
quit;
proc sql;
create table both_procs_2day as
select h1.id, h1.date_of_procedure as date_x,
h2.id as record_id, h2.date_of_procedure as date_y
from (select * from both_procedures where procedure='X') as h1 full join
(select * from both_procedures where procedure='Y') as h2
on h1.id=h2.id
where abs(date_x - date_y) <= 2;
quit;
```

All Replies

Solution

03-24-2017
09:22 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-23-2017 04:58 PM - edited 03-23-2017 05:07 PM

smunigala wrote:

Hi all,

I have a data set with the following variables:

ID Procedure Date_of_Procedure

1 X 10Mar2016

1 X 10Apr2016

2 X 12Jan2016

2 X 13Jan2016

2 X 24Feb2016

2 X 10Mar2016

2 Y 10Mar2016

3 Y 11Dec2016

3 Y 12Dec2016

4 X 24Jul2016

4 Y 25Jul2016

My goal is to identify only those IDs who have both procedures and then to find out the difference in days between the two procedures. I want to select only those patients (IDs) with both procedures within 2days only (highlighted in RED), which is my subset of interest.

Each ID can have only one procedure (either X or Y) or have multiple same procedures (an ID can have multiple X or multiple Y or a combination of multiple X with one Y or one X with multiple Ys). I am interested in only those patients (IDs) who have both procedures. If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

Any help with the SAS code would be greatly appreciated!

Thanks!

Sat

Here's one way...there are probably several others. In the future please post your data as a data step - like the first step in my program to create the HAVE data set.

```
data have;
informat id $1. procedure $1. date_of_procedure date9.;
format id $1. procedure $1. date_of_procedure date9.;
input ID $ Procedure $ Date_of_Procedure ;
cards;
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
;
run;
proc sql;
create table both_procedures as
select *
from have
group by id
having count(distinct procedure)>1;
quit;
proc sql;
create table both_procs_2day as
select h1.id, h1.date_of_procedure as date_x,
h2.id as record_id, h2.date_of_procedure as date_y
from (select * from both_procedures where procedure='X') as h1 full join
(select * from both_procedures where procedure='Y') as h2
on h1.id=h2.id
where abs(date_x - date_y) <= 2;
quit;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-23-2017 05:27 PM

Hi Reeza,

Thanks for the quick reply! Can I identify/ select cases which are close to each other (before I do the 2 day cutoff)? If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

Sat

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-23-2017 05:31 PM

smunigala wrote:

Hi Reeza,

Thanks for the quick reply! Can I identify/ select cases which are close to each other (before I do the 2 day cutoff)? If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

Sat

I don't understand what you mean.

You can remove the filter then you'll have all the differences and you can compare them.

If there are more than 2 records that match, I think both would be selected right now.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-23-2017 06:27 PM

Hi Reeza,

Sorry, I got the code, but can you help me with the datetime variable rather than date variable? I am looking at 2 hours difference. can

where abs(date_x- date_y) work if my date variables are datetime variables and I need two hours difference rather than two days? I tried but I am not getting the date_of_procedure as date_x and date_y.

Thanks!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-23-2017 06:54 PM

DATEPART will convert to date.

Datetimes are number of seconds from Jan 1, 1960.

If you need to account for time then change '2' days to represent the equivalent number of seconds.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-23-2017 06:59 PM

my datetime variables are not getting labeled as date_x and date_y, it is just showing as date_of_procedure.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-23-2017 08:26 PM

I'm assuming you've made changes to the code?

Post the code please.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-23-2017 06:34 PM

My datevariable is DATETIME20.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-23-2017 10:12 PM

I find the explanations a bit unclear, but maybe this does the job in a simple manner.

Uncomment the 2-day filter if needed.

```
data HAVE;
informat ID $1. PROCEDURE $1. DATE date9.;
format ID $1. PROCEDURE $1. DATE date9.;
input ID $ PROCEDURE $ DATE ;
cards;
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
4 Y 28Jul2016
run;
proc sql;
select unique a.ID
, a.DATE as DATE1
, b.DATE as DATE2
, b.DATE - a.DATE as DIF
from HAVE a
, HAVE b
where A.ID eq b.ID
and a.PROCEDURE ne b.PROCEDURE
and a.DATE le b.DATE
/* and calculated DIF le 2*/
group by a.ID
having DIF=min(DIF);
quit;
```

ID | DATE1 | DATE2 | DIF |
---|---|---|---|

2 | 10MAR2016 | 10MAR2016 | 0 |

4 | 24JUL2016 | 25JUL2016 | 1 |

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-23-2017 10:28 PM

If you want step by step control, this should help:

```
data HAVE (index=(A=(ID DATE PROCEDURE)));
informat ID $1. PROCEDURE $1. DATE date9.;
format ID $1. PROCEDURE $1. DATE date9.;
input ID $ PROCEDURE $ DATE ;
cards;
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
4 Y 28Jul2016
run;
data XY_EVENTS;
set HAVE;
by ID DATE PROCEDURE;
LAST_ID = lag(ID);
LAST_PROCEDURE = lag(PROCEDURE);
LAST_DATE = lag(DATE);
if ID = LAST_ID and PROCEDURE ne LAST_PROCEDURE then do;
N =_N_;
DIF =DATE-LAST_DATE;
output;
end;
keep ID N DIF;
run;
proc sql;
create table XY_MIN as
select * from XY_EVENTS
group by ID
having DIF=min(DIF);
quit;
data WANT;
set XY_MIN;
POINT=N-1;
set HAVE point=POINT;
output;
POINT=N;
set HAVE point=POINT;
output;
drop N;
run;
proc print noobs;
run;
```

ID | DIF | PROCEDURE | DATE |
---|---|---|---|

2 | 0 | X | 10MAR2016 |

2 | 0 | Y | 10MAR2016 |

4 | 1 | X | 24JUL2016 |

4 | 1 | Y | 25JUL2016 |

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-23-2017 11:03 PM

What if there are multiple X and Y within 2 days, what you gonna do ?

```
data have;
informat id $1. procedure $1. date_of_procedure date9.;
format id $1. procedure $1. date_of_procedure date9.;
input ID $ Procedure $ Date_of_Procedure ;
cards;
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
;
run;
proc sort data=have;
by id date_of_procedure procedure;
run;
data want;
set have;
if id=lag(id) and procedure ne lag(procedure) and
dif(date_of_procedure) lt 3 then do;
output;n=_n_-1;
set have point=n;
output;
end;
drop n;
run;
```