Hi all,
I have a data set with the following variables:
ID Procedure Date_of_Procedure
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
My goal is to identify only those IDs who have both procedures and then to find out the difference in days between the two procedures. I want to select only those patients (IDs) with both procedures within 2days only (highlighted in RED), which is my subset of interest.
Each ID can have only one procedure (either X or Y) or have multiple same procedures (an ID can have multiple X or multiple Y or a combination of multiple X with one Y or one X with multiple Ys). I am interested in only those patients (IDs) who have both procedures. If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).
Any help with the SAS code would be greatly appreciated!
Thanks!
Sat
@smunigala wrote:
Hi all,
I have a data set with the following variables:
ID Procedure Date_of_Procedure
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
My goal is to identify only those IDs who have both procedures and then to find out the difference in days between the two procedures. I want to select only those patients (IDs) with both procedures within 2days only (highlighted in RED), which is my subset of interest.
Each ID can have only one procedure (either X or Y) or have multiple same procedures (an ID can have multiple X or multiple Y or a combination of multiple X with one Y or one X with multiple Ys). I am interested in only those patients (IDs) who have both procedures. If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).
Any help with the SAS code would be greatly appreciated!
Thanks!
Sat
Here's one way...there are probably several others. In the future please post your data as a data step - like the first step in my program to create the HAVE data set.
data have;
informat id $1. procedure $1. date_of_procedure date9.;
format id $1. procedure $1. date_of_procedure date9.;
input ID $ Procedure $ Date_of_Procedure ;
cards;
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
;
run;
proc sql;
create table both_procedures as
select *
from have
group by id
having count(distinct procedure)>1;
quit;
proc sql;
create table both_procs_2day as
select h1.id, h1.date_of_procedure as date_x,
h2.id as record_id, h2.date_of_procedure as date_y
from (select * from both_procedures where procedure='X') as h1 full join
(select * from both_procedures where procedure='Y') as h2
on h1.id=h2.id
where abs(date_x - date_y) <= 2;
quit;
@smunigala wrote:
Hi all,
I have a data set with the following variables:
ID Procedure Date_of_Procedure
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
My goal is to identify only those IDs who have both procedures and then to find out the difference in days between the two procedures. I want to select only those patients (IDs) with both procedures within 2days only (highlighted in RED), which is my subset of interest.
Each ID can have only one procedure (either X or Y) or have multiple same procedures (an ID can have multiple X or multiple Y or a combination of multiple X with one Y or one X with multiple Ys). I am interested in only those patients (IDs) who have both procedures. If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).
Any help with the SAS code would be greatly appreciated!
Thanks!
Sat
Here's one way...there are probably several others. In the future please post your data as a data step - like the first step in my program to create the HAVE data set.
data have;
informat id $1. procedure $1. date_of_procedure date9.;
format id $1. procedure $1. date_of_procedure date9.;
input ID $ Procedure $ Date_of_Procedure ;
cards;
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
;
run;
proc sql;
create table both_procedures as
select *
from have
group by id
having count(distinct procedure)>1;
quit;
proc sql;
create table both_procs_2day as
select h1.id, h1.date_of_procedure as date_x,
h2.id as record_id, h2.date_of_procedure as date_y
from (select * from both_procedures where procedure='X') as h1 full join
(select * from both_procedures where procedure='Y') as h2
on h1.id=h2.id
where abs(date_x - date_y) <= 2;
quit;
Hi Reeza,
Thanks for the quick reply! Can I identify/ select cases which are close to each other (before I do the 2 day cutoff)? If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).
Sat
@smunigala wrote:
Hi Reeza,
Thanks for the quick reply! Can I identify/ select cases which are close to each other (before I do the 2 day cutoff)? If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).
Sat
I don't understand what you mean.
You can remove the filter then you'll have all the differences and you can compare them.
If there are more than 2 records that match, I think both would be selected right now.
Hi Reeza,
Sorry, I got the code, but can you help me with the datetime variable rather than date variable? I am looking at 2 hours difference. can
where abs(date_x- date_y) work if my date variables are datetime variables and I need two hours difference rather than two days? I tried but I am not getting the date_of_procedure as date_x and date_y.
Thanks!
DATEPART will convert to date.
Datetimes are number of seconds from Jan 1, 1960.
If you need to account for time then change '2' days to represent the equivalent number of seconds.
my datetime variables are not getting labeled as date_x and date_y, it is just showing as date_of_procedure.
I'm assuming you've made changes to the code?
Post the code please.
My datevariable is DATETIME20.
I find the explanations a bit unclear, but maybe this does the job in a simple manner.
Uncomment the 2-day filter if needed.
data HAVE;
informat ID $1. PROCEDURE $1. DATE date9.;
format ID $1. PROCEDURE $1. DATE date9.;
input ID $ PROCEDURE $ DATE ;
cards;
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
4 Y 28Jul2016
run;
proc sql;
select unique a.ID
, a.DATE as DATE1
, b.DATE as DATE2
, b.DATE - a.DATE as DIF
from HAVE a
, HAVE b
where A.ID eq b.ID
and a.PROCEDURE ne b.PROCEDURE
and a.DATE le b.DATE
/* and calculated DIF le 2*/
group by a.ID
having DIF=min(DIF);
quit;
ID | DATE1 | DATE2 | DIF |
---|---|---|---|
2 | 10MAR2016 | 10MAR2016 | 0 |
4 | 24JUL2016 | 25JUL2016 | 1 |
If you want step by step control, this should help:
data HAVE (index=(A=(ID DATE PROCEDURE)));
informat ID $1. PROCEDURE $1. DATE date9.;
format ID $1. PROCEDURE $1. DATE date9.;
input ID $ PROCEDURE $ DATE ;
cards;
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
4 Y 28Jul2016
run;
data XY_EVENTS;
set HAVE;
by ID DATE PROCEDURE;
LAST_ID = lag(ID);
LAST_PROCEDURE = lag(PROCEDURE);
LAST_DATE = lag(DATE);
if ID = LAST_ID and PROCEDURE ne LAST_PROCEDURE then do;
N =_N_;
DIF =DATE-LAST_DATE;
output;
end;
keep ID N DIF;
run;
proc sql;
create table XY_MIN as
select * from XY_EVENTS
group by ID
having DIF=min(DIF);
quit;
data WANT;
set XY_MIN;
POINT=N-1;
set HAVE point=POINT;
output;
POINT=N;
set HAVE point=POINT;
output;
drop N;
run;
proc print noobs;
run;
ID | DIF | PROCEDURE | DATE |
---|---|---|---|
2 | 0 | X | 10MAR2016 |
2 | 0 | Y | 10MAR2016 |
4 | 1 | X | 24JUL2016 |
4 | 1 | Y | 25JUL2016 |
What if there are multiple X and Y within 2 days, what you gonna do ?
data have;
informat id $1. procedure $1. date_of_procedure date9.;
format id $1. procedure $1. date_of_procedure date9.;
input ID $ Procedure $ Date_of_Procedure ;
cards;
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
;
run;
proc sort data=have;
by id date_of_procedure procedure;
run;
data want;
set have;
if id=lag(id) and procedure ne lag(procedure) and
dif(date_of_procedure) lt 3 then do;
output;n=_n_-1;
set have point=n;
output;
end;
drop n;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.