BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
smunigala
Obsidian | Level 7

Hi all,

I have a data set with the following variables:

 

ID            Procedure      Date_of_Procedure

1                X                   10Mar2016

1                X                   10Apr2016

2                X                   12Jan2016

2                X                   13Jan2016

2                X                    24Feb2016

2                X                   10Mar2016

2                Y                   10Mar2016

3                Y                   11Dec2016

3                Y                   12Dec2016

4                X                    24Jul2016

4                Y                    25Jul2016

 

My goal is to identify only those IDs who have both procedures and then to find out the difference in days between the two procedures. I want to select only those patients (IDs) with both procedures within 2days only (highlighted in RED), which is my subset of interest.

 

Each ID can have only one procedure (either X or Y) or have multiple same procedures (an ID can have multiple X or multiple Y or a combination of multiple X with one Y or one X with multiple Ys). I am interested in only those patients (IDs) who have both procedures. If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

 

Any help with the SAS code would be greatly appreciated!

 

Thanks!

Sat

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

@smunigala wrote:

Hi all,

I have a data set with the following variables:

 

ID            Procedure      Date_of_Procedure

1                X                   10Mar2016

1                X                   10Apr2016

2                X                   12Jan2016

2                X                   13Jan2016

2                X                    24Feb2016

2                X                   10Mar2016

2                Y                   10Mar2016

3                Y                   11Dec2016

3                Y                   12Dec2016

4                X                    24Jul2016

4                Y                    25Jul2016

 

My goal is to identify only those IDs who have both procedures and then to find out the difference in days between the two procedures. I want to select only those patients (IDs) with both procedures within 2days only (highlighted in RED), which is my subset of interest.

 

Each ID can have only one procedure (either X or Y) or have multiple same procedures (an ID can have multiple X or multiple Y or a combination of multiple X with one Y or one X with multiple Ys). I am interested in only those patients (IDs) who have both procedures. If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

 

Any help with the SAS code would be greatly appreciated!

 

Thanks!

Sat


 

 

Here's one way...there are probably several others. In the future please post your data as a data step - like the first step in my program to create the HAVE data set.

 

	data have;
	informat id $1. procedure $1. date_of_procedure date9.;
	format id $1. procedure $1. date_of_procedure date9.;
	
	input ID   $         Procedure  $    Date_of_Procedure ;
	cards;
	1                X                   10Mar2016
	1                X                   10Apr2016
	2                X                   12Jan2016
	2                X                   13Jan2016
	2                X                    24Feb2016
	2                X                   10Mar2016
	2                Y                   10Mar2016
	3                Y                   11Dec2016
	3                Y                   12Dec2016
	4                X                    24Jul2016
	4                Y                    25Jul2016
	;
	run;
	
	proc sql;
	create table both_procedures as
	select *
	from have
	group by id
	having count(distinct procedure)>1;
	quit;
	
	proc sql;
	create table both_procs_2day as
	select h1.id, h1.date_of_procedure as date_x,
	h2.id as record_id, h2.date_of_procedure as date_y
	from (select * from both_procedures where procedure='X') as h1 full join
	(select * from both_procedures where procedure='Y') as h2
	on h1.id=h2.id 
	where abs(date_x - date_y) <= 2;
	quit;

 

 

 

View solution in original post

11 REPLIES 11
Reeza
Super User

@smunigala wrote:

Hi all,

I have a data set with the following variables:

 

ID            Procedure      Date_of_Procedure

1                X                   10Mar2016

1                X                   10Apr2016

2                X                   12Jan2016

2                X                   13Jan2016

2                X                    24Feb2016

2                X                   10Mar2016

2                Y                   10Mar2016

3                Y                   11Dec2016

3                Y                   12Dec2016

4                X                    24Jul2016

4                Y                    25Jul2016

 

My goal is to identify only those IDs who have both procedures and then to find out the difference in days between the two procedures. I want to select only those patients (IDs) with both procedures within 2days only (highlighted in RED), which is my subset of interest.

 

Each ID can have only one procedure (either X or Y) or have multiple same procedures (an ID can have multiple X or multiple Y or a combination of multiple X with one Y or one X with multiple Ys). I am interested in only those patients (IDs) who have both procedures. If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

 

Any help with the SAS code would be greatly appreciated!

 

Thanks!

Sat


 

 

Here's one way...there are probably several others. In the future please post your data as a data step - like the first step in my program to create the HAVE data set.

 

	data have;
	informat id $1. procedure $1. date_of_procedure date9.;
	format id $1. procedure $1. date_of_procedure date9.;
	
	input ID   $         Procedure  $    Date_of_Procedure ;
	cards;
	1                X                   10Mar2016
	1                X                   10Apr2016
	2                X                   12Jan2016
	2                X                   13Jan2016
	2                X                    24Feb2016
	2                X                   10Mar2016
	2                Y                   10Mar2016
	3                Y                   11Dec2016
	3                Y                   12Dec2016
	4                X                    24Jul2016
	4                Y                    25Jul2016
	;
	run;
	
	proc sql;
	create table both_procedures as
	select *
	from have
	group by id
	having count(distinct procedure)>1;
	quit;
	
	proc sql;
	create table both_procs_2day as
	select h1.id, h1.date_of_procedure as date_x,
	h2.id as record_id, h2.date_of_procedure as date_y
	from (select * from both_procedures where procedure='X') as h1 full join
	(select * from both_procedures where procedure='Y') as h2
	on h1.id=h2.id 
	where abs(date_x - date_y) <= 2;
	quit;

 

 

 

smunigala
Obsidian | Level 7

Hi Reeza,

Thanks for the quick reply! Can I identify/ select cases which are close to each other (before I do the 2 day cutoff)? If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

 

Sat

Reeza
Super User

@smunigala wrote:

Hi Reeza,

Thanks for the quick reply! Can I identify/ select cases which are close to each other (before I do the 2 day cutoff)? If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

 

Sat


I don't understand what you mean.

You can remove the filter then you'll have all the differences and you can compare them. 

If there are more than 2 records that match, I think both would be selected right now. 

smunigala
Obsidian | Level 7

Hi Reeza,

Sorry, I got the code, but can you help me with the datetime variable rather than date variable? I am looking at 2 hours difference. can

where abs(date_x- date_y) work if my date variables are datetime variables and I need two hours difference rather than two days? I tried but I am not getting the date_of_procedure as date_x and date_y.

 

Thanks!

Reeza
Super User

DATEPART will convert to date.

Datetimes are number of seconds from Jan 1, 1960. 

 

If you need to account for time then change '2' days to represent the equivalent number of seconds. 

smunigala
Obsidian | Level 7

my datetime variables are not getting labeled as date_x and date_y, it is just showing as date_of_procedure.

Reeza
Super User

I'm assuming you've made changes to the code?

Post the code please.

smunigala
Obsidian | Level 7

My datevariable is DATETIME20.

ChrisNZ
Tourmaline | Level 20

I find the explanations a bit unclear, but maybe this does the job in a simple manner.

Uncomment the 2-day filter if needed.

data HAVE;
  informat ID $1. PROCEDURE $1. DATE date9.;
  format   ID $1. PROCEDURE $1. DATE date9.;
  input ID $ PROCEDURE  $  DATE ;
cards;
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
4 Y 28Jul2016
run;

proc sql;
  select unique a.ID
  , a.DATE as DATE1
  , b.DATE as DATE2
  , b.DATE - a.DATE as DIF
  from HAVE a
     , HAVE b
  where A.ID        eq b.ID
    and a.PROCEDURE ne b.PROCEDURE
    and a.DATE      le b.DATE 
  /*  and calculated DIF le 2*/
  group by a.ID
  having DIF=min(DIF);
quit;
ID DATE1 DATE2 DIF
2 10MAR2016 10MAR2016 0
4 24JUL2016 25JUL2016 1

 

 

ChrisNZ
Tourmaline | Level 20

If you want step by step control, this should help:

 

data HAVE (index=(A=(ID DATE PROCEDURE)));
  informat ID $1. PROCEDURE $1. DATE date9.;
  format   ID $1. PROCEDURE $1. DATE date9.;
  input ID $ PROCEDURE  $  DATE ;
cards;
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
4 Y 28Jul2016
run;

data XY_EVENTS;
  set HAVE;
  by ID DATE PROCEDURE;
  LAST_ID        = lag(ID);
  LAST_PROCEDURE = lag(PROCEDURE);
  LAST_DATE      = lag(DATE);
  if ID = LAST_ID and PROCEDURE ne LAST_PROCEDURE then do;
    N   =_N_;
    DIF =DATE-LAST_DATE;   
    output;
  end;
  keep ID N DIF;
run;

proc sql;
  create table XY_MIN as
  select * from XY_EVENTS
  group by ID
  having DIF=min(DIF);
quit;

data WANT; 
  set XY_MIN;
  POINT=N-1; 
  set HAVE point=POINT;
  output;
  POINT=N;                                       
  set HAVE point=POINT;
  output;
  drop N;
run;

proc print noobs;
run;
ID DIF PROCEDURE DATE
2 0 X 10MAR2016
2 0 Y 10MAR2016
4 1 X 24JUL2016
4 1 Y 25JUL2016

 

 

Ksharp
Super User

What if there are multiple X and Y within 2 days, what you gonna do ?

 

 

data have;
	informat id $1. procedure $1. date_of_procedure date9.;
	format id $1. procedure $1. date_of_procedure date9.;
	
	input ID   $         Procedure  $    Date_of_Procedure ;
	cards;
	1                X                   10Mar2016
	1                X                   10Apr2016
	2                X                   12Jan2016
	2                X                   13Jan2016
	2                X                    24Feb2016
	2                X                   10Mar2016
	2                Y                   10Mar2016
	3                Y                   11Dec2016
	3                Y                   12Dec2016
	4                X                    24Jul2016
	4                Y                    25Jul2016
	;
	run;
proc sort data=have;
by id date_of_procedure procedure;
run;
data want;
 set have;
 if id=lag(id) and procedure ne lag(procedure) and
    dif(date_of_procedure) lt 3 then do;
  output;n=_n_-1;
  set have point=n;
  output;
 end;
 drop n;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1399 views
  • 1 like
  • 4 in conversation