Help using Base SAS procedures

selecting unique cases

Accepted Solution Solved
Reply
Contributor
Posts: 59
Accepted Solution

selecting unique cases

Hi all,

I have a data set with the following variables:

 

ID            Procedure      Date_of_Procedure

1                X                   10Mar2016

1                X                   10Apr2016

2                X                   12Jan2016

2                X                   13Jan2016

2                X                    24Feb2016

2                X                   10Mar2016

2                Y                   10Mar2016

3                Y                   11Dec2016

3                Y                   12Dec2016

4                X                    24Jul2016

4                Y                    25Jul2016

 

My goal is to identify only those IDs who have both procedures and then to find out the difference in days between the two procedures. I want to select only those patients (IDs) with both procedures within 2days only (highlighted in RED), which is my subset of interest.

 

Each ID can have only one procedure (either X or Y) or have multiple same procedures (an ID can have multiple X or multiple Y or a combination of multiple X with one Y or one X with multiple Ys). I am interested in only those patients (IDs) who have both procedures. If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

 

Any help with the SAS code would be greatly appreciated!

 

Thanks!

Sat


Accepted Solutions
Solution
‎03-24-2017 09:22 AM
Super User
Posts: 19,851

Re: selecting unique cases

[ Edited ]
Posted in reply to smunigala

smunigala wrote:

Hi all,

I have a data set with the following variables:

 

ID            Procedure      Date_of_Procedure

1                X                   10Mar2016

1                X                   10Apr2016

2                X                   12Jan2016

2                X                   13Jan2016

2                X                    24Feb2016

2                X                   10Mar2016

2                Y                   10Mar2016

3                Y                   11Dec2016

3                Y                   12Dec2016

4                X                    24Jul2016

4                Y                    25Jul2016

 

My goal is to identify only those IDs who have both procedures and then to find out the difference in days between the two procedures. I want to select only those patients (IDs) with both procedures within 2days only (highlighted in RED), which is my subset of interest.

 

Each ID can have only one procedure (either X or Y) or have multiple same procedures (an ID can have multiple X or multiple Y or a combination of multiple X with one Y or one X with multiple Ys). I am interested in only those patients (IDs) who have both procedures. If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

 

Any help with the SAS code would be greatly appreciated!

 

Thanks!

Sat


 

 

Here's one way...there are probably several others. In the future please post your data as a data step - like the first step in my program to create the HAVE data set.

 

	data have;
	informat id $1. procedure $1. date_of_procedure date9.;
	format id $1. procedure $1. date_of_procedure date9.;
	
	input ID   $         Procedure  $    Date_of_Procedure ;
	cards;
	1                X                   10Mar2016
	1                X                   10Apr2016
	2                X                   12Jan2016
	2                X                   13Jan2016
	2                X                    24Feb2016
	2                X                   10Mar2016
	2                Y                   10Mar2016
	3                Y                   11Dec2016
	3                Y                   12Dec2016
	4                X                    24Jul2016
	4                Y                    25Jul2016
	;
	run;
	
	proc sql;
	create table both_procedures as
	select *
	from have
	group by id
	having count(distinct procedure)>1;
	quit;
	
	proc sql;
	create table both_procs_2day as
	select h1.id, h1.date_of_procedure as date_x,
	h2.id as record_id, h2.date_of_procedure as date_y
	from (select * from both_procedures where procedure='X') as h1 full join
	(select * from both_procedures where procedure='Y') as h2
	on h1.id=h2.id 
	where abs(date_x - date_y) <= 2;
	quit;

 

 

 

View solution in original post


All Replies
Solution
‎03-24-2017 09:22 AM
Super User
Posts: 19,851

Re: selecting unique cases

[ Edited ]
Posted in reply to smunigala

smunigala wrote:

Hi all,

I have a data set with the following variables:

 

ID            Procedure      Date_of_Procedure

1                X                   10Mar2016

1                X                   10Apr2016

2                X                   12Jan2016

2                X                   13Jan2016

2                X                    24Feb2016

2                X                   10Mar2016

2                Y                   10Mar2016

3                Y                   11Dec2016

3                Y                   12Dec2016

4                X                    24Jul2016

4                Y                    25Jul2016

 

My goal is to identify only those IDs who have both procedures and then to find out the difference in days between the two procedures. I want to select only those patients (IDs) with both procedures within 2days only (highlighted in RED), which is my subset of interest.

 

Each ID can have only one procedure (either X or Y) or have multiple same procedures (an ID can have multiple X or multiple Y or a combination of multiple X with one Y or one X with multiple Ys). I am interested in only those patients (IDs) who have both procedures. If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

 

Any help with the SAS code would be greatly appreciated!

 

Thanks!

Sat


 

 

Here's one way...there are probably several others. In the future please post your data as a data step - like the first step in my program to create the HAVE data set.

 

	data have;
	informat id $1. procedure $1. date_of_procedure date9.;
	format id $1. procedure $1. date_of_procedure date9.;
	
	input ID   $         Procedure  $    Date_of_Procedure ;
	cards;
	1                X                   10Mar2016
	1                X                   10Apr2016
	2                X                   12Jan2016
	2                X                   13Jan2016
	2                X                    24Feb2016
	2                X                   10Mar2016
	2                Y                   10Mar2016
	3                Y                   11Dec2016
	3                Y                   12Dec2016
	4                X                    24Jul2016
	4                Y                    25Jul2016
	;
	run;
	
	proc sql;
	create table both_procedures as
	select *
	from have
	group by id
	having count(distinct procedure)>1;
	quit;
	
	proc sql;
	create table both_procs_2day as
	select h1.id, h1.date_of_procedure as date_x,
	h2.id as record_id, h2.date_of_procedure as date_y
	from (select * from both_procedures where procedure='X') as h1 full join
	(select * from both_procedures where procedure='Y') as h2
	on h1.id=h2.id 
	where abs(date_x - date_y) <= 2;
	quit;

 

 

 

Contributor
Posts: 59

Re: selecting unique cases

Hi Reeza,

Thanks for the quick reply! Can I identify/ select cases which are close to each other (before I do the 2 day cutoff)? If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

 

Sat

Super User
Posts: 19,851

Re: selecting unique cases

Posted in reply to smunigala

smunigala wrote:

Hi Reeza,

Thanks for the quick reply! Can I identify/ select cases which are close to each other (before I do the 2 day cutoff)? If an ID has a series of X and just one Y then, I am interested in the closest X to the Y for that ID (eg. ID 2, selected in RED).

 

Sat


I don't understand what you mean.

You can remove the filter then you'll have all the differences and you can compare them. 

If there are more than 2 records that match, I think both would be selected right now. 

Contributor
Posts: 59

Re: selecting unique cases

Hi Reeza,

Sorry, I got the code, but can you help me with the datetime variable rather than date variable? I am looking at 2 hours difference. can

where abs(date_x- date_y) work if my date variables are datetime variables and I need two hours difference rather than two days? I tried but I am not getting the date_of_procedure as date_x and date_y.

 

Thanks!

Super User
Posts: 19,851

Re: selecting unique cases

Posted in reply to smunigala

DATEPART will convert to date.

Datetimes are number of seconds from Jan 1, 1960. 

 

If you need to account for time then change '2' days to represent the equivalent number of seconds. 

Contributor
Posts: 59

Re: selecting unique cases

my datetime variables are not getting labeled as date_x and date_y, it is just showing as date_of_procedure.

Super User
Posts: 19,851

Re: selecting unique cases

Posted in reply to smunigala

I'm assuming you've made changes to the code?

Post the code please.

Contributor
Posts: 59

Re: selecting unique cases

Posted in reply to smunigala

My datevariable is DATETIME20.

PROC Star
Posts: 1,760

Re: selecting unique cases

Posted in reply to smunigala

I find the explanations a bit unclear, but maybe this does the job in a simple manner.

Uncomment the 2-day filter if needed.

data HAVE;
  informat ID $1. PROCEDURE $1. DATE date9.;
  format   ID $1. PROCEDURE $1. DATE date9.;
  input ID $ PROCEDURE  $  DATE ;
cards;
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
4 Y 28Jul2016
run;

proc sql;
  select unique a.ID
  , a.DATE as DATE1
  , b.DATE as DATE2
  , b.DATE - a.DATE as DIF
  from HAVE a
     , HAVE b
  where A.ID        eq b.ID
    and a.PROCEDURE ne b.PROCEDURE
    and a.DATE      le b.DATE 
  /*  and calculated DIF le 2*/
  group by a.ID
  having DIF=min(DIF);
quit;
ID DATE1 DATE2 DIF
2 10MAR2016 10MAR2016 0
4 24JUL2016 25JUL2016 1

 

 

PROC Star
Posts: 1,760

Re: selecting unique cases

Posted in reply to smunigala

If you want step by step control, this should help:

 

data HAVE (index=(A=(ID DATE PROCEDURE)));
  informat ID $1. PROCEDURE $1. DATE date9.;
  format   ID $1. PROCEDURE $1. DATE date9.;
  input ID $ PROCEDURE  $  DATE ;
cards;
1 X 10Mar2016
1 X 10Apr2016
2 X 12Jan2016
2 X 13Jan2016
2 X 24Feb2016
2 X 10Mar2016
2 Y 10Mar2016
3 Y 11Dec2016
3 Y 12Dec2016
4 X 24Jul2016
4 Y 25Jul2016
4 Y 28Jul2016
run;

data XY_EVENTS;
  set HAVE;
  by ID DATE PROCEDURE;
  LAST_ID        = lag(ID);
  LAST_PROCEDURE = lag(PROCEDURE);
  LAST_DATE      = lag(DATE);
  if ID = LAST_ID and PROCEDURE ne LAST_PROCEDURE then do;
    N   =_N_;
    DIF =DATE-LAST_DATE;   
    output;
  end;
  keep ID N DIF;
run;

proc sql;
  create table XY_MIN as
  select * from XY_EVENTS
  group by ID
  having DIF=min(DIF);
quit;

data WANT; 
  set XY_MIN;
  POINT=N-1; 
  set HAVE point=POINT;
  output;
  POINT=N;                                       
  set HAVE point=POINT;
  output;
  drop N;
run;

proc print noobs;
run;
ID DIF PROCEDURE DATE
2 0 X 10MAR2016
2 0 Y 10MAR2016
4 1 X 24JUL2016
4 1 Y 25JUL2016

 

 

Super User
Posts: 10,041

Re: selecting unique cases

Posted in reply to smunigala

What if there are multiple X and Y within 2 days, what you gonna do ?

 

 

data have;
	informat id $1. procedure $1. date_of_procedure date9.;
	format id $1. procedure $1. date_of_procedure date9.;
	
	input ID   $         Procedure  $    Date_of_Procedure ;
	cards;
	1                X                   10Mar2016
	1                X                   10Apr2016
	2                X                   12Jan2016
	2                X                   13Jan2016
	2                X                    24Feb2016
	2                X                   10Mar2016
	2                Y                   10Mar2016
	3                Y                   11Dec2016
	3                Y                   12Dec2016
	4                X                    24Jul2016
	4                Y                    25Jul2016
	;
	run;
proc sort data=have;
by id date_of_procedure procedure;
run;
data want;
 set have;
 if id=lag(id) and procedure ne lag(procedure) and
    dif(date_of_procedure) lt 3 then do;
  output;n=_n_-1;
  set have point=n;
  output;
 end;
 drop n;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 346 views
  • 1 like
  • 4 in conversation