BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jsimmo02
Calcite | Level 5

Hey there,

 

So I have a large patient dataset where patients often have multiple entries under the same patient ID. Each entry has a column for patient ID, year of admission, whether they are a smoker (0 = no, 1 = yes), and whether they were admitted to deliver a child (0=no, 1 =yes) so the data looks like this

 

PatientYear admittedSmoking FlagDelivery Flag
Patient x200200
Patient x200310
Patient x200401
Patient x200601
Patient x200811
Patient y200401
Patient z200110
Patient z200101

 

How can I return a list of patients who smoked the year of or the year before delivering a child (as well as the year that it occured).

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data have;
input Patient& $20.	Yearadmitted	SmokingFlag	DeliveryFlag ;
cards;
Patient x	2002	0	0
Patient x	2003	1	0
Patient x	2004	0	1
Patient x	2006	0	1
Patient x	2008	1	1
Patient y	2004	0	1
Patient z	2001	1	0
Patient z	2001	0	1
;
data want;
set have;
by patient;
retain t;
if first.patient then call missing(t);
if 	DeliveryFlag and t then if Yearadmitted-t in (0,1) then output;
if 	SmokingFlag=1 then t=Yearadmitted;
run;

I am not sure why you didn't include 2008 in your expected output though

View solution in original post

9 REPLIES 9
Reeza
Super User

Are these actually years or full dates? 

 

Assuming your data is sorted correctly:

 

data want; 
set have;

by patient year_admitted;

prev_smoke = lag(smoke);
prev_year = lag(year);

if first.patient then call missing(prev_smoke, prev_year);
smoke_prev_year = .;

if delivery_flag=1  /*if delivered*/
    and year-prev_year=1  /*if previous record is the previous year*/
     and (smoking_flag=1 or prev_smoke=1) /*smoked in current or previous year*/ 
           then smoke_prev_year = 1;
else if delivery_flag=1 then smoke_prev_year=0; /*only set flag if delivered*/


run;

@jsimmo02 wrote:

Hey there,

 

So I have a large patient dataset where patients often have multiple entries under the same patient ID. Each entry has a column for patient ID, year of admission, whether they are a smoker (0 = no, 1 = yes), and whether they were admitted to deliver a child (0=no, 1 =yes) so the data looks like this

 

Patient Year admitted Smoking Flag Delivery Flag
Patient x 2002 0 0
Patient x 2003 1 0
Patient x 2004 0 1
Patient x 2006 0 1
Patient x 2008 1 1
Patient y 2004 0 1
Patient z 2001 1 0
Patient z 2001 0 1

 

How can I return a list of patients who smoked the year of or the year before delivering a child (as well as the year that it occured).


 

Astounding
PROC Star

A couple of questions ...

 

First, there are some years that don't appear.  For example, Patient X has no data for 2005.  Can we assume that smoker=no for those years?  (If not, how do we determine smoking status?)

 

Clearly, you want to flag this combination:  delivery=yes, smoker=yes, and smoker=yes for the previous year.

 

Did you also want to flag:  delivery=yes, smoker=no, but smoker=yes for the previous year?

 

Did you also want to flag:  delivery=yes, smoker=yes, but smoker=no for the previous year?

jsimmo02
Calcite | Level 5

First, there are some years that don't appear.  For example, Patient X has no data for 2005.  Can we assume that smoker=no for those years?  (If not, how do we determine smoking status?) Yes, you can assume smoker = no for those years.

 

"Clearly, you want to flag this combination:  delivery=yes, smoker=yes, and smoker=yes for the previous year."

 

I am hoping for delivery = yes and (smoker=yes OR smoker for previous year = yes)

 

Did you also want to flag:  delivery=yes, smoker=no, but smoker=yes for the previous year? YES

 

Did you also want to flag:  delivery=yes, smoker=yes, but smoker=no for the previous year? YES

novinosrin
Tourmaline | Level 20

@jsimmo02  Can you please post a sample of your expect output for your input sample? Thank you

jsimmo02
Calcite | Level 5

The hopeful output would be

PatientAllconditionsmetYear
Patient xY2004
Patient zY2001

 

but if I can just create a flag for patients who meet all the criteria, I would be happy. Having the year return would be a bonus

novinosrin
Tourmaline | Level 20
data have;
input Patient& $20.	Yearadmitted	SmokingFlag	DeliveryFlag ;
cards;
Patient x	2002	0	0
Patient x	2003	1	0
Patient x	2004	0	1
Patient x	2006	0	1
Patient x	2008	1	1
Patient y	2004	0	1
Patient z	2001	1	0
Patient z	2001	0	1
;
data want;
set have;
by patient;
retain t;
if first.patient then call missing(t);
if 	DeliveryFlag and t then if Yearadmitted-t in (0,1) then output;
if 	SmokingFlag=1 then t=Yearadmitted;
run;

I am not sure why you didn't include 2008 in your expected output though

Patrick
Opal | Level 21

@jsimmo02

Posting below even though things have moved on while I've been working out the code. 

It's just another coding option for what you've got already.

data have;
  infile datalines dlm='09'x truncover;
  input Patient $	Year_admitted	Smoking_Flag	Delivery_Flag;
datalines;
x	2002	0	0
x	2003	1	0
x	2004	0	1
x	2006	0	1
x	2008	1	1
y	2004	0	1
z	2001	1	0
z	2001	0	1
;
run;

data want;
  set have;
  by Patient notsorted Year_admitted;

  _r_sflg=lag(Smoking_Flag);

  if first.patient then return;
  
  /* also include delivery year for smoking flag */
  AllConditionsMet=  ( Delivery_Flag=1 and (Smoking_Flag=1 or _r_sflg=1) );

  if AllConditionsMet=1 then output;
run;

Catch up on SAS Innovate 2026

Dive into keynotes, announcements and breakthroughs on demand.

Explore Now →
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 4445 views
  • 0 likes
  • 5 in conversation