Hey there,
So I have a large patient dataset where patients often have multiple entries under the same patient ID. Each entry has a column for patient ID, year of admission, whether they are a smoker (0 = no, 1 = yes), and whether they were admitted to deliver a child (0=no, 1 =yes) so the data looks like this
Patient | Year admitted | Smoking Flag | Delivery Flag |
Patient x | 2002 | 0 | 0 |
Patient x | 2003 | 1 | 0 |
Patient x | 2004 | 0 | 1 |
Patient x | 2006 | 0 | 1 |
Patient x | 2008 | 1 | 1 |
Patient y | 2004 | 0 | 1 |
Patient z | 2001 | 1 | 0 |
Patient z | 2001 | 0 | 1 |
How can I return a list of patients who smoked the year of or the year before delivering a child (as well as the year that it occured).
data have;
input Patient& $20. Yearadmitted SmokingFlag DeliveryFlag ;
cards;
Patient x 2002 0 0
Patient x 2003 1 0
Patient x 2004 0 1
Patient x 2006 0 1
Patient x 2008 1 1
Patient y 2004 0 1
Patient z 2001 1 0
Patient z 2001 0 1
;
data want;
set have;
by patient;
retain t;
if first.patient then call missing(t);
if DeliveryFlag and t then if Yearadmitted-t in (0,1) then output;
if SmokingFlag=1 then t=Yearadmitted;
run;
I am not sure why you didn't include 2008 in your expected output though
Are these actually years or full dates?
Assuming your data is sorted correctly:
data want;
set have;
by patient year_admitted;
prev_smoke = lag(smoke);
prev_year = lag(year);
if first.patient then call missing(prev_smoke, prev_year);
smoke_prev_year = .;
if delivery_flag=1 /*if delivered*/
and year-prev_year=1 /*if previous record is the previous year*/
and (smoking_flag=1 or prev_smoke=1) /*smoked in current or previous year*/
then smoke_prev_year = 1;
else if delivery_flag=1 then smoke_prev_year=0; /*only set flag if delivered*/
run;
@jsimmo02 wrote:
Hey there,
So I have a large patient dataset where patients often have multiple entries under the same patient ID. Each entry has a column for patient ID, year of admission, whether they are a smoker (0 = no, 1 = yes), and whether they were admitted to deliver a child (0=no, 1 =yes) so the data looks like this
Patient Year admitted Smoking Flag Delivery Flag Patient x 2002 0 0 Patient x 2003 1 0 Patient x 2004 0 1 Patient x 2006 0 1 Patient x 2008 1 1 Patient y 2004 0 1 Patient z 2001 1 0 Patient z 2001 0 1
How can I return a list of patients who smoked the year of or the year before delivering a child (as well as the year that it occured).
A couple of questions ...
First, there are some years that don't appear. For example, Patient X has no data for 2005. Can we assume that smoker=no for those years? (If not, how do we determine smoking status?)
Clearly, you want to flag this combination: delivery=yes, smoker=yes, and smoker=yes for the previous year.
Did you also want to flag: delivery=yes, smoker=no, but smoker=yes for the previous year?
Did you also want to flag: delivery=yes, smoker=yes, but smoker=no for the previous year?
First, there are some years that don't appear. For example, Patient X has no data for 2005. Can we assume that smoker=no for those years? (If not, how do we determine smoking status?) Yes, you can assume smoker = no for those years.
"Clearly, you want to flag this combination: delivery=yes, smoker=yes, and smoker=yes for the previous year."
I am hoping for delivery = yes and (smoker=yes OR smoker for previous year = yes)
Did you also want to flag: delivery=yes, smoker=no, but smoker=yes for the previous year? YES
Did you also want to flag: delivery=yes, smoker=yes, but smoker=no for the previous year? YES
@jsimmo02 Can you please post a sample of your expect output for your input sample? Thank you
The hopeful output would be
Patient | Allconditionsmet | Year |
Patient x | Y | 2004 |
Patient z | Y | 2001 |
but if I can just create a flag for patients who meet all the criteria, I would be happy. Having the year return would be a bonus
data have;
input Patient& $20. Yearadmitted SmokingFlag DeliveryFlag ;
cards;
Patient x 2002 0 0
Patient x 2003 1 0
Patient x 2004 0 1
Patient x 2006 0 1
Patient x 2008 1 1
Patient y 2004 0 1
Patient z 2001 1 0
Patient z 2001 0 1
;
data want;
set have;
by patient;
retain t;
if first.patient then call missing(t);
if DeliveryFlag and t then if Yearadmitted-t in (0,1) then output;
if SmokingFlag=1 then t=Yearadmitted;
run;
I am not sure why you didn't include 2008 in your expected output though
i should have
Yes! YES! Nailed it
Posting below even though things have moved on while I've been working out the code.
It's just another coding option for what you've got already.
data have;
infile datalines dlm='09'x truncover;
input Patient $ Year_admitted Smoking_Flag Delivery_Flag;
datalines;
x 2002 0 0
x 2003 1 0
x 2004 0 1
x 2006 0 1
x 2008 1 1
y 2004 0 1
z 2001 1 0
z 2001 0 1
;
run;
data want;
set have;
by Patient notsorted Year_admitted;
_r_sflg=lag(Smoking_Flag);
if first.patient then return;
/* also include delivery year for smoking flag */
AllConditionsMet= ( Delivery_Flag=1 and (Smoking_Flag=1 or _r_sflg=1) );
if AllConditionsMet=1 then output;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.