Hello,
I am trying to get some frequency numbers for two patient populations:
Pop 1 : Patients who have an initial behavior code of 0 followed by behavior code 1 based on year of diagnosis (yod).
Pop 2: Patients who have an initial behavior code of 0 followed by behavior code 1 based on diagnosis (yod).
Some patients have up to 6 diagnosis but I am only interested in the first 2 as explained above based on the yod.
How would you code this given I have multiple diagnosis (behavior_code) and different yod to obtain the n’s for each population?
Here is an example of some of the variables I am working with:
patient_id | age | yod | behavior_code | det_sequence_no |
1 | 55-59 years | 2011 | 0 | 1 |
1 | 60-64 years | 2015 | 1 | 2 |
2 | 55-59 years | 2001 | 0 | 1 |
2 | 55-59 years | 2002 | 1 | 2 |
3 | 70-74 years | 2002 | 0 | 1 |
3 | 75-79 years | 2007 | 1 | 2 |
4 | 45-49 years | 2003 | 0 | 1 |
4 | 50-54 years | 2008 | 0 | 2 |
4 | 50-54 years | 2009 | 1 | 3 |
4 | 60-64 years | 2018 | 1 | 4 |
Thank you so much in advance!
After making your data usable (always, as in ALWAYS, post your example data as data step code which creates the data), I get this:
data have;
infile datalines dlm="09"x dsd truncover;
input patient_id $ age :$15. yod behavior_code det_sequence_no;
datalines;
1 55-59 years 2011 0 1
1 60-64 years 2015 1 2
2 55-59 years 2001 0 1
2 55-59 years 2002 1 2
3 70-74 years 2002 0 1
3 75-79 years 2007 1 2
4 45-49 years 2003 0 1
4 50-54 years 2008 0 2
4 50-54 years 2009 1 3
4 60-64 years 2018 1 4
;
data want;
set have;
by patient_id;
label flag = "Behavior of 2nd Diagnosis";
if first.patient_id
then count = 1;
else count + 1;
beh = lag(behavior_code);
if count = 2;
if not beh
then if behavior_code
then flag = 1;
else flag = 0;
keep patient_id flag;
run;
/* The following step, taken from https://support.sas.com/kb/23/350.html,
is only there to get the variable label for the variable column */
proc template;
edit Base.Freq.OneWayList ;
edit h1;
/* set text to BLANK instead of VARLABEL */
text " ";
end;
edit FVariable;
just=varjust;
style=rowheader;
id;
generic;
header=varlabel;
end;
end;
run;
proc freq data=want;
tables flag / nocum;
run;
which gets this result:
All your patients have a 0 followed by a 1, so they all fall into the same group.
So you want to count patients with a 0/1 combination in the first two observations vs. those with a 0/0, and disregard all further observations for each patient?
data want;
set have;
by patient_id;
if first.patient_id
then count = 1;
else count + 1;
beh = lag(behavior_code);
if count = 2;
if not beh
then if behavior_code
then flag = 1;
else flag = 0;
keep patient_id flag;
run;
Should a patient start with 1, flag will be missing.
Thank you so much, Kurt and happy new year. I tried your suggested code but it gave me the below.
.
I think my problem is getting the frequency number for the second diagnosis when some patients have 3 or more diagnosis. I am simply interested in the initial 0 and the second using the diagnosis year.. As I mentioned the first diagnosis for all patients is a 0 and the second diagnosis will either will be a 0 or 1.
How can I generate the following table? For example this is the ideal output I am looking for with made-up numbers:
Behavior of 2nd diagnosis | Frequency | Percent |
|
|
|
0 | 211 | 44.99 |
1 | 258 | 55.01 |
Thank you again for your help. I hope this clarifies further..
After making your data usable (always, as in ALWAYS, post your example data as data step code which creates the data), I get this:
data have;
infile datalines dlm="09"x dsd truncover;
input patient_id $ age :$15. yod behavior_code det_sequence_no;
datalines;
1 55-59 years 2011 0 1
1 60-64 years 2015 1 2
2 55-59 years 2001 0 1
2 55-59 years 2002 1 2
3 70-74 years 2002 0 1
3 75-79 years 2007 1 2
4 45-49 years 2003 0 1
4 50-54 years 2008 0 2
4 50-54 years 2009 1 3
4 60-64 years 2018 1 4
;
data want;
set have;
by patient_id;
label flag = "Behavior of 2nd Diagnosis";
if first.patient_id
then count = 1;
else count + 1;
beh = lag(behavior_code);
if count = 2;
if not beh
then if behavior_code
then flag = 1;
else flag = 0;
keep patient_id flag;
run;
/* The following step, taken from https://support.sas.com/kb/23/350.html,
is only there to get the variable label for the variable column */
proc template;
edit Base.Freq.OneWayList ;
edit h1;
/* set text to BLANK instead of VARLABEL */
text " ";
end;
edit FVariable;
just=varjust;
style=rowheader;
id;
generic;
header=varlabel;
end;
end;
run;
proc freq data=want;
tables flag / nocum;
run;
which gets this result:
I don't understand the difference between Pop1 and Pop2. If the code posted by @Kurt_Bremser does not solve the issue, you need to post example data containing all possible combinations and show the expected result for that dataset.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.