I am wondering how to code to allocate groups based on my data described below.
I have administrative claims data and am interested in comparing the incidence of an outcome between 2 groups: adults with CP (CP=1) and adults without CP (CP=0; controls). My data structure is in the long format and grouped by patid, date of encounter, many other variables, and an indicator for the CP group variable. Each row below represents a single claim. I already identified a CP diagnosis or not for each claim (1=claim for CP; 0=no claim for CP). The algorithm to identify individuals with CP is at least 2 claims for CP not on the same day. Since a CP diagnosis does not necessarily get flagged on the first date of encounter, I am unable to use the FIRST and LAST function. How do I code to get a single group variable with 3 levels: (1) 2+ claims for CP on different days; (2) no claim for CP (controls); and (3) only 1 claim for CP (will be used as a sensitivity analysis)?
Here is an example of what the data looks like. Patient 1 (patid) has CP; patient 2 does not have CP; and patient 3 has only 1 claim for CP.
patid fst_dt CP
1 01/01/2016 0
1 02/03/2016 1
1 02/05/2016 0
1 02/23/2016 0
1 07/5/2016 1
1 07/17/2016 1
1 09/02/2016 0
2 02/04/2016 0
2 05/21/2016 0
2 10/25/2016 0
2 11/11/2016 0
3 01/04/2016 0
3 04/26/2016 1
3 04/28/2016 0
3 06/12/2016 0
Thank you!!
Not sure, what you expect as result, this seems to solve the issue. I have added a fourth patient, please verify the result:
data work.have;
length patid fst_dt cp 8;
informat fst_dt mmddyy10.;
format fst_dt date9.;
input patid fst_dt CP;
datalines;
1 01/01/2016 0
1 02/03/2016 1
1 02/05/2016 0
1 02/23/2016 0
1 07/5/2016 1
1 07/17/2016 1
1 09/02/2016 0
4 01/01/2016 0
4 02/03/2016 1
4 02/03/2016 1
4 02/23/2016 0
4 07/5/2016 0
4 07/17/2016 1
4 09/02/2016 0
2 02/04/2016 0
2 05/21/2016 0
2 10/25/2016 0
2 11/11/2016 0
3 01/04/2016 0
3 04/26/2016 1
3 04/28/2016 0
3 06/12/2016 0
;
run;
data work.want;
set work.have;
by patid notsorted;
length group 8 lastCPDate 8 cpCounter 8;
retain lastCPDate cpCounter;
format lastCPDate date9.;
if first.patid then do;
cpCounter = 0;
end;
if CP then do;
cpCounter = cpCounter + (lastCPDate ^= fst_dt);
lastCPDate = fst_dt;
end;
if last.patid then do;
select;
when (cpCounter = 1) group = 3;
when (cpCounter > 1) group = 1;
otherwise group = 2;
end;
output;
end;
keep patid group;
run;
Is the variable fst_dt a date, dateime or string?
fst_dt is a date.
Not sure, what you expect as result, this seems to solve the issue. I have added a fourth patient, please verify the result:
data work.have;
length patid fst_dt cp 8;
informat fst_dt mmddyy10.;
format fst_dt date9.;
input patid fst_dt CP;
datalines;
1 01/01/2016 0
1 02/03/2016 1
1 02/05/2016 0
1 02/23/2016 0
1 07/5/2016 1
1 07/17/2016 1
1 09/02/2016 0
4 01/01/2016 0
4 02/03/2016 1
4 02/03/2016 1
4 02/23/2016 0
4 07/5/2016 0
4 07/17/2016 1
4 09/02/2016 0
2 02/04/2016 0
2 05/21/2016 0
2 10/25/2016 0
2 11/11/2016 0
3 01/04/2016 0
3 04/26/2016 1
3 04/28/2016 0
3 06/12/2016 0
;
run;
data work.want;
set work.have;
by patid notsorted;
length group 8 lastCPDate 8 cpCounter 8;
retain lastCPDate cpCounter;
format lastCPDate date9.;
if first.patid then do;
cpCounter = 0;
end;
if CP then do;
cpCounter = cpCounter + (lastCPDate ^= fst_dt);
lastCPDate = fst_dt;
end;
if last.patid then do;
select;
when (cpCounter = 1) group = 3;
when (cpCounter > 1) group = 1;
otherwise group = 2;
end;
output;
end;
keep patid group;
run;
This works great! However, this results in a single line for each patid. This is fine, because I then merged this dataset with my other dataset in the long format, and the group variable populated correctly.
This is awesome! Thanks so much!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.