I am wondering how to code to allocate groups based on my data described below.
I have administrative claims data and am interested in comparing the incidence of an outcome between 2 groups: adults with CP (CP=1) and adults without CP (CP=0; controls). My data structure is in the long format and grouped by patid, date of encounter, many other variables, and an indicator for the CP group variable. Each row below represents a single claim. I already identified a CP diagnosis or not for each claim (1=claim for CP; 0=no claim for CP). The algorithm to identify individuals with CP is at least 2 claims for CP not on the same day. Since a CP diagnosis does not necessarily get flagged on the first date of encounter, I am unable to use the FIRST and LAST function. How do I code to get a single group variable with 3 levels: (1) 2+ claims for CP on different days; (2) no claim for CP (controls); and (3) only 1 claim for CP (will be used as a sensitivity analysis)?
Here is an example of what the data looks like. Patient 1 (patid) has CP; patient 2 does not have CP; and patient 3 has only 1 claim for CP.
patid fst_dt CP
1 01/01/2016 0
1 02/03/2016 1
1 02/05/2016 0
1 02/23/2016 0
1 07/5/2016 1
1 07/17/2016 1
1 09/02/2016 0
2 02/04/2016 0
2 05/21/2016 0
2 10/25/2016 0
2 11/11/2016 0
3 01/04/2016 0
3 04/26/2016 1
3 04/28/2016 0
3 06/12/2016 0
Thank you!!
Not sure, what you expect as result, this seems to solve the issue. I have added a fourth patient, please verify the result:
data work.have;
length patid fst_dt cp 8;
informat fst_dt mmddyy10.;
format fst_dt date9.;
input patid fst_dt CP;
datalines;
1 01/01/2016 0
1 02/03/2016 1
1 02/05/2016 0
1 02/23/2016 0
1 07/5/2016 1
1 07/17/2016 1
1 09/02/2016 0
4 01/01/2016 0
4 02/03/2016 1
4 02/03/2016 1
4 02/23/2016 0
4 07/5/2016 0
4 07/17/2016 1
4 09/02/2016 0
2 02/04/2016 0
2 05/21/2016 0
2 10/25/2016 0
2 11/11/2016 0
3 01/04/2016 0
3 04/26/2016 1
3 04/28/2016 0
3 06/12/2016 0
;
run;
data work.want;
set work.have;
by patid notsorted;
length group 8 lastCPDate 8 cpCounter 8;
retain lastCPDate cpCounter;
format lastCPDate date9.;
if first.patid then do;
cpCounter = 0;
end;
if CP then do;
cpCounter = cpCounter + (lastCPDate ^= fst_dt);
lastCPDate = fst_dt;
end;
if last.patid then do;
select;
when (cpCounter = 1) group = 3;
when (cpCounter > 1) group = 1;
otherwise group = 2;
end;
output;
end;
keep patid group;
run;
Is the variable fst_dt a date, dateime or string?
fst_dt is a date.
Not sure, what you expect as result, this seems to solve the issue. I have added a fourth patient, please verify the result:
data work.have;
length patid fst_dt cp 8;
informat fst_dt mmddyy10.;
format fst_dt date9.;
input patid fst_dt CP;
datalines;
1 01/01/2016 0
1 02/03/2016 1
1 02/05/2016 0
1 02/23/2016 0
1 07/5/2016 1
1 07/17/2016 1
1 09/02/2016 0
4 01/01/2016 0
4 02/03/2016 1
4 02/03/2016 1
4 02/23/2016 0
4 07/5/2016 0
4 07/17/2016 1
4 09/02/2016 0
2 02/04/2016 0
2 05/21/2016 0
2 10/25/2016 0
2 11/11/2016 0
3 01/04/2016 0
3 04/26/2016 1
3 04/28/2016 0
3 06/12/2016 0
;
run;
data work.want;
set work.have;
by patid notsorted;
length group 8 lastCPDate 8 cpCounter 8;
retain lastCPDate cpCounter;
format lastCPDate date9.;
if first.patid then do;
cpCounter = 0;
end;
if CP then do;
cpCounter = cpCounter + (lastCPDate ^= fst_dt);
lastCPDate = fst_dt;
end;
if last.patid then do;
select;
when (cpCounter = 1) group = 3;
when (cpCounter > 1) group = 1;
otherwise group = 2;
end;
output;
end;
keep patid group;
run;
This works great! However, this results in a single line for each patid. This is fine, because I then merged this dataset with my other dataset in the long format, and the group variable populated correctly.
This is awesome! Thanks so much!
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.