I have a long data set that contains an id (patssn), diagnosis code (diag), encounter date (encdate1), and a second date (otherDate1). There are many records containing the same id, diag, and encdate with a different otherDate1. I eventually need to count the number of otherDate1's a person had with the same diagnosis between 90 and 180 days after the encdate1.
Currently the data is structured like this:
patssn diag encdate1 otherDate1
1 3051 10/1/13 10/12/13
1 3051 10/1/13 11/14/13
1 3051 10/1/13 1/15/14
1 3051 10/1/13 2/12/14
2 33829 11/15/13 11/22/13
2 33829 11/15/13 12/1/13
....
Using the following code, I was able to transpose the data, but don't get the encdate1, and I need the encdate1 in order to determine the date from which to start counting the number of otherDate1's 90-180 days after the encdate.
proc transpose data=diag3_sort out=caper_transpose2 prefix=diag;
by patssn;
var otherDate1;
run;
The transposed data looks like this:
patssn ameofformervariable diag1 diag2 diag3 diag4 diag5....
123456789 otherDate1 10/12/13 11/14/13 1/15/14 2/12/14
234567890 otherDate1 11/22/13 12/13/13
I believe I need to transpose the data and in another step have SAS count the number of otherDiag1 at least 90 days but less than 181 days after encdate1. If I'm correct, how do I keep encdate1 (which is the same) for every patssn?
Why not just add encdate1 to the BY variable(s)?
I had tried adding encdate1 to the by variable. I got the following error message:
ERROR: Data set WORK.DIAG3_SORT is not sorted in ascending sequence. The current BY group has
encdate1 = 10/29/2013 and the next BY group has encdate1 = 10/03/2013.
I should have said the data has been sorted by patssn and otherDate1. The data set diag3_sort lists the patssn in order and within the patssn, the otherDate1 variable is in order. Encdate1 is the same for every row containing the same patssn.
I'm attaching an Excel file with the data for 20 individuals found in my SAS file. (My SAS file uses actual ssns, and therefore, I can't attach it.) The variable nbDiag is the number of encounters the individual has with the diagnosis code (diag) noted. Encdate1 is the first encounter with the specific diag; there are nbDiag - 1 rows for that patssn and diag. However, that same patssn can be listed with more than one diag (see individual #16).
When I transpose the data, I have 76 variables: patssn, nameofformer variable, and diag1-diag74; there's one row for every patssn/diag combination. I also need encdate1 associated with the specific individual and diag.
Thanks!
If all you need to do is count number of diags you don't need to transpose. See if following works. Data will need to be sorted.
data x;
input patssn diag encdate1 mmddyy8. otherDate1 mmddyy8.;
format encdate1 otherDate1 mmddyy8.;
datalines;
1 3051 10/1/13 10/12/13
1 3051 10/1/13 11/14/13
1 3051 10/1/13 1/15/14
1 3051 10/1/13 2/12/14
2 33829 11/15/13 11/22/13
2 33829 11/15/13 12/1/13
;
run;
data x1;
set x;
by patssn diag encdate1 otherDate1;
retain count;
if first.encdate1 then count=0;
if 90<=otherDate1-encdate1+1<=180 then count=count+1;
if last.encdate1;
keep patssn diag encdate1 count;
run;
You didn't post the output you want yet .
data x;
input patssn diag encdate1 mmddyy8. otherDate1 mmddyy8.;
format encdate1 otherDate1 mmddyy8.;
datalines;
1 3051 10/1/13 10/12/13
1 3051 10/1/13 11/14/13
1 3051 10/1/13 1/15/14
1 3051 10/1/13 2/12/14
2 33829 11/15/13 11/22/13
2 33829 11/15/13 12/1/13
;
run;
proc sql;
create table want as
select a.*,
(select count(*) from x
where a.patssn=patssn and a.diag=diag and a.encdate1=encdate1
and otherDate1 between a.encdate1+90 and a.encdate1+180) as count
from x as a ;
quit;
data have;
informat encdate1 otherdate1 mmddyy10.;
format encdate1 otherdate1 mmddyy10.;
infile cards;
input patssn diag encdate1 otherDate1;
cards;
1 3051 10/1/13 10/12/13
1 3051 10/1/13 11/14/13
1 3051 10/1/13 1/15/14
1 3051 10/1/13 2/12/14
2 33829 11/15/13 11/22/13
2 33829 11/15/13 12/1/13
;
data want;
set have;
by patssn diag encdate1;
if 90 <= (otherdate1 - encdate1) <= 120 then count + 1;
if first.encdate1 then count = 1;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.