I am trying to generate a count of how many total visits there were in a specific time period using the start date as a reference. There are 400+ observations and 100 visits.
Study design: On the start date, participants were expected to return to the clinic at 12 months ± 3 months, 24 months ± 3 months, and 36 months ± 3 months.
Data: (example)
ID Startdate visit1 visit2 visit3 ... visit100
123 12MAR2009 13MAR2010 2AUG2010 4JUN2012 5DEC2015
456 14MAR2009 16MAR2010 4AUG2010 5JUN2012 10DEC2015
789 16MAR2009 19MAR2010 6AUG2010 6JUN2012 15DEC2015
What I need to know is if a patient has a visit within the range of:
Using the attached set, this is an example of what the reporting table would look like…
12 month visits | 29* |
24 month visits | 18 |
36 month visits | 10 |
*This means that there were 29 visits that occurred between 9-15 months from the start date.
First, transpose from wide to long format, then cumulate per ID:
proc transpose data=have out=visits (drop=_NAME_ rename=(COL1=visit));
by ID;
copy startdate;
run;
data want (keep=ID visit12 visit24 visit36);
set visits;
by ID;
retain start visit12 visit24 visit36;
if first.ID
then do
start = startdate;
visit12 = 0;
visit24 = 0;
visit36 = 0;
end;
if intnx('month',start,9) <= visit < intnx('month',start,16) then visit12 + 1;
if intnx('month',start,21) <= visit < intnx('month',start,28) then visit24 + 1;
if intnx('month',start,33) <= visit < intnx('month',start,40) then visit36 + 1;
if last.ID then output;
run;
To get total sums, just don't reset the counters with every group change, and provide only 1 output at EOF.
As with most problems, it is your data structure which makes your problem difficult. For programming purposes it is far easier to program with, and maintain if you use normalised data structures - fixed structure with data in the dataset. This makes your problem very simple:
data have; id=123; startdate="12mar2009"d; visit1="13mar2010"d; visit2="02aug2010"d; format startdate visit: date9.; run; proc transpose data=have out=inter; by id startdate; var visit:; run; data want; set inter; length month9_15 month21_27 $1; retain month9_15 month21_27; by id; if intnx("month",startdate,9) <= col1 <= intnx("month",startdate,15) then month9_15="Y"; if intnx("month",startdate,21) <= col1 <= intnx("month",startdate,27) then month21_27="Y"; if last.id then output; run;
For example gives you a dataset with one row per subject and a flag for each visit, its then a simple proc means (or you could total in that datastep).
First, transpose from wide to long format, then cumulate per ID:
proc transpose data=have out=visits (drop=_NAME_ rename=(COL1=visit));
by ID;
copy startdate;
run;
data want (keep=ID visit12 visit24 visit36);
set visits;
by ID;
retain start visit12 visit24 visit36;
if first.ID
then do
start = startdate;
visit12 = 0;
visit24 = 0;
visit36 = 0;
end;
if intnx('month',start,9) <= visit < intnx('month',start,16) then visit12 + 1;
if intnx('month',start,21) <= visit < intnx('month',start,28) then visit24 + 1;
if intnx('month',start,33) <= visit < intnx('month',start,40) then visit36 + 1;
if last.ID then output;
run;
To get total sums, just don't reset the counters with every group change, and provide only 1 output at EOF.
@JKS1 wrote:
Hi - When I attempted to transpose the set, the date value was lost? The start date also only shows up on the first ID and should show up for each visit.
You probably meant that the FORMAT was lost. That's not a big issue, you can assign the format later when you need to display values.
Since you are looking for the sums of visits, anyway, you won't need it at all in intermediate datasets.
And the second issue is resolved by keeping the startdate value in RETAINed variable, as I did in my data step.
Believe it or not, the programming will be easy. The hard part is defining the solution. For example, with a start date of 12MAR2009, what time period should the 9-15 month interval include? Do you want to count days rather than months to define the interval? (Consider what should happen if the start date were 30MAY2009 and there is no such thing as 30FEB2010.) If a person has multiple visits within that interval, should all the person's visits be counted or just 1?
Once those things are defined, running through an array in a DATA step will be easy.
Perfect. You can do the same in SAS. It will make a difference, though, whether your DATE variables are actually numeric SAS dates, or whether they are character strings. If they are numeric SAS dates (the preferred method for many reasons):
data want;
set have;
array visits {100} visit1-visit100;
_12_mo_visits=0;
_24_mo_visits=0;
_36_mo_visits=0;
do _n_=1 to 100;
if startdate + 1005 <= visits{_n_} <= startdate + 1185 then _36_mo_visits + 1;
* Add 2 more statements here, for the 12-month and 24-month counts;
end;
run;
That gives you 3 new variables in the data set, with counts per observation: how many visits in each of the ranges. You can add them up easily:
proc means data=want sum;
var _12_mo_visits _24_mo_visits _36_mo_visits;
run;
Or you can count patients with each level of # of visits instead:
proc freq data=want;
tables _12_mo_visits _24_mo_visits _36_mo_visits;
run;
Good luck.
*** EDITED: If your date variables are actually character strings rather than numeric SAS dates, only one piece of the program needs to change:
if startdate + 1005 <= input(visits{_n_}, date9.) <= startdate + 1185 then _36_mo_visits + 1;
The same would apply to STARTDATE, depending on whether that is a numeric SAS date vs. a character string.
I think no need proc transpose. Change Kurt's code : data want (keep=ID visit12 visit24 visit36); set visits; array x{*} visit: ; _visit12 = 0; _visit24 = 0; _visit36 = 0; do i=1 to dim(x); if intnx('month',start,9) <= x{i}< intnx('month',start,16) then _visit12 + 1; if intnx('month',start,21) <= x{i} < intnx('month',start,28) then _visit24 + 1; if intnx('month',start,33) <= x{i} < intnx('month',start,40) then _visit36 + 1; end; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.