Hello all,
I have dataset for a patient population with a start and an end date. I have also have the date for their visits to the hospital. I want to create a new variable which counts their visits to the hospital including the months with zero visits.
The dataset looks like this:
ID | Startdate | Enddate | Visitdate |
1 | 15JAN2016 | 09AUG2016 | 10MAR2016 |
1 | 15JAN2016 | 09AUG2016 | 13MAR2016 |
1 | 15JAN2016 | 09AUG2016 | 09JUN2016 |
2 | 08MAR2016 | 12JUL2016 | 15MAR2016 |
2 | 08MAR2016 | 12JUL2016 | 25MAR2016 |
2 | 08MAR2016 | 12JUL2016 | 6JUNE2016 |
2 | 08MAR2016 | 12JUL2016 | 3JUL2016 |
What I want is:
ID | Startdate | Enddate | Visitdate | Newdate | Count |
1 | 15JAN2016 | 09AUG2016 | 10MAR2016 | JAN2016 | 0 |
1 | 15JAN2016 | 09AUG2016 | 13MAR2016 | FEB2016 | 0 |
1 | 15JAN2016 | 09AUG2016 | 09JUN2016 | MAR2016 | 2 |
1 | 15JAN2016 | 09AUG2016 | APR2016 | 0 | |
1 | 15JAN2016 | 09AUG2016 | MAY2016 | 0 | |
1 | 15JAN2016 | 09AUG2016 | JUN2016 | 1 | |
1 | 15JAN2016 | 09AUG2016 | JUL2016 | 0 | |
1 | 15JAN2016 | 09AUG2016 | AUG2016 | 0 | |
2 | 08MAR2016 | 12JUL2016 | 15MAR2016 | MAR2016 | 2 |
2 | 08MAR2016 | 12JUL2016 | 25MAR2016 | APR2016 | 0 |
2 | 08MAR2016 | 12JUL2016 | 6JUNE2016 | MAY2016 | 0 |
2 | 08MAR2016 | 12JUL2016 | 3JUL2016 | JUN2016 | 1 |
2 | 08MAR2016 | 12JUL2016 | JUL2016 | 1 |
I am looking to create these two variables in the dataset so I can get the counts for the visits by each month between the first and last date. Please help me find a solution for this.
Thanks!
what if the number of visits is greater than the number of months between Startdate and Enddate?
The 'WANT' dataset does not make much sense on it's own. You should not mix aggregate data (those counts) with detail data in the same row.
What if an ID had 30 detail records for visits over 3 months ? You would a lot more detail (30) than aggregates (3).
A more reasonable 'want' would look like
id month count -- ----- -----
Perhaps reported in a layout such as
id startmonth visits_count_in_month_1 ...to... visits_count_in_month_<n>
-- --------- --- --- --- --- --- --- --- --- --- ---
1 15JAN2016 0 0 2 0 0 1 0 0
where <n> is the number of months start to end.
Example:
data have;
input ID Startdate Enddate Visitdate;
format startdate enddate visitdate date9.;
informat startdate enddate visitdate date9.;
datalines;
1 15JAN2016 09AUG2016 10MAR2016
1 15JAN2016 09AUG2016 13MAR2016
1 15JAN2016 09AUG2016 09JUN2016
2 08MAR2016 12JUL2016 15MAR2016
2 08MAR2016 12JUL2016 25MAR2016
2 08MAR2016 12JUL2016 6JUN2016
2 08MAR2016 12JUL2016 3JUL2016
;
data want_way1;
* array using direct index element for frequency counting;
* index is the numeric sas date value for a month;
array months(0:%sysfunc(today())) _temporary_;
do until (last.id);
set have;
by id;
months(intnx('month', visitdate, 0)) + 1;
end;
startmonth =intnx('month', startdate, 0);
do index = 0 to intck('month', startdate, enddate);
month = intnx('month', startdate, index);
count = coalesce(months(month),0);
output;
end;
call missing (of months(*));
keep id index month count startmonth;
format month startmonth nldateymm.;
run;
ods html file='wide.html' style=plateau;
proc report data=want_way1;
columns id startmonth count,index;
define id / group;
define startmonth / min;
define index / ' ' across;
define count / 'Freq in month ';
run;
ods html close;
Produces
data
and report output
A second way to get the counts is to create a months data set covering each id. Aggregate a join of the dates coverage with the details data grouping by id and month.
Example:
data months_per_id; set have; by id; if first.id; do index = 0 to intck('month', startdate, enddate); month = intnx('month', startdate, index); output; end; keep id index month; format month nldateymm.; run; proc sql; create table want_way2 as select M.id, index, month, sum(intck('month', month, visitdate)=0) as count from have join months_per_id as M on have.id = M.id group by M.id, M.index, M.month ;
You don't need that awkward data structure to get your counts. Try this way:
/* Create the required months */
data months;
do until(last.id);
set have; by id;
firstMonth = min(intnx("month", startdate, 0), firstmonth);
lastMonth = max(intnx("month", enddate, 0), lastmonth);
end;
monthDate = firstMonth;
do until(monthDate > lastMonth);
output;
monthDate = intnx("month", monthDate, 1);
end;
format monthDate monyy7.;
keep id monthDate;
run;
/* Count the visits per month */
proc sql;
create table monthlyVisits as
select
a.id,
a.monthDate,
count(visitDate) as nbVisits
from
months as a left join
have as b on a.id=b.id and intnx("month", b.visitDate, 0) = a.monthDate
group by a.id, a.monthDate;
quit;
First, @raajdesaii , I will leave it to you to add the Visitdate column as you would like it. Aside from that, this class of problem is interesting to me because I face it in my work too. I think this is the way I handle this kind of problem. It needs these five steps, I say, if for anything, to be able understand what we had programmed when we look back six months later.
data IDs;
set have (Keep=ID Startdate Enddate);
by ID;
if first.ID;
run;
data AllNewDates (sortedby=ID Newdate);
set IDs;
Newdate=intnx('month',Startdate,0);
do while (Newdate<Enddate);
Count=0;
output;
Newdate=intnx('month',Newdate,1);
end;
run;
data Newdates0 (Keep=ID Newdate);
set have (Keep=ID Visitdate);
Newdate=intnx('month',Visitdate,0);
format Newdate date9.;
run;
proc freq data=Newdates0;
by ID;
table Newdate / out=Newdates(drop=PERCENT) noprint;
format Newdate date9.;
run;
data want;
update AllNewDates Newdates;
by ID Newdate;
format Newdate monyy7.;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.