Hello, I am analyzing a health care utilization survey, and I have a clinic visit dataset in which each observation represents one clinic visit. For each observation, there is a data element for the person's unique identifier (PID) and the month of the visit (MONTH - takes on values of 1-12). As a simplified example, say I had five people (PID 1-5). The dataset would look like this (note that PID 3 had no visits that year and therefore has no observations in the clinic visit dataset):
PID MONTH
1 3
2 5
4 6
4 1
5 12
I am trying to create a person-level dataset in which each observation represents one person. The dataset would have information on the number of clinic visits in each month, e.g.:
PID VIS_JAN VIS_FEB VIS_MAR VIS_APR VIS_MAY VIS_JUN VIS_JUL VIS_AUG VIS_SEP VIS_OCT VIS_NOV VIS_DEC
1 0 0 1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 1 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0
4 1 0 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 1
Any suggestions? Thanks!
You have to use your new dataset and tell summary that PID is now a CLASS variable.
proc summary data=have nway classdata=CLASSDATA;
class pid month ;
output out=counts ;
run;
This is quick and dirty so some optimization could probably be done but hopefully this gets you going:
A couple of notes:
1) The card statements are just getting test data in. You need a table with all your ids present and all the months you want to use present so the sql can make the cartesian product between the ids and months.
2) Some of the sql steps can be combined, but I left them separate so it was easier to see what was happening.
infile cards dlm=','; | |
input pid month; | |
cards; |
1, 3
2, 5
4, 6
4, 1
5, 12
1, 3
8, 5
4, 6
4, 2
5, 12
;
run;
data all;
infile cards ; | |
input pid; | |
cards; |
1
2
4
3
4
5
5
6
7
8
8
9
2
10
;
run;
data mnthin;
infile cards; | |
input month; | |
cards; |
1
2
3
4
4
4
5
5
6
7
8
8
9
10
11
12
;
run;
proc sql;
create table cnts as | |
select pid, month, count(*) as cnt | |
from in | |
group by pid, month | |
; | |
create table ids as | |
select distinct pid | |
from all | |
; | |
create table mnth as | |
select distinct month | |
from mnthin | |
; | |
create table cartesian as | |
select a.pid, b.month | |
from ids as a , mnth as b | |
; | |
create table out1 as | |
select a.pid, a.month, coalesce(b.cnt,0) as cnt | |
from cartesian as a left join cnts as b on a.pid =b.pid and a.month = b.month | |
order by pid, month | |
; |
quit;
Hope this helps!
EJ
data out2;
set out1; | |
by pid month; |
if first.pid then do; | ||
vis_jan = 0; vis_feb = 0; vis_mar = 0; vis_apr = 0; | ||
vis_may = 0; vis_jun = 0; vis_jul = 0; vis_aug = 0; | ||
vis_sep = 0; vis_oct = 0; vis_nov = 0; vis_dec = 0; | ||
end; |
select (month); | ||
when (1) vis_jan = cnt; | ||
when (2) vis_feb = cnt; | ||
when (3) vis_mar = cnt; | ||
when (4) vis_apr = cnt; | ||
when (5) vis_may = cnt; | ||
when (6) vis_jun = cnt; | ||
when (7) vis_jul = cnt; | ||
when (8) vis_aug = cnt; | ||
when (9) vis_sep = cnt; | ||
when (10) vis_oct = cnt; | ||
when (11) vis_nov = cnt; | ||
when (12) vis_dec = cnt; | ||
end; |
retain vis: ; | |
drop cnt month; | |
if last.pid then output; |
run;
Thanks EJ. I have never used SQL before and am having trouble deciphering the code. I should probably take some time to learn SQL in the near future.
I'm wondering, though, if there is a non-SQL way to do this?
The sql is doing the counts and creating a full set of ids months to merge the counts into. If I remember correctly a data step merge will not give you a cartesian product (all the ids months combinations) in a long format instead of the wide you desire.
You can repalce the sql count with proc freq and an output statement (with some additional cleanup)
You can replace the two distincts with proc sort nodupkey
Im sure others can provide better data step suggestions.
EJ
Use PROC SUMMARY and then TRANSPOSE. You can create a CLASSDATA table to insure that all months are represented.
data months;
do month=1 to 12; output; end;
run;
proc summary data=have nway classdata=months;
by pid;
class month ;
output out=counts ;
run;
proc transpose data=counts out=want prefix=vis_ ;
by pid;
id month;
var _freq_;
run;
data have; input PID MONTH; cards; 1 3 2 5 3 6 4 1 5 12 ; run; data want(keep=pid vis:) ; set have; array a{12} VIS_JAN VIS_FEB VIS_MAR VIS_APR VIS_MAY VIS_JUN VIS_JUL VIS_AUG VIS_SEP VIS_OCT VIS_NOV VIS_DEC ; do i=1 to dim(a); if i=month then a{i}=1; else a{i}=0; end; run;
Ksharp
Correct me if Im wrong but I dont think either of your methods account for PIDs that are not in the source dataset. So if a full set is desired some would be missing.
Also believe Ksharp code just flags if a visit happens in a month and not the number of visits in that month, (again forgive if I missed something).
EJ
It will include all PID values in the input dataset, but If there is a source of PID values in another dataset then you can use that to generate a CLASSDATA dataset for PROC SUMMARY. Just convert the proc summary code to treat PID as a CLASS variable instead of a BY variable.
Here is one way.
data classdata ;
set PIDLIST (keep=pid );
by pid;
if first.pid;
do month=1 to 12;
output;
end;
run;
Thanks Tom. I still am not getting any entries for PID 3, who is not in the original clinical visit dataset. This is the code I am using:
data have;
input PID MONTH;
cards;
1 3
2 5
4 6
4 1
5 12
;
run;
data PIDLIST;
do PID = 1 to 5; output; end;
run;
data classdata;
set PIDLIST (keep = PID);
by PID;
if first.PID;
do month=1 to 12;
output;
end;
run;
proc summary data=have nway classdata=months;
by pid;
class month ;
output out=counts ;
run;
proc transpose data=counts out=want prefix=vis_ ;
by pid;
id month;
var _freq_;
run;
You have to use your new dataset and tell summary that PID is now a CLASS variable.
proc summary data=have nway classdata=CLASSDATA;
class pid month ;
output out=counts ;
run;
Thanks, Tom. I appreciate your help.
Thanks to Tom and KSharp for their input. EJ is correct; neither of the solutions account for PIDs that are not in the source dataset.
It looks like I don't understand OP totally.
data have; input PID MONTH; cards; 1 3 2 5 4 6 4 1 5 12 ; run; data PIDLIST; do PID = 1 to 5; output; end; run; data have; merge have pidlist; by pid; run; data temp(keep=pid vis:) ; set have; array a{12} VIS_JAN VIS_FEB VIS_MAR VIS_APR VIS_MAY VIS_JUN VIS_JUL VIS_AUG VIS_SEP VIS_OCT VIS_NOV VIS_DEC ; do i=1 to dim(a); if i=month then a{i}=1; else a{i}=0; end; run; proc summary data=temp; by pid; var vis:; output out=want(drop=_:) sum= ; run;
Ksharp
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.