BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chuakp
Obsidian | Level 7

Hello, I am analyzing a health care utilization survey, and I have a clinic visit dataset in which each observation represents one clinic visit.  For each observation, there is a data element for the person's unique identifier (PID) and the month of the visit (MONTH - takes on values of 1-12).  As a simplified example, say I had five people (PID 1-5).  The dataset would look like this (note that PID 3 had no visits that year and therefore has no observations in the clinic visit dataset):

PID            MONTH

1               3

2               5

4               6

4               1

5               12

I am trying to create a person-level dataset in which each observation represents one person.  The dataset would have information on the number of clinic visits in each month, e.g.:

PID   VIS_JAN   VIS_FEB  VIS_MAR  VIS_APR  VIS_MAY  VIS_JUN   VIS_JUL  VIS_AUG  VIS_SEP  VIS_OCT   VIS_NOV  VIS_DEC

1          0                 0               1              0             0               0             0              0              0               0               0             0

2          0                 0               0              0             1               0             0              0              0               0               0             0

3          0                 0               0              0             0               0             0              0              0               0               0             0

4          1                 0               0              0             0               1             0              0              0               0               0             0

5          0                 0               0              0             0               0             0              0              0               0               0             1

Any suggestions?  Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You have to use your new dataset and tell summary that PID is now a CLASS variable.

proc summary data=have nway classdata=CLASSDATA;

  class pid month ;

  output out=counts ;

run;

View solution in original post

12 REPLIES 12
esjackso
Quartz | Level 8

This is quick and dirty so some optimization could probably be done but hopefully this gets you going:

A couple of notes:

1) The card statements are just getting test data in. You need a table with all your ids present and all the months you want to use present so the sql can make the cartesian product between the ids and months.

2) Some of the sql steps can be combined, but I left them separate so it was easier to see what was happening.

infile cards dlm=',';
input pid month;
cards;

1, 3

2, 5

4, 6

4, 1

5, 12

1, 3

8, 5

4, 6

4, 2

5, 12

;

run;

data all;

infile cards ;
input pid;
cards;

1

2

4

3

4

5

5

6

7

8

8

9

2

10

;

run;

data mnthin;

infile cards;
input month;
cards;

1

2

3

4

4

4

5

5

6

7

8

8

9

10

11

12

;

run;

proc sql;

create table cnts as
select pid, month, count(*) as cnt
from in
group by pid, month
;
create table ids as
select distinct pid
from all
;
create table mnth as
select distinct month
from mnthin
;
create table cartesian as
select a.pid, b.month
from ids as a , mnth as b
;
create table out1 as
select a.pid, a.month, coalesce(b.cnt,0) as cnt
from cartesian as a left join cnts as b on a.pid =b.pid and a.month = b.month
order by pid, month
;

quit;

Hope this helps!

EJ

data out2;

set out1;
by pid month;

if first.pid then do;
vis_jan = 0; vis_feb = 0; vis_mar = 0; vis_apr = 0;
vis_may = 0; vis_jun = 0; vis_jul = 0; vis_aug = 0;
vis_sep = 0; vis_oct = 0; vis_nov = 0; vis_dec = 0;
end;

select (month);
when (1) vis_jan = cnt;
when (2) vis_feb = cnt;
when (3) vis_mar = cnt;
when (4) vis_apr = cnt;
when (5) vis_may = cnt;
when (6) vis_jun = cnt;
when (7) vis_jul = cnt;
when (8) vis_aug = cnt;
when (9) vis_sep = cnt;
when (10) vis_oct = cnt;
when (11) vis_nov = cnt;
when (12) vis_dec = cnt;
end;

retain vis: ;
drop cnt month;
if last.pid then output;

run;

chuakp
Obsidian | Level 7

Thanks EJ.  I have never used SQL before and am having trouble deciphering the code. I should probably take some time to learn SQL in the near future. 

I'm wondering, though, if there is a non-SQL way to do this?

esjackso
Quartz | Level 8

The sql is doing the counts and creating a full set of ids months to merge the counts into. If I remember correctly a data step merge will not give you a cartesian product (all the ids months combinations) in a long format instead of the wide you desire.

You can repalce the sql count with proc freq and an output statement (with some additional cleanup)

You can replace the two distincts with proc sort  nodupkey

Im sure others can provide better data step suggestions.

EJ

Tom
Super User Tom
Super User

Use PROC SUMMARY and then TRANSPOSE.  You can create a CLASSDATA table to insure that all months are represented.

data months;

do month=1 to 12; output; end;

run;

proc summary data=have nway classdata=months;

  by pid;

  class month ;

  output out=counts ;

run;

proc transpose data=counts out=want prefix=vis_ ;

  by pid;

  id month;

  var _freq_;

run;

Ksharp
Super User
data have;
input PID            MONTH;
cards;
1               3
2               5
3               6
4               1
5               12
;
run;
data want(keep=pid vis:) ;
 set have;
 array a{12} VIS_JAN   VIS_FEB  VIS_MAR  VIS_APR  VIS_MAY  VIS_JUN   VIS_JUL  VIS_AUG  VIS_SEP  VIS_OCT   VIS_NOV  VIS_DEC ;
 do i=1 to dim(a);
  if i=month then a{i}=1;
   else a{i}=0;
 end;
run;

Ksharp

esjackso
Quartz | Level 8

and

Correct me if Im wrong but I dont think either of your methods account for PIDs that are not in the source dataset. So if a full set is desired some would be missing.

Also believe Ksharp code just flags if a visit happens in a month and not the number of visits in that month, (again forgive if I missed something).

EJ

Tom
Super User Tom
Super User

It will include all PID values in the input dataset, but If there is a source of PID values in another dataset then you can use that to generate a CLASSDATA dataset for PROC SUMMARY.  Just convert the proc summary code to treat PID as a CLASS variable instead of a BY variable.

Here is one way.

data classdata ;

  set PIDLIST (keep=pid );

  by pid;

  if first.pid;

  do month=1 to 12;

    output;

  end;

run;

chuakp
Obsidian | Level 7

Thanks Tom.  I still am not getting any entries for PID 3, who is not in the original clinical visit dataset.  This is the code I am using:

data have;

input PID            MONTH;

cards;

1               3

2               5

4               6

4               1

5               12

;

run;

data PIDLIST;

do PID = 1 to 5; output; end;

run;

data classdata;

  set PIDLIST (keep = PID);

  by PID;

  if first.PID;

  do month=1 to 12;

    output;

  end;

run;

proc summary data=have nway classdata=months;

  by pid;

  class month ;

  output out=counts ;

run;

proc transpose data=counts out=want prefix=vis_ ;

  by pid;

  id month;

  var _freq_;

run;

Tom
Super User Tom
Super User

You have to use your new dataset and tell summary that PID is now a CLASS variable.

proc summary data=have nway classdata=CLASSDATA;

  class pid month ;

  output out=counts ;

run;

chuakp
Obsidian | Level 7

Thanks, Tom.  I appreciate your help.

chuakp
Obsidian | Level 7

Thanks to Tom and KSharp for their input.  EJ is correct; neither of the solutions account for PIDs that are not in the source dataset. 

Ksharp
Super User

It looks like I don't understand OP totally.

data have;
input PID            MONTH;
cards;
1               3
2               5
4               6
4               1
5               12
;
run;
 

 

data PIDLIST;
do PID = 1 to 5; output; end;
run;

data have;
 merge have pidlist;
 by pid;
run;

data temp(keep=pid vis:) ;
 set have;
 array a{12} VIS_JAN   VIS_FEB  VIS_MAR  VIS_APR  VIS_MAY  VIS_JUN   VIS_JUL  VIS_AUG  VIS_SEP  VIS_OCT   VIS_NOV  VIS_DEC ;
 do i=1 to dim(a);
  if i=month then a{i}=1;
   else a{i}=0;
 end;
run;
proc summary data=temp;
 by pid;
 var vis:;
 output out=want(drop=_:) sum= ;
run;

Ksharp

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2581 views
  • 3 likes
  • 4 in conversation