Solved: Re: Converting from visit-level to person-level data

chuakp · Posted 07-05-2013 11:06 AM

Hello, I am analyzing a health care utilization survey, and I have a clinic visit dataset in which each observation represents one clinic visit. For each observation, there is a data element for the person's unique identifier (PID) and the month of the visit (MONTH - takes on values of 1-12). As a simplified example, say I had five people (PID 1-5). The dataset would look like this (note that PID 3 had no visits that year and therefore has no observations in the clinic visit dataset):

PID MONTH

1 3

2 5

4 6

4 1

5 12

I am trying to create a person-level dataset in which each observation represents one person. The dataset would have information on the number of clinic visits in each month, e.g.:

PID VIS_JAN VIS_FEB VIS_MAR VIS_APR VIS_MAY VIS_JUN VIS_JUL VIS_AUG VIS_SEP VIS_OCT VIS_NOV VIS_DEC

1 0 0 1 0 0 0 0 0 0 0 0 0

2 0 0 0 0 1 0 0 0 0 0 0 0

3 0 0 0 0 0 0 0 0 0 0 0 0

4 1 0 0 0 0 1 0 0 0 0 0 0

5 0 0 0 0 0 0 0 0 0 0 0 1

Any suggestions? Thanks!

Tom · Posted 07-05-2013 12:55 PM

You have to use your new dataset and tell summary that PID is now a CLASS variable.

proc summary data=have nway classdata=CLASSDATA;

class pid month ;

output out=counts ;

run;

View solution in original post

esjackso · Posted 07-05-2013 11:57 AM

This is quick and dirty so some optimization could probably be done but hopefully this gets you going:

A couple of notes:

1) The card statements are just getting test data in. You need a table with all your ids present and all the months you want to use present so the sql can make the cartesian product between the ids and months.

2) Some of the sql steps can be combined, but I left them separate so it was easier to see what was happening.

	infile cards dlm=',';
	input pid month;
	cards;

1, 3

2, 5

4, 6

4, 1

5, 12

1, 3

8, 5

4, 6

4, 2

5, 12

;

run;

data all;

	infile cards ;
	input pid;
	cards;

1

2

4

3

4

5

6

7

8

9

2

10

;

run;

data mnthin;

	infile cards;
	input month;
	cards;

1

2

3

4

5

6

7

8

9

10

11

12

;

run;

proc sql;

	create table cnts as
	select pid, month, count(*) as cnt
	from in
	group by pid, month
	;
	create table ids as
	select distinct pid
	from all
	;
	create table mnth as
	select distinct month
	from mnthin
	;
	create table cartesian as
	select a.pid, b.month
	from ids as a , mnth as b
	;
	create table out1 as
	select a.pid, a.month, coalesce(b.cnt,0) as cnt
	from cartesian as a left join cnts as b on a.pid =b.pid and a.month = b.month
	order by pid, month
	;

quit;

Hope this helps!

EJ

data out2;

	set out1;
	by pid month;

	if first.pid then do;
		vis_jan = 0; vis_feb = 0; vis_mar = 0; vis_apr = 0;
		vis_may = 0; vis_jun = 0; vis_jul = 0; vis_aug = 0;
		vis_sep = 0; vis_oct = 0; vis_nov = 0; vis_dec = 0;
	end;

	select (month);
		when (1) vis_jan = cnt;
		when (2) vis_feb = cnt;
		when (3) vis_mar = cnt;
		when (4) vis_apr = cnt;
		when (5) vis_may = cnt;
		when (6) vis_jun = cnt;
		when (7) vis_jul = cnt;
		when (8) vis_aug = cnt;
		when (9) vis_sep = cnt;
		when (10) vis_oct = cnt;
		when (11) vis_nov = cnt;
		when (12) vis_dec = cnt;
	end;

	retain vis: ;
	drop cnt month;
	if last.pid then output;

run;

chuakp · Posted 07-05-2013 12:04 PM

Thanks EJ. I have never used SQL before and am having trouble deciphering the code. I should probably take some time to learn SQL in the near future.

I'm wondering, though, if there is a non-SQL way to do this?

esjackso · Posted 07-05-2013 12:16 PM

The sql is doing the counts and creating a full set of ids months to merge the counts into. If I remember correctly a data step merge will not give you a cartesian product (all the ids months combinations) in a long format instead of the wide you desire.

You can repalce the sql count with proc freq and an output statement (with some additional cleanup)

You can replace the two distincts with proc sort nodupkey

Im sure others can provide better data step suggestions.

EJ

Tom · Posted 07-05-2013 12:14 PM

Use PROC SUMMARY and then TRANSPOSE. You can create a CLASSDATA table to insure that all months are represented.

data months;

do month=1 to 12; output; end;

run;

proc summary data=have nway classdata=months;

by pid;

class month ;

output out=counts ;

run;

proc transpose data=counts out=want prefix=vis_ ;

by pid;

id month;

var _freq_;

run;

Ksharp · Posted 07-05-2013 12:16 PM

data have;
input PID            MONTH;
cards;
1               3
2               5
3               6
4               1
5               12
;
run;
data want(keep=pid vis:) ;
 set have;
 array a{12} VIS_JAN   VIS_FEB  VIS_MAR  VIS_APR  VIS_MAY  VIS_JUN   VIS_JUL  VIS_AUG  VIS_SEP  VIS_OCT   VIS_NOV  VIS_DEC ;
 do i=1 to dim(a);
  if i=month then a{i}=1;
   else a{i}=0;
 end;
run;

Ksharp

esjackso · Posted 07-05-2013 12:23 PM

and

Correct me if Im wrong but I dont think either of your methods account for PIDs that are not in the source dataset. So if a full set is desired some would be missing.

Also believe Ksharp code just flags if a visit happens in a month and not the number of visits in that month, (again forgive if I missed something).

EJ

Tom · Posted 07-05-2013 12:36 PM

It will include all PID values in the input dataset, but If there is a source of PID values in another dataset then you can use that to generate a CLASSDATA dataset for PROC SUMMARY. Just convert the proc summary code to treat PID as a CLASS variable instead of a BY variable.

Here is one way.

data classdata ;

set PIDLIST (keep=pid );

by pid;

if first.pid;

do month=1 to 12;

output;

end;

run;

chuakp · Posted 07-05-2013 12:50 PM

Thanks Tom. I still am not getting any entries for PID 3, who is not in the original clinical visit dataset. This is the code I am using:

data have;

input PID MONTH;

cards;

1 3

2 5

4 6

4 1

5 12

;

run;

data PIDLIST;

do PID = 1 to 5; output; end;

run;

data classdata;

set PIDLIST (keep = PID);

by PID;

if first.PID;

do month=1 to 12;

output;

end;

run;

proc summary data=have nway classdata=months;

by pid;

class month ;

output out=counts ;

run;

proc transpose data=counts out=want prefix=vis_ ;

by pid;

id month;

var _freq_;

run;

Tom · Posted 07-05-2013 12:55 PM

You have to use your new dataset and tell summary that PID is now a CLASS variable.

proc summary data=have nway classdata=CLASSDATA;

class pid month ;

output out=counts ;

run;

chuakp · Posted 07-05-2013 12:57 PM

Thanks, Tom. I appreciate your help.

chuakp · Posted 07-05-2013 12:37 PM

Thanks to Tom and KSharp for their input. EJ is correct; neither of the solutions account for PIDs that are not in the source dataset.

Ksharp · Posted 07-05-2013 09:11 PM

It looks like I don't understand OP totally.

data have;
input PID            MONTH;
cards;
1               3
2               5
4               6
4               1
5               12
;
run;
 

 

data PIDLIST;
do PID = 1 to 5; output; end;
run;

data have;
 merge have pidlist;
 by pid;
run;

data temp(keep=pid vis:) ;
 set have;
 array a{12} VIS_JAN   VIS_FEB  VIS_MAR  VIS_APR  VIS_MAY  VIS_JUN   VIS_JUL  VIS_AUG  VIS_SEP  VIS_OCT   VIS_NOV  VIS_DEC ;
 do i=1 to dim(a);
  if i=month then a{i}=1;
   else a{i}=0;
 end;
run;
proc summary data=temp;
 by pid;
 var vis:;
 output out=want(drop=_:) sum= ;
run;

Ksharp

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away