Solved: Re: Number of Claims by quarter of year

GKati · Posted 06-09-2017 10:23 AM

Hello Everyone,

I have a large dataset of health insurance claims of the following structure:

data claims;

input client year quarter specialty;

datalines;

1 2012 1 22

1 2012 1 30

2 2013 2 10

;

run;

I would like to:

1. Create variables for each specialty (1 to 90) named spec1, spec 2, etc.

2. Place the sum of claims per quarter of year and client from that specialty in the approrpriate variable cell.

My desired output would look something like this:

client	year	quarter	Specialty	Spec22	Spec10	spec30
1	2012	1	22	2	0	0
1	2012	1	22	2	0	0
1	2012	1	30	0	0	1
2	2013	2	10	0	2	0
2	2013	2	10	0	2	0

Thanks for your help.

Reeza · Posted 06-09-2017 10:48 AM

Use your favourite summary proc (proc means/univariate/summary/sql) to summarize to quarter and then use PROC TRANSPOSE to reformat the data.

View solution in original post

ballardw · Posted 06-09-2017 10:29 AM

Is this for a report? If not what exactly will you do with the resulting data? For many purposes changing data to a "wide" format as you requiest complicates a large number of further operations.

GKati · Posted 06-09-2017 10:38 AM

Hi ballardw,

Am I changing it to wide? Not exactly, because that wouldn't give me the sum of claims per quarter as the value. Correct?

It is difficult to explain exactly. Eventually, I would like to collapse the dataset by quarter of year. Similar to this:

client	year	quarter	Specialty	Spec22	Spec10	spec30
1	2012	1	22	2	0	1
2	2013	2	10	0	2	0

And merge this into my client dataset. At the moment, my problem is that i have several (sometimes 400) claims per client from different specialties and hospitals, and I can't merge the two dataset appropriately.

Thanks for your comment.

Reeza · Posted 06-09-2017 10:48 AM

Use your favourite summary proc (proc means/univariate/summary/sql) to summarize to quarter and then use PROC TRANSPOSE to reformat the data.

GKati · Posted 06-09-2017 03:24 PM

I didn't think of that. Thanks for the tip.

GKati · Posted 06-10-2017 01:37 AM

Hello Reeza,

I am trying your method with proc transpose. I need to see the number of claims per CLIENT by specialty and quarter of year. I have about 4 million claims from probably a few million different clients, multiplied by the number of periods.

I have this so far, but I somehow doubt that this is going to work for SAS:

PROC tabulate DATA=work.claims out=claims_out;

CLASS client year quarter specialty;

VAR specialty;

tables client*year*quarter*specialty, N;

title "Claims by Specialty and quarter";

run;

Any suggestions?

GKati · Posted 06-10-2017 02:26 AM

Hello Everyone,

I have a large dataset of health insurance claims of the following structure:

data claims;

input client year quarter specialty;

datalines;

1 2012 1 22

1 2012 1 30

2 2013 2 10

;

run;

I would like to summarize the number of claims per CLIENT by specialty and quarter of years. In other words, I would like to collapse my data by quarters.

My desired output would look something like this:

client	year	quarter	Specialty	Spec22	Spec10	spec30
1	2012	1	22	2	0	1
2	2013	2	10	0	2	0

Reeza on this forum suggested using proc tabulate (or something similar) and then proc transpose. But my problem is that I have 4.5 million claims from milions of clients. And this seems to be too much for SAS to write out.

This is what I've tried so far:

PROC tabulate DATA=work.claims out=claims_out;

CLASS client year quarter;

VAR specialty;

tables client*year*quarter*specialty, N;

title "Claims by Specialty and quarter";

run;

Any suggestions?

Yegen · Posted 06-10-2017 05:36 AM

The following code yields the desired outcome:

proc sql;
  create table work.claims_out as
    select *, 
       	sum(Case when specialty=22 then 1 else 0 end) as Spec22,
    	sum(Case when specialty=10 then 1 else 0 end) as Spec10,     
   		sum(Case when specialty=30 then 1 else 0 end) as Spec30
    from work.claims
    group by client, year, quarter;
quit;

proc sort data=work.claims_out nodupkey;
	by client year quarter;
quit;

GKati · Posted 06-10-2017 06:45 AM

Thanks Yegen. In other words, this is not possible with Proc Tabulate...

One problem I have with this code, is that I actually have 90 specialties. Is there a way I don't have to manually generate 90 Spec variables?

Thanks again.

Kurt_Bremser · Posted 06-10-2017 06:56 AM

Sort by client, year, quarter and speciality.

Do a proc summary with the same by statement.

Determine the needed columns with a proc sort nodupkey by speciality.

Write the data step that does the transpose (or create it dynamically with call execute from the proc sort result).

How many "specialities" are there, anyway?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

GKati · Posted 06-10-2017 07:04 AM

KurtBremser: There are about 90 specialties. Thanks for your comment. I will try it.

Astounding · Posted 06-10-2017 07:04 AM

You should be able to get closer to your desired output with a few changes:

PROC tabulate DATA=work.claims out=claims_out;

CLASS client year quarter specialty;

tables client*year*quarter, specialty;

title "Claims by Specialty and quarter";

run;

This is untested code, so may need some tweaking.

This still won't get you SPECIALTY column you indicate, however. But your logic for obtaining it is not clear. Should it be the first specialty within that Client / Year / Quarter The most frequently occurring specialty? Should it be removed entirely?

GKati · Posted 06-10-2017 10:53 AM

Astounding: I am trying to create a claims dataset with basic information on each claim, that I can merge in with the clients dataset. I have many (sometimes 400 claims) per client that I need information about in one dataset. Hence, the new dataset structure.

Ksharp · Posted 06-10-2017 07:28 AM

data claims;
input client year quarter specialty;
datalines;
1 2012 1 22
1 2012 1 22
1 2012 1 30
2 2013 2 10
2 2013 2 10
;
run;
proc freq data=claims noprint;
table client*year*quarter*specialty/out=have nopercent nocum ;
run;
proc sql noprint;
 select distinct cats('spec',specialty) into : list separated by ' '
  from have;
quit;
data want;
 set have;
 by client year quarter;
 array x{*} &list;
 retain &list;

if first.quarter then do;
 do i=1 to dim(x);
  x{i}=0;
 end;
end;

 do i=1 to dim(x);
  if cats('spec',specialty)=vname(x{i}) then do;
    x{i}=count;leave;
  end;
 end;

if last.quarter;
drop i specialty count percent;
run;

GKati · Posted 06-10-2017 11:24 AM

Thanks for your comment Ksharp. I get the following error message.

Data want;

43 set have;

44 by client year quarter;

45 array x{*} &list;

NOTE: Line generated by the macro variable "LIST".

45 spec. spec0 spec1 spec10 spec11 spec13 spec16 spec17 spec18 spec2 spec20 spec21 spec22 spec23 spec24 spec26 spec27

_____

22

201

45 ! spec28 spec29 spec3 spec30 spec35 spec37 spec4 spec5 spec6 spec60 spec61 spec62 spec7 spec8 spec89 spec90

ERROR 22-322: Syntax error, expecting one of the following: a name, a numeric constant, $, (, ;, _ALL_, _CHARACTER_, _CHAR_,

_NUMERIC_, _TEMPORARY_.

ERROR 201-322: The option is not recognized and will be ignored.

46 retain &list;

NOTE: Line generated by the macro variable "LIST".

46 spec. spec0 spec1 spec10 spec11 spec13 spec16 spec17 spec18 spec2 spec20 spec21 spec22 spec23 spec24 spec26 spec27

_____

22

201

46 ! spec28 spec29 spec3 spec30 spec35 spec37 spec4 spec5 spec6 spec60 spec61 spec62 spec7 spec8 spec89 spec90

ERROR 22-322: Syntax error, expecting one of the following: a name, ;, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_.

ERROR 201-322: The option is not recognized and will be ignored.

47

48 if first.quarter then do;

49 do i=1 to dim(x);

50 x{i}=0;

51 end;

52 end;

53 do i=1 to dim(x);

54 if cats('spec',specialty)=vname(x{i}) then do;

55 x{i}=count;leave;

56 end;

57 end;

58

Number of Claims by quarter of year

Re: Number of Claims by quarter of year

Re: Number of Claims by quarter of year

Re: Number of Claims by quarter of year

Re: Number of Claims by quarter of year

Re: Number of Claims by quarter of year

Re: Number of Claims by quarter of year

Summarize clients by quarter of year and specialty

Re: Summarize clients by quarter of year and specialty

Re: Summarize clients by quarter of year and specialty

Re: Summarize clients by quarter of year and specialty

Re: Summarize clients by quarter of year and specialty

Re: Summarize clients by quarter of year and specialty

Re: Summarize clients by quarter of year and specialty

Re: Summarize clients by quarter of year and specialty

Re: Summarize clients by quarter of year and specialty

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!