BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wlierman
Lapis Lazuli | Level 10

This is the challenge that I am am facing:

 

I have a small (880000 obs) data set.  There are a minimum of 4 unique claim numbers for each recipient, so this 880K data set is my working calculation set (with a minimum of 4 claims up to, I think there is one person with 77 claim lines!) I want to calculate a Bice - Boxer man index for each recipient and then put (write) the result, along with the Recipient ID to a new data set.  This data set will have (using hypothetical numbers) say 2,500 distinct recipients each with a B-B Index value. The B-B index values vary between 0 and 1 inclusive.  I haven't successfully worked this out - thus, I am here - but this rough schematic shows one recipient with multiple claim lines in the working calculation set. Incidentally I am counting up the number of primary care providers to use in the B-B index over each block of claims. 

 

ID                                Claim_No                              PCP_ID

123453                         897605                               40000

123453                         245168                               40000

123453                         589741                                40000

123453                         589456                                54123

123456                         659421                                65874

123456                         568971                                21356

123456                         474895                                32150

123456                         412008                                32150

123456                         568888                                32150

123456                         213004                                32150

 

There are 5 distinct PCP providers here.  The B-B Index is calculated as follows with individual provider j, s number of providers

and n visits (distinct claims)           

                                  ∑                 n^2 - n   

                                j=1 to s          n(n-1)    

 

 

using this formula the B-B index for recipient 123456 is calculated as:  B-B =     (3^sq - 3)  + (1^sq - 1) + (1^sq - 1) + (1^sq - 1)

                                                                                                                           +(4^sq - 4) / (5)*(4)

                                                                                                                      =       6 + 0 + 0 + 0 +12 / 20

                                                                                                                      =        18/20

                                                                                                                      =         0.9

   Then I would like to put the each ID value and corresponding calculated B-B index to a new data set

 

New data set

 

obs           ID                  B-B index

   1          123456             0.9

  . . .                      . . .

2500        345267             0.23

 

That is what I am trying to do.  I apologize that I don't have any active code, but I can say that the program (which started with many million obs) is littered with false starts and incorrect results.

 

Any tips, helps, techniques from the community of experienced users are welcomed.

 

Walt Lierman

               

                                                                                

 

 

 

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
wlierman
Lapis Lazuli | Level 10
Hello ballardw,
You nailed it! Thank you very much for the code and catching my typo (ID value).
This really breaks a log-jam.
Take care.

Cheers,

W Lierman

View solution in original post

3 REPLIES 3
ballardw
Super User

You formula really needs to show two different n's, one for the n of individual providers and the N of number of providers.

 

One way:

data have;
   input id $ claim_no $ PCP_id $;
datalines;
123456  897605  40000 
123456  245168  40000
123456  589741  40000
123456  589456  54123
123456  659421  65874
123456  568971  21356
123456  474895  32150
123456  412008  32150
123456  568888  32150
123456  213004  32150
;
run;

proc summary data=have nway ;
  class id pcp_id;
  output out=sum1 ;
run;
proc summary data=sum1 nway;
   class id;
   output out=sum2;
run;

data temp;
  merge  sum1 (rename=(_freq_=n))
         sum2 (rename=(_freq_=bign))
  ;
  by id;
  val = (n**2 - n)/ (bign * (bign-1));
run;

proc summary data=temp nway;
   class id;
   var val;
   output out=want (drop=_:) sum=B_B_index;
run;

Note that your example data has two different ID values and so does not actually match you manual calculations. I "fixed" the ID on the first 4 records.

 

Proc summary without any VAR variable gives a count. The counts could be done with proc SQL or proc freq as well.

wlierman
Lapis Lazuli | Level 10
Hello ballardw,
You nailed it! Thank you very much for the code and catching my typo (ID value).
This really breaks a log-jam.
Take care.

Cheers,

W Lierman
novinosrin
Tourmaline | Level 20

HI @wlierman Sorry for the bother. Would be nice and courteous and ethical if you could give the credit to @ballardw by marking his answer as the accepted solution. Thank you!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1589 views
  • 0 likes
  • 3 in conversation