This is the challenge that I am am facing:
I have a small (880000 obs) data set. There are a minimum of 4 unique claim numbers for each recipient, so this 880K data set is my working calculation set (with a minimum of 4 claims up to, I think there is one person with 77 claim lines!) I want to calculate a Bice - Boxer man index for each recipient and then put (write) the result, along with the Recipient ID to a new data set. This data set will have (using hypothetical numbers) say 2,500 distinct recipients each with a B-B Index value. The B-B index values vary between 0 and 1 inclusive. I haven't successfully worked this out - thus, I am here - but this rough schematic shows one recipient with multiple claim lines in the working calculation set. Incidentally I am counting up the number of primary care providers to use in the B-B index over each block of claims.
ID Claim_No PCP_ID
123453 897605 40000
123453 245168 40000
123453 589741 40000
123453 589456 54123
123456 659421 65874
123456 568971 21356
123456 474895 32150
123456 412008 32150
123456 568888 32150
123456 213004 32150
There are 5 distinct PCP providers here. The B-B Index is calculated as follows with individual provider j, s number of providers
and n visits (distinct claims)
∑ n^2 - n j=1 to s n(n-1) |
using this formula the B-B index for recipient 123456 is calculated as: B-B = (3^sq - 3) + (1^sq - 1) + (1^sq - 1) + (1^sq - 1)
+(4^sq - 4) / (5)*(4)
= 6 + 0 + 0 + 0 +12 / 20
= 18/20
= 0.9
Then I would like to put the each ID value and corresponding calculated B-B index to a new data set
New data set
obs ID B-B index
1 123456 0.9
. . . . . .
2500 345267 0.23
That is what I am trying to do. I apologize that I don't have any active code, but I can say that the program (which started with many million obs) is littered with false starts and incorrect results.
Any tips, helps, techniques from the community of experienced users are welcomed.
Walt Lierman
You formula really needs to show two different n's, one for the n of individual providers and the N of number of providers.
One way:
data have; input id $ claim_no $ PCP_id $; datalines; 123456 897605 40000 123456 245168 40000 123456 589741 40000 123456 589456 54123 123456 659421 65874 123456 568971 21356 123456 474895 32150 123456 412008 32150 123456 568888 32150 123456 213004 32150 ; run; proc summary data=have nway ; class id pcp_id; output out=sum1 ; run; proc summary data=sum1 nway; class id; output out=sum2; run; data temp; merge sum1 (rename=(_freq_=n)) sum2 (rename=(_freq_=bign)) ; by id; val = (n**2 - n)/ (bign * (bign-1)); run; proc summary data=temp nway; class id; var val; output out=want (drop=_:) sum=B_B_index; run;
Note that your example data has two different ID values and so does not actually match you manual calculations. I "fixed" the ID on the first 4 records.
Proc summary without any VAR variable gives a count. The counts could be done with proc SQL or proc freq as well.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.