Solved: Weighted Average or Mean amounts on multiple segments

bknitch · Posted 05-12-2020 11:25 AM

I have a large data set that has multiple tiers to a C_ID. The C_ID serves as the foundation of the detail, from there a P_ID is associated and finally a segment_id. I'm trying to find the weighted average PART_C_D_PREMIUM of each P_ID that is associated to the parent C_ID by rolling up the segment_id's premiums.

I need to calculate weights based on frequency or number of segment_id's associated to each p_id and c_id. Not sure if a weighted mean is possible using this methodology, or if anyone has a better approach. If its not possible, what is the best way to find the mean premium and rolling up the segment_id's so that its just c_id and p_id with the corresponding means for each premium.

Example below of what I want my output to look like

C_ID	P_ID	parentorg	plan_type	plan_name	PART_C_D_Premium	Mean
H05	067	A	HM	BC	0	0
H15	006	A	LP	AM	47,44,47	46

Here is my data set

data have;
	infile datalines dlm="09"x;
	input C_ID $ P_ID $ parentorg $ plan_type $ plan_name $ segment_id p_c_premium p_d_premium part_c_d_prem_combine;
    datalines;
    H05 	067	A 	HM 	BC	1	0	0	0
    H05 	067	A 	HM 	BC	2	0	0	0
    H10 	237	H 	HM 	HG	1	0	0	0
    H10 	237	H 	HM 	HG	2	0	0	0
    H15 	006	A 	LP	AM	1	0	47	47
    H15 	006	A 	LP	AM	2	0	44	44
    H15 	006	A 	LP	AM	4	0	47	47
    H19	    014	P 	HM 	PH	1	0	0	0
    H19	    014	P 	HM 	PH	2	25	0	25
    H20 	099	H 	HM 	HG	1	0	0	0
    H20 	099	H 	HM 	HG	2	0.6	22.4	23
    H20 	100	H 	HM 	HG	4	0	0	0
    H20 	100	H 	HM 	HG	5	19	0	19
    H39 	059	G 	LP	GG	3	36.4	33.6	70
    H39 	059	G 	LP	GG	4	35.9	39.1	75
    H39 	060	G 	LP	GG	3	0	0	0
    H39 	060	G 	LP	GG	4	0	0	0
    H39 	061	G 	LP	GG	1	122.8	56.2	179
    H39 	061	G 	LP	GG	2	102.8	56.2	159
    H39 	061	G 	LP	GG	5	62.8	56.2	119
	;
	run;

PaigeMiller · Posted 05-12-2020 01:01 PM

Just to make sure I am understanding this properly

For example C_ID= H15 P_ID= 006 has three rows, with values 47, 44 and 47 and segment IDs of 1, 2 and 4, and the value you want is (47*1+44*2+47*4)/(1+2+4) = 46.14...

As I said, PROC SUMMARY makes this easy to do. Although your PROC SQL also gives this answer, PROC SQL will give an incorrect answer in the presence of missing values, while PROC SUMMARY does not have this problem.

proc summary data=have nway;
    class contract_id plan_id;
    var part_c_d_prem_combine /weight=segment_id1;
    output out=want mean=;
run;

--
Paige Miller

View solution in original post

PaigeMiller · Posted 05-12-2020 12:20 PM

Actually its not hard to do using PROC SUMMARY, but its not clear from your description what the weights are. The weights have to be a variable in your data set.

--
Paige Miller

bknitch · Posted 05-12-2020 12:49 PM

The weighted average, should be the frequency of segment_id's under a specific c_id and p_id combination. For example C_ID= H15 P_ID= 006 has 3 segment_id's (1,2,4) I think i figured it out using a proc sql; It seems to work accordingly once i built the variable of segment_id freq in.

proc sql;
title 'WEIGHTED AVERAGES';
select c_id, p_id,
sum(part_c_d_prem_combine * segment_id1)/sum(segment_id1) as weighted_average
from work.plan15
group by contract_id, plan_id;
run;

PaigeMiller · Posted 05-12-2020 01:01 PM

Just to make sure I am understanding this properly

For example C_ID= H15 P_ID= 006 has three rows, with values 47, 44 and 47 and segment IDs of 1, 2 and 4, and the value you want is (47*1+44*2+47*4)/(1+2+4) = 46.14...

As I said, PROC SUMMARY makes this easy to do. Although your PROC SQL also gives this answer, PROC SQL will give an incorrect answer in the presence of missing values, while PROC SUMMARY does not have this problem.

proc summary data=have nway;
    class contract_id plan_id;
    var part_c_d_prem_combine /weight=segment_id1;
    output out=want mean=;
run;

--
Paige Miller

bknitch · Posted 05-12-2020 01:02 PM

@PaigeMiller thanks for the information!

Weighted Average or Mean amounts on multiple segments

Re: Weighted Average or Mean amounts on multiple segments

Re: Weighted Average or Mean amounts on multiple segments

Re: Weighted Average or Mean amounts on multiple segments

Re: Weighted Average or Mean amounts on multiple segments

Re: Weighted Average or Mean amounts on multiple segments

Registration is open

Weighted Average or Mean amounts on multiple segments

Re: Weighted Average or Mean amounts on multiple segments

Re: Weighted Average or Mean amounts on multiple segments

Re: Weighted Average or Mean amounts on multiple segments

Re: Weighted Average or Mean amounts on multiple segments

Re: Weighted Average or Mean amounts on multiple segments

Registration is open

SAS Training: Just a Click Away