Re: Predicting Sub Group variable based on constant Group variable and continuous Sub Group Variable

finbar_gillen · Posted 09-09-2019 06:49 AM

Hi all,

I am looking for a DATA step or PROC SQL approach to help predict a sub group variable in a dataset. Does anyone have any ideas?

Example:

Group 1 has an average income of 45000. Within this group there are 4 sub groups with various degrees of affluence. I want to use the affluence % along with the average group income to predict the sub group income.

group	income	subgroup	affluence	subgroup income
1	45000	1001	60.2	?
1	45000	1002	40.3	?
1	45000	1003	39.4	?
1	45000	1004	20.1	?
2	30000	2001	20.1	?
2	30000	2002	14.2	?
2	30000	2003	9.4	?
2	30000	2004	10.3	?
2	30000	2005	8.7	?
3	50000	3001	60.9	?
3	50000	3002	72.3	?
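
One possible reading of this request, sketched in SAS: scale each group's average income by the ratio of a sub group's affluence to the mean affluence within its group. The proportional rule itself and the names `have`, `want` and `subgroup_income` are assumptions for illustration; nothing in the thread confirms that this is the intended relationship.

```sas
/* Example data from the post. The data set name HAVE is an assumption. */
data have;
    input group income subgroup affluence;
    datalines;
1 45000 1001 60.2
1 45000 1002 40.3
1 45000 1003 39.4
1 45000 1004 20.1
2 30000 2001 20.1
2 30000 2002 14.2
2 30000 2003 9.4
2 30000 2004 10.3
2 30000 2005 8.7
3 50000 3001 60.9
3 50000 3002 72.3
;
run;

/* Assumed rule: subgroup income = group income * affluence / mean affluence in the group.
   PROC SQL remerges the per-group mean back onto every row (it writes a NOTE about this). */
proc sql;
    create table want as
    select group, income, subgroup, affluence,
           income * affluence / mean(affluence) as subgroup_income format=comma12.0
    from have
    group by group
    order by group, subgroup;
quit;
```

Under this rule the unweighted mean of the sub group incomes within a group equals the group's average income. If sub groups differ in size, a population-weighted version would be needed, and that requires data not shown here.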

ballardw · Posted 09-09-2019 10:57 AM

So exactly how do you intend to use the affluence percentage to "predict" the subgroup?

If you can work out a few examples by hand then show us the expected result for those examples. Otherwise we are shooting very blind.

BTW, I would suggest that your subgroup is not continuous but categorical. The very name indicates such: subgroup. Groups are categorical. Continuous tends to be something measured such as height, weight, sales volume.

finbar_gillen · Posted 09-09-2019 11:28 AM

Hi @ballardw,

The sub group is categorical, yes. It is the sub group affluence score that is continuous, and this is the variable that I want to use to assign a predicted sub group income.

I was thinking of creating 6 bins using the quantile method for the income column to create affluence groupings, i.e. very affluent, affluent, slightly affluent, slightly deprived, deprived, very deprived. If a sub group is very affluent then it should have an income that is somewhat higher than the group's average; if a sub group is very disadvantaged then it should have an income that is somewhat lower than the group's average.

I want the income to increase in more affluent sub groups and decrease in more deprived ones, so that there is an income for each sub group.
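
The binning idea above could be sketched as follows, assuming the six bins are formed from the affluence score across all sub groups. The multipliers in the array are hypothetical placeholders, not values from the thread, and the names `have`, `ranked`, `aff_bin` and `want` are also assumptions.

```sas
/* Example data from the first post. */
data have;
    input group income subgroup affluence;
    datalines;
1 45000 1001 60.2
1 45000 1002 40.3
1 45000 1003 39.4
1 45000 1004 20.1
2 30000 2001 20.1
2 30000 2002 14.2
2 30000 2003 9.4
2 30000 2004 10.3
2 30000 2005 8.7
3 50000 3001 60.9
3 50000 3002 72.3
;
run;

/* GROUPS=6 assigns quantile bins 0 (least affluent) to 5 (most affluent). */
proc rank data=have groups=6 out=ranked;
    var affluence;
    ranks aff_bin;
run;

data want;
    set ranked;
    /* Hypothetical multipliers, one per bin: very deprived ... very affluent. */
    array mult[0:5] _temporary_ (0.70 0.85 0.95 1.05 1.15 1.30);
    subgroup_income = income * mult[aff_bin];
    format subgroup_income comma12.0;
run;
```

The multipliers decide how far each bin moves away from the group average, so they would have to come from external information or a fitted model. With these placeholder values the sub group incomes are not constrained to average back to the group income.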

ballardw · Posted 09-09-2019 12:12 PM

I think that if possible you want to go back to how the "affluence score" was created. Or at least share exactly what it measures. The first post indicates it is supposed to be a percentage. Percentage of what? Perhaps rawer data would provide a more robust approach.

Ksharp · Posted 09-10-2019 08:37 AM

Does income follow a Normal or Gamma distribution?

If it does, you could use the QUANTILE() function. @Rick_SAS has written a lot about this kind of thing.
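
A minimal sketch of the QUANTILE() suggestion, under strong assumptions: income within a group is Normal with mean equal to the group income, the standard deviation is a fixed share of that mean (`cv = 0.20` is a hypothetical value), and each sub group's percentile is taken from the rank of its affluence score across all sub groups. The names `have`, `ranked`, `p`, `cv` and `want` are illustrative.

```sas
/* Example data from the first post. */
data have;
    input group income subgroup affluence;
    datalines;
1 45000 1001 60.2
1 45000 1002 40.3
1 45000 1003 39.4
1 45000 1004 20.1
2 30000 2001 20.1
2 30000 2002 14.2
2 30000 2003 9.4
2 30000 2004 10.3
2 30000 2005 8.7
3 50000 3001 60.9
3 50000 3002 72.3
;
run;

/* NPLUS1 gives rank/(n+1), a probability strictly between 0 and 1. */
proc rank data=have nplus1 out=ranked;
    var affluence;
    ranks p;
run;

data want;
    set ranked;
    cv = 0.20;  /* assumed coefficient of variation, not from the thread */
    /* Income at percentile p of a Normal(group income, cv * group income) distribution. */
    subgroup_income = quantile('NORMAL', p, income, cv * income);
    format subgroup_income comma12.0;
run;
```

A sub group at the median affluence rank (p = 0.5) receives exactly its group's average income; higher ranks move above it and lower ranks below it. If income is closer to Gamma, the same pattern applies with `quantile('GAMMA', ...)` and suitably chosen shape and scale parameters.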
