finbar_gillen
Calcite | Level 5

Hi all,

 

I am looking for a data step or PROC SQL approach to help predict a subgroup-level variable in a dataset. Does anyone have any ideas?

 

Example:

 

Group 1 has an average income of 45000. Within this group there are 4 sub groups with varying degrees of affluence. I want to use the affluence % along with the average group income to predict the sub group income.

 

group  income  subgroup  affluence  subgroup income
1      45000   1001      60.2       ?
1      45000   1002      40.3       ?
1      45000   1003      39.4       ?
1      45000   1004      20.1       ?
2      30000   2001      20.1       ?
2      30000   2002      14.2       ?
2      30000   2003       9.4       ?
2      30000   2004      10.3       ?
2      30000   2005       8.7       ?
3      50000   3001      60.9       ?
3      50000   3002      72.3       ?

4 REPLIES
ballardw
Super User

So exactly how do you intend to use the affluence percentage to "predict" the subgroup?

If you can work out a few examples by hand then show us the expected result for those examples. Otherwise we are shooting very blind.

 

BTW, I would suggest that your subgroup is not continuous but categorical. The very name indicates such: subgroup. Groups are categorical. Continuous tends to be something measured such as height, weight, sales volume.

finbar_gillen
Calcite | Level 5

Hi @ballardw,

 

The sub group is categorical yes... it is the sub group affluence score that is continuous, this is the variable that I want to use to try and assign a predicted subgroup income.

 

I was thinking of creating 6 bins using the quantile method for the income column to create affluence groupings, i.e. very affluent, affluent, slightly affluent, slightly deprived, deprived, very deprived. If a sub group is very affluent then it should have an income that is somewhat higher than the group's average. If a sub group is very deprived then it should have an income that is somewhat lower than the group's average.

I want the income to increase in more affluent sub groups and decrease in more deprived ones, so that there is an income for each sub group.
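
A minimal sketch of this binning idea, under stated assumptions: the variable names are made up, the bins are built on the affluence score across all rows (one possible reading of the description above), and the multipliers are illustrative placeholders rather than values estimated from any data.

```sas
/* Example data from the first post (variable names are hypothetical) */
data have;
   input group income subgroup affluence;
   datalines;
1 45000 1001 60.2
1 45000 1002 40.3
1 45000 1003 39.4
1 45000 1004 20.1
2 30000 2001 20.1
2 30000 2002 14.2
2 30000 2003 9.4
2 30000 2004 10.3
2 30000 2005 8.7
3 50000 3001 60.9
3 50000 3002 72.3
;
run;

/* Six quantile bins on affluence over all rows:
   0 = most deprived, 5 = most affluent */
proc rank data=have out=binned groups=6;
   var affluence;
   ranks aff_bin;
run;

/* ASSUMPTION: illustrative multipliers only, not estimated from data */
data want;
   set binned;
   array mult[0:5] _temporary_ (0.70 0.85 0.95 1.05 1.15 1.30);
   sub_income = round(income * mult[aff_bin]);
run;

proc print data=want noobs;
run;
```

Note that nothing here forces the predicted sub group incomes to average back to the group income; that would need an extra rescaling step within each group.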

ballardw
Super User

I think that, if possible, you want to go back to how the "affluence score" was created, or at least share exactly what it measures. The first post indicates it is supposed to be a percentage. Percentage of what? Perhaps the underlying raw data would provide a more robust approach.

Ksharp
Super User

Does income follow a Normal or Gamma distribution?

If it does, you could use the QUANTILE() function. @Rick_SAS has done a lot of this stuff.
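
One possible reading of this suggestion, sketched under strong assumptions: affluence / 100 is treated as a percentile of a Normal income distribution centred on the group average, and the standard deviation is set to 20% of that average. Neither assumption comes from the thread, and the variable names are hypothetical.

```sas
/* Example data from the first post (variable names are hypothetical) */
data have;
   input group income subgroup affluence;
   datalines;
1 45000 1001 60.2
1 45000 1002 40.3
1 45000 1003 39.4
1 45000 1004 20.1
2 30000 2001 20.1
2 30000 2002 14.2
2 30000 2003 9.4
2 30000 2004 10.3
2 30000 2005 8.7
3 50000 3001 60.9
3 50000 3002 72.3
;
run;

data want;
   set have;
   p  = affluence / 100;   /* ASSUMPTION: affluence is a percentile of income */
   sd = 0.20 * income;     /* ASSUMPTION: illustrative spread, not estimated  */
   /* QUANTILE needs a probability strictly between 0 and 1 */
   if 0 < p < 1 then sub_income = quantile('NORMAL', p, income, sd);
run;

proc print data=want noobs;
run;
```

With this mapping an affluence of 50 returns the group average, and scores above or below 50 move the predicted income up or down. For a Gamma distribution, QUANTILE('GAMMA', ...) would take shape and scale parameters instead of a mean and standard deviation.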
