I have some data that I need to aggregate from household level to district level, which I have done with PROC SUMMARY. The trouble is that I need one of the original variables for my regression: the standard errors are clustered at the household level. I'm using PROC SURVEYREG for the regression, and I need to specify the cluster at the household level. That variable is no longer available because I have aggregated the data. How can I run my regression?
I'm using SAS 9.4.
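For reference, the aggregation step described above probably looks roughly like the sketch below. The dataset and variable names (household, household_id, district_id, income, income_avg) are hypothetical. It shows why household_id disappears: the output has one row per district.

```sas
/* Sketch only: hypothetical input dataset HOUSEHOLD with       */
/* household_id, district_id and income.                        */
proc summary data=household nway;   /* NWAY: keep only the full CLASS level */
   class district_id;                /* one output row per district          */
   var income;
   output out=summary (drop=_type_ _freq_)
          mean=income_avg;           /* district mean of income              */
run;
/* SUMMARY now has about one row per district and no household_id. */
```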
You could remerge your aggregated data back with the original. That would repeat the group-level data for every member of the group, but from what I understand that is what you're after.
In many cases PROC SQL does that for you automatically: when a query selects detail columns alongside a summary function, it leads to the famous note:
NOTE: The query requires remerging summary statistics back with the original data.
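As a sketch of that automatic remerge (table and column names are hypothetical): selecting detail columns together with a summary function and a GROUP BY makes PROC SQL compute the district mean and attach it to every household row, which is what triggers the note above.

```sas
/* Sketch only: HOUSEHOLD, household_id, district_id and income are hypothetical. */
proc sql;
   create table household_remerged as
   select household_id,
          district_id,
          income,
          mean(income) as income_avg   /* computed per district, repeated per household */
   from household
   group by district_id;
quit;
/* One row per household; income_avg holds the district mean. */
```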
Nothing is keeping you from joining the summary dataset back with its origin:
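proc sql;
   /* save the joined result so it can be used in a later step */
   create table household_summary as
   select h.household_id, h.district_id, h.income, s.income_avg
   from household as h, summary as s   /* "summary.s" would mean library SUMMARY, table S */
   where h.district_id = s.district_id;
quit;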
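A sketch of the regression step on the joined data, assuming the join result has been saved as a dataset (here called household_summary, a hypothetical name, e.g. by prefixing the SELECT with `create table household_summary as`). The outcome y and regressor x are hypothetical household-level variables. Because household_id is still present, it can go in the CLUSTER statement.

```sas
/* Sketch only: household_summary, y and x are hypothetical.        */
/* income_avg is the district-level mean repeated on each household. */
proc surveyreg data=household_summary;
   cluster household_id;        /* cluster standard errors by household */
   model y = x income_avg;      /* mix household- and district-level regressors */
run;
```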
But there are only about 600 observations (districts) in the new data set and, say, 50,000 observations (households) in the old one. I can't see how these two could be merged.
Hi @Pejak
No, they will obviously not be merged 1:1. But there is a 1:n relationship, which the code I suggested handles easily. The result would have the same 50,000 household records that you started with, with the district data repeated on each of them.
If that does not meet your requirements, then please give us examples of the data you have and the data you want.
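The same 1:n match can also be sketched with a DATA step MERGE instead of PROC SQL. This is an alternative technique, not the code from the reply; the names household, summary and district_id are hypothetical. Each district row is carried onto every household in that district, so the output should have as many observations as the household data, not as many as the district summary.

```sas
/* Sketch only: dataset and variable names are hypothetical. */
proc sort data=household;
   by district_id;
run;

proc sort data=summary;
   by district_id;
run;

data household_merged;
   merge household (in=in_hh)
         summary   (in=in_sum);
   by district_id;               /* 1:n match: one district row, many households */
   if in_hh;                     /* keep household records only */
   if not in_sum then
      put "WARNING: no district summary for " district_id=;
run;
```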
regards,
- Jan.
I'm not a statistician, so I'm probably out of bounds here.
But why can't you do your analysis on the original data, where you have the necessary variables?
If you need to use the aggregated data, how would you expect to match the cluster variable to the district? There are surely multiple clusters per district, otherwise it wouldn't be a problem. You need a business rule to choose the proper cluster level, or else the choice will be random.