How to aggregate data and still keep some of the original information

New Contributor
Posts: 2

I have some data that I need to aggregate from household level to district level, which I have done with PROC SUMMARY. The trouble is that my regression needs one of the original variables: the standard errors are to be clustered at the household level. I'm running the regression with PROC SURVEYREG and need to specify the household as the cluster, but that variable is no longer available after aggregation. How can I do my regression?

I'm using SAS 9.4.

Super Contributor
Posts: 409

Re: How to aggregate data and still keep some of the original information

You could remerge your aggregated data with the original. That would repeat the group-level data for every member of the group, but from what I understand that is what you're after.

In many cases PROC SQL will do that for you automatically, leading to the famous note

NOTE: The query requires remerging summary statistics back with the original data.
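
As a minimal sketch of a query that triggers this remerge: selecting detail columns next to a summary function, while grouping only by the district, makes PROC SQL compute the statistic per group and attach it to every detail row. The table name `household`, the output name `household_with_avg`, and the columns `household_id`, `district_id`, `income` and `income_avg` are assumed here for illustration.

```sas
proc sql;
   /* household_id and income are not in the GROUP BY clause, so PROC SQL */
   /* computes mean(income) per district and remerges it onto every       */
   /* household row, writing the remerge note to the log.                 */
   create table household_with_avg as
   select household_id,
          district_id,
          income,
          mean(income) as income_avg
     from household
    group by district_id;
quit;
```

The output keeps one row per household, so the household identifier remains available for later steps.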

Nothing is keeping you from joining the summary data set back to its origin yourself:

proc sql;
   /* 1:n join: every household row picks up its district's summary value */
   select h.household_id, h.district_id, h.income, s.income_avg
     from household h, summary s
        where h.district_id=s.district_id;
quit;
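
To feed the joined rows into the regression, the query result would have to be stored as a table first. The following is only a sketch under assumptions: `merged` is a made-up output table name, `y` stands in for whatever outcome variable the household data actually contains, and the model shown is a placeholder rather than the poster's actual specification.

```sas
proc sql;
   /* Store the 1:n join so that a later procedure can read it. */
   create table merged as
   select h.*, s.income_avg
     from household as h
          inner join summary as s
          on h.district_id = s.district_id;
quit;

proc surveyreg data=merged;
   cluster household_id;     /* household-level clustering, still available after the join */
   model y = income_avg;     /* y is a hypothetical outcome variable */
run;
```
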
New Contributor
Posts: 2

Re: How to aggregate data and still keep some of the original information

But there are only about 600 observations (districts) in the new data set and, let's say, 50,000 observations (households) in the old one. I can't see how these two could be merged.

Super Contributor
Posts: 409

Re: How to aggregate data and still keep some of the original information

Hi @Pejak

No, they will obviously not be merged 1:1. But there is a 1:n relationship between districts and households, and the code I suggested handles it easily. The result would have the same 50,000 household records that you started with, with the district data repeated on each.
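
The same 1:n match can also be sketched with a DATA step merge, in case that is easier to follow than SQL. This assumes the data set names `household` and `summary` and the variable `income_avg` from the query above; `merged` is a made-up output name.

```sas
/* Both inputs must be sorted by the key before a BY-merge. */
proc sort data=household; by district_id; run;
proc sort data=summary;   by district_id; run;

data merged;
   /* One summary row per district is carried onto every household row. */
   merge household(in=in_hh)
         summary(keep=district_id income_avg);
   by district_id;
   if in_hh;   /* keep only rows that exist in the household data */
run;
```

With about 600 districts and 50,000 households, `merged` would still contain one row per household row in the input.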

 

If that does not meet your requirements, then please give us examples of the data you have and the result you want.

regards,

- Jan.

Super User
Posts: 5,316

Re: How to aggregate data and still keep some of the original information

I'm not a statistician, so I'm probably out of bounds here.

But why can't you do your analysis on the original data, where you have the necessary variables?

If you need to use the aggregated data, how would you expect to match the cluster variable to the district? There are surely multiple households per district, otherwise it wouldn't be a problem. You need a business rule to choose the proper cluster value for each district, or else the choice will be random.
