<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to aggregate data and still keep some of the original information in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/How-to-aggregate-data-and-still-keep-some-of-the-original/m-p/280129#M14760</link>
    <description>&lt;P&gt;&lt;SPAN&gt;But there are only about 600 observations (districts) in the new data set and, say, 50,000 observations (households) in the old one. I can't see how these two could be merged.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Sat, 25 Jun 2016 09:23:26 GMT</pubDate>
    <dc:creator>Pejak</dc:creator>
    <dc:date>2016-06-25T09:23:26Z</dc:date>
    <item>
      <title>How to aggregate data and still keep some of the original information</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/How-to-aggregate-data-and-still-keep-some-of-the-original/m-p/280120#M14758</link>
      <description>&lt;P&gt;I have some data that I need to aggregate from the household level to the district level. I have done this with PROC SUMMARY. The trouble is that I need one of the original variables for my regression: the standard errors are clustered at the household level. I'm using PROC SURVEYREG for the regression and need to specify the cluster at the household level. That variable is no longer available because I have aggregated the data. How can I run my regression?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm using SAS 9.4.&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jun 2016 07:43:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/How-to-aggregate-data-and-still-keep-some-of-the-original/m-p/280120#M14758</guid>
      <dc:creator>Pejak</dc:creator>
      <dc:date>2016-06-25T07:43:02Z</dc:date>
    </item>
    <item>
      <title>Re: How to aggregate data and still keep some of the original information</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/How-to-aggregate-data-and-still-keep-some-of-the-original/m-p/280126#M14759</link>
      <description>&lt;P&gt;You could remerge your aggregated data back with the original. That would repeat the group-level data for every member of the group, but from what I understand that is what you're after.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In many cases PROC SQL would do that for you automatically, leading to the famous note&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;NOTE: The query requires remerging summary statistics back with the original data.&lt;/PRE&gt;
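&lt;P&gt;As a minimal sketch of a query that triggers this note (assuming a hypothetical household table with household_id, district_id and income columns; household_with_avg is only an illustrative output name), selecting detail columns together with a summary function makes PROC SQL remerge the district mean onto every household row:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
   /* household_id and income are not in the GROUP BY clause, so SAS    */
   /* computes mean(income) per district and remerges it onto each row. */
   create table household_with_avg as
   select household_id, district_id, income,
          mean(income) as income_avg
     from household
    group by district_id;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The output should keep one row per household, with income_avg repeated within each district.&lt;/P&gt;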
&lt;P&gt;Nothing is keeping you from joining the summary dataset back with its origin:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
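&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
   /* 1:n join: each household row picks up its district's summary values */
   select h.household_id, h.district_id, h.income, s.income_avg
     from household h, summary s   /* s is the alias for the summary table */
        where h.district_id=s.district_id;
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>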
      <pubDate>Sat, 25 Jun 2016 08:54:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/How-to-aggregate-data-and-still-keep-some-of-the-original/m-p/280126#M14759</guid>
      <dc:creator>jklaverstijn</dc:creator>
      <dc:date>2016-06-25T08:54:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to aggregate data and still keep some of the original information</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/How-to-aggregate-data-and-still-keep-some-of-the-original/m-p/280129#M14760</link>
      <description>&lt;P&gt;&lt;SPAN&gt;But there are only about 600 observations (districts) in the new data set and, say, 50,000 observations (households) in the old one. I can't see how these two could be merged.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jun 2016 09:23:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/How-to-aggregate-data-and-still-keep-some-of-the-original/m-p/280129#M14760</guid>
      <dc:creator>Pejak</dc:creator>
      <dc:date>2016-06-25T09:23:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to aggregate data and still keep some of the original information</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/How-to-aggregate-data-and-still-keep-some-of-the-original/m-p/280132#M14761</link>
      <description>&lt;P&gt;Hi &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/91783"&gt;@Pejak&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;No, they will obviously not be merged 1:1. But there is a 1:n relationship between districts and households, and the code I suggested handles that easily. The result would have the same 50,000 household records that you started with, with the district data repeated on each of them.&lt;/P&gt;
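&lt;P&gt;As a rough sketch, assuming the household and summary tables and the column names from my earlier query (household_merged and the MODEL statement are only placeholders, not your actual specification), the joined result could be stored as a table and passed to PROC SURVEYREG with the household as the cluster:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
   /* 1:n join: every household row keeps its own values and */
   /* receives the summary values of its district.           */
   create table household_merged as
   select h.*, s.income_avg
     from household h
          inner join summary s
          on h.district_id = s.district_id;
quit;

/* Placeholder model: replace it with your own response and regressors. */
proc surveyreg data=household_merged;
   cluster household_id;
   model income = income_avg;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If every household has a matching district in the summary table, household_merged should have the same number of rows as household.&lt;/P&gt;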
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If that does not meet your requirements, then please give us examples of the data you have and the result you want.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;regards,&lt;/P&gt;
&lt;P&gt;- Jan.&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jun 2016 10:50:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/How-to-aggregate-data-and-still-keep-some-of-the-original/m-p/280132#M14761</guid>
      <dc:creator>jklaverstijn</dc:creator>
      <dc:date>2016-06-25T10:50:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to aggregate data and still keep some of the original information</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/How-to-aggregate-data-and-still-keep-some-of-the-original/m-p/280274#M14768</link>
      <description>&lt;P&gt;I'm not a statistician, so I'm probably out of bounds here.&lt;/P&gt;
&lt;P&gt;But why can't you do your analysis on the original data, where you have the necessary variables?&lt;/P&gt;
&lt;P&gt;If you need to use the&amp;nbsp;aggregated data, how would you expect to match the cluster variable to the district? There are surely multiple clusters per district, otherwise it wouldn't be a problem. You need a business rule to choose the proper cluster level, or else it will be random.&lt;/P&gt;</description>
      <pubDate>Sun, 26 Jun 2016 20:03:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/How-to-aggregate-data-and-still-keep-some-of-the-original/m-p/280274#M14768</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-06-26T20:03:56Z</dc:date>
    </item>
  </channel>
</rss>

