<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Remove multiple observations, aggregate on studyID and dispensing date, sum payment variable in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Remove-multiple-observations-aggregate-on-studyID-and-dispensing/m-p/811330#M319988</link>
    <description>&lt;P&gt;Why do you not think the payment as summed?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you include variables that are not included in the GROUP BY statement or an aggregate calculation (ie diag1, diag2) it will cause SAS to merge the summary data with the original data causing you to have duplicates. If you expect DIAG1/DIAG2 to remain constant over the grouping you should add them to the GROUP BY statement. If you do not expect them to be consistent you need to define rules on which one to take and the solution will vary based on those rules.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/404316"&gt;@jusjolly&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;First time poster so apologies if I am unclear.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am working with a large study cohort which has some claims data. In the data, there can be multiple (duplicate) observations per day. I want to define a duplicate as an observation with the same studyID and dispensing date and aggregate the data based on these two variables. I also want to sum the payment variables ('pay', 'ded') that occur in these separate observations. I want to keep some of the other variables in the data as well (eg diag1, diag2, age, region, SES).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have tried:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sql;&lt;/P&gt;
&lt;P&gt;create table test as select distinct&lt;/P&gt;
&lt;P&gt;studyid, diag1, diag2, age, region, SES, dispensedate,&lt;/P&gt;
&lt;P&gt;(sum(pay)) as totalpay,&lt;/P&gt;
&lt;P&gt;(sum(ded)) as totalded,&lt;/P&gt;
&lt;P&gt;from studydata&lt;/P&gt;
&lt;P&gt;group by studyid, dispensedate;&lt;/P&gt;
&lt;P&gt;quit;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Looking at the proc means min/max, I do not think the payment variables were summed...&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 03 May 2022 21:47:13 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2022-05-03T21:47:13Z</dc:date>
    <item>
      <title>Remove multiple observations, aggregate on studyID and dispensing date, sum payment variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Remove-multiple-observations-aggregate-on-studyID-and-dispensing/m-p/811327#M319986</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;First time poster so apologies if I am unclear.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am working with a large study cohort which has some claims data. In the data, there can be multiple (duplicate) observations per day. I want to define a duplicate as an observation with the same studyID and dispensing date and aggregate the data based on these two variables. I also want to sum the payment variables ('pay', 'ded') that occur in these separate observations. I want to keep some of the other variables in the data as well (eg diag1, diag2, age, region, SES).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have tried:&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;create table test as select distinct&lt;/P&gt;&lt;P&gt;studyid, diag1, diag2, age, region, SES, dispensedate,&lt;/P&gt;&lt;P&gt;(sum(pay)) as totalpay,&lt;/P&gt;&lt;P&gt;(sum(ded)) as totalded,&lt;/P&gt;&lt;P&gt;from studydata&lt;/P&gt;&lt;P&gt;group by studyid, dispensedate;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Looking at the proc means min/max, I do not think the payment variables were summed...&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 03 May 2022 21:41:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Remove-multiple-observations-aggregate-on-studyID-and-dispensing/m-p/811327#M319986</guid>
      <dc:creator>jusjolly</dc:creator>
      <dc:date>2022-05-03T21:41:38Z</dc:date>
    </item>
    <item>
      <title>Re: Remove multiple observations, aggregate on studyID and dispensing date, sum payment variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Remove-multiple-observations-aggregate-on-studyID-and-dispensing/m-p/811330#M319988</link>
      <description>&lt;P&gt;Why do you not think the payment as summed?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you include variables that are not included in the GROUP BY statement or an aggregate calculation (ie diag1, diag2) it will cause SAS to merge the summary data with the original data causing you to have duplicates. If you expect DIAG1/DIAG2 to remain constant over the grouping you should add them to the GROUP BY statement. If you do not expect them to be consistent you need to define rules on which one to take and the solution will vary based on those rules.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/404316"&gt;@jusjolly&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;First time poster so apologies if I am unclear.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am working with a large study cohort which has some claims data. In the data, there can be multiple (duplicate) observations per day. I want to define a duplicate as an observation with the same studyID and dispensing date and aggregate the data based on these two variables. I also want to sum the payment variables ('pay', 'ded') that occur in these separate observations. I want to keep some of the other variables in the data as well (eg diag1, diag2, age, region, SES).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have tried:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sql;&lt;/P&gt;
&lt;P&gt;create table test as select distinct&lt;/P&gt;
&lt;P&gt;studyid, diag1, diag2, age, region, SES, dispensedate,&lt;/P&gt;
&lt;P&gt;(sum(pay)) as totalpay,&lt;/P&gt;
&lt;P&gt;(sum(ded)) as totalded,&lt;/P&gt;
&lt;P&gt;from studydata&lt;/P&gt;
&lt;P&gt;group by studyid, dispensedate;&lt;/P&gt;
&lt;P&gt;quit;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Looking at the proc means min/max, I do not think the payment variables were summed...&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 03 May 2022 21:47:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Remove-multiple-observations-aggregate-on-studyID-and-dispensing/m-p/811330#M319988</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2022-05-03T21:47:13Z</dc:date>
    </item>
  </channel>
</rss>

