<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Count by Group Returning Duplicate Rows Proc SQL in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Count-by-Group-Returning-Duplicate-Rows-Proc-SQL/m-p/785514#M250696</link>
    <description>&lt;P&gt;Also use the CALCULATED keyword in the GROUP BY clause for newly created variables that have the same name as the source variables in the dataset.&lt;/P&gt;</description>
    <pubDate>Sat, 11 Dec 2021 07:44:17 GMT</pubDate>
    <dc:creator>Kurt_Bremser</dc:creator>
    <dc:date>2021-12-11T07:44:17Z</dc:date>
    <item>
      <title>Count by Group Returning Duplicate Rows Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Count-by-Group-Returning-Duplicate-Rows-Proc-SQL/m-p/785483#M250682</link>
      <description>&lt;P&gt;I am trying to get a count of members per network, product, line of business, zip and county. It works until I add in County. Then I get duplicate rows and it splits the totals as if the county is different, when it's not (All rows are Los Angeles County). I have tried removing county from the group by, but not select statement, and then I still get duplicate rows only with the same count on the dup row.&lt;/P&gt;&lt;P&gt;The count only works if I remove the County from both the group by and the Select (and order by). I have tried format and length statements as well as trimming the County. Nothing works. My code and sample data examples are below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Proc SQL;&lt;BR /&gt;Create Table ActualMbrshp AS&lt;BR /&gt;Select Distinct&lt;/P&gt;&lt;P&gt;TRIM(NTWK_NM) as NTWK_NM&lt;BR /&gt;,TRIM(NWID.NetworkID) as NetworkID&lt;BR /&gt;,CAT(TRIM(ProdLvl2)," ",TRIM(MBULvl2)) as LineofBusiness&lt;BR /&gt;,TRIM(ProdNm) as ProdNm&lt;BR /&gt;,TRIM(UPPER(County)) as County format =$20. length=20&lt;BR /&gt;,TRIM(Zip) as Zip&lt;BR /&gt;,Count (*) as Count&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;From Mbrshp.PSHMO_Membership as mbr&lt;BR /&gt;left join Mbrshp.NetworkIds as NWID on UPPER(NWID.NetworkName)=mbr.NTWK_NM&lt;BR /&gt;Where Zip = "90001"&lt;BR /&gt;Group by NTWK_NM, LineofBusiness, ProdNm, County, Zip&lt;BR /&gt;Order By NTWK_NM, LineofBusiness, ProdNm, County, Zip;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Result:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="jwilliams_0-1639173239345.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/66635i929A83A63B57801A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="jwilliams_0-1639173239345.png" alt="jwilliams_0-1639173239345.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Desired Result (only with county name included):&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="jwilliams_1-1639173325876.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/66636i106E5F1B22A80896/image-size/medium?v=v2&amp;amp;px=400" role="button" title="jwilliams_1-1639173325876.png" alt="jwilliams_1-1639173325876.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Any help is much appreciated!&lt;/P&gt;</description>
      <pubDate>Fri, 10 Dec 2021 22:36:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Count-by-Group-Returning-Duplicate-Rows-Proc-SQL/m-p/785483#M250682</guid>
      <dc:creator>jwilliams</dc:creator>
      <dc:date>2021-12-10T22:36:29Z</dc:date>
    </item>
    <item>
      <title>Re: Count by Group Returning Duplicate Rows Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Count-by-Group-Returning-Duplicate-Rows-Proc-SQL/m-p/785492#M250687</link>
      <description>&lt;P&gt;First thing I'd be doing is to run below code.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=ActualMbrshp out=test;
  by &amp;lt;variables from order by clause&amp;gt;;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If the "duplicates" remain then you know that there are actual differences in your group by variables - for example tabs that don't print.&lt;/P&gt;
&lt;P&gt;If there aren't duplicates anymore then you know that you need to tweak your SQL.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let us know what case applies and we go from there.&lt;/P&gt;</description>
      <pubDate>Sat, 11 Dec 2021 01:36:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Count-by-Group-Returning-Duplicate-Rows-Proc-SQL/m-p/785492#M250687</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2021-12-11T01:36:34Z</dc:date>
    </item>
    <item>
      <title>Re: Count by Group Returning Duplicate Rows Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Count-by-Group-Returning-Duplicate-Rows-Proc-SQL/m-p/785497#M250689</link>
      <description>&lt;P&gt;Looks to me like you are selecting 6 variables,&lt;/P&gt;
&lt;PRE&gt;1 -&amp;gt; TRIM(NTWK_NM) as NTWK_NM
2 -&amp;gt; ,TRIM(NWID.NetworkID) as NetworkID
3 -&amp;gt; ,CAT(TRIM(ProdLvl2)," ",TRIM(MBULvl2)) as LineofBusiness
4 -&amp;gt; ,TRIM(ProdNm) as ProdNm
5 -&amp;gt; ,TRIM(UPPER(County)) as County format =$20. length=20
6 -&amp;gt; ,TRIM(Zip) as Zip&lt;/PRE&gt;
&lt;P&gt;but grouping by only 5 of them.&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Group by 
1 -&amp;gt; NTWK_NM
2 -&amp;gt; , LineofBusiness
3 -&amp;gt; , ProdNm
4 -&amp;gt; , County
5 -&amp;gt; , Zip&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So the 6th variable must include different values.&lt;/P&gt;</description>
      <pubDate>Sat, 11 Dec 2021 02:22:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Count-by-Group-Returning-Duplicate-Rows-Proc-SQL/m-p/785497#M250689</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2021-12-11T02:22:33Z</dc:date>
    </item>
    <item>
      <title>Re: Count by Group Returning Duplicate Rows Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Count-by-Group-Returning-Duplicate-Rows-Proc-SQL/m-p/785505#M250692</link>
      <description>NetWORKID is on your SELECT but not your GROUP BY. You should see a message in your log about remerging statistics? Either include it in the GROUP BY or remove it from the SELECT. &lt;BR /&gt;&lt;BR /&gt;Most SQL will actually error out on this query, but it's also a very handy feature when you want to add summary statistics to a data set quickly.</description>
      <pubDate>Sat, 11 Dec 2021 03:43:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Count-by-Group-Returning-Duplicate-Rows-Proc-SQL/m-p/785505#M250692</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-12-11T03:43:01Z</dc:date>
    </item>
    <item>
      <title>Re: Count by Group Returning Duplicate Rows Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Count-by-Group-Returning-Duplicate-Rows-Proc-SQL/m-p/785514#M250696</link>
      <description>&lt;P&gt;Also use the CALCULATED keyword in the GROUP BY clause for newly created variables that have the same name as the source variables in the dataset.&lt;/P&gt;</description>
      <pubDate>Sat, 11 Dec 2021 07:44:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Count-by-Group-Returning-Duplicate-Rows-Proc-SQL/m-p/785514#M250696</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-12-11T07:44:17Z</dc:date>
    </item>
    <item>
      <title>Re: Count by Group Returning Duplicate Rows Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Count-by-Group-Returning-Duplicate-Rows-Proc-SQL/m-p/785769#M250808</link>
      <description>&lt;P&gt;Thank you! I ran this and still got dup rows so I went back to the original source file (XLSX) and trimmed/cleaned the County column there and saved as txt (for infile). I reran my code and I no longer get dups! Not sure why the trim/clean in SAS SQL didn't take care of that but perhaps it was because that column was originally saved as General instead of Text/ I converted it&amp;nbsp; to text this time before saving the file as txt.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 13 Dec 2021 15:10:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Count-by-Group-Returning-Duplicate-Rows-Proc-SQL/m-p/785769#M250808</guid>
      <dc:creator>jwilliams</dc:creator>
      <dc:date>2021-12-13T15:10:33Z</dc:date>
    </item>
  </channel>
</rss>

