<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Select distinct count not working in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927239#M41618</link>
    <description>Thanks! Sorry, do you mean the merging code or the SQL code?</description>
    <pubDate>Mon, 06 May 2024 21:31:05 GMT</pubDate>
    <dc:creator>Stanley3</dc:creator>
    <dc:date>2024-05-06T21:31:05Z</dc:date>
    <item>
      <title>Select distinct count not working</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927237#M41616</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm currently merging multiple datasets and consistently using the following SQL code to verify there are no duplicate IDs at each stage of the process:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;proc sql;&lt;BR /&gt;select count(distinct ID) as UniqueIDs,&lt;BR /&gt;count(*) as NObs&lt;BR /&gt;from dataset;&lt;BR /&gt;quit;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;This approach effectively identifies duplicates when datasets have multiple timepoints, and I make sure to only select one timepoint per ID. Throughout the merging process, my checks confirm that all IDs remain unique (e.g., 1500 observations and 1500 unique IDs).&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;However, after the final merge, the same SQL query unexpectedly shows 1500 observations but only 1300 unique IDs. Manual verification confirms that duplicates are present, despite prior checks showing no such issues. I'm looking for insights into why these duplicates weren’t detected sooner by the SQL query, or if there's a specific merging condition I might have overlooked.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Edited to add:&lt;/P&gt;&lt;P&gt;The SQL code above is the one that I use to check each dataset for duplicates (which it seems to find pretty well) and after each merge.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Merge code (throughout and also the last one):&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;data merged_dataset;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; merge dataset1 dataset2;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; by ID;&lt;/DIV&gt;&lt;DIV&gt;run;&lt;/DIV&gt;&lt;DIV&gt;Attaching the LOG for the last merge + SQL check. The result for this was 1181 UniqueIDs and 1229 NObs, while previously I was getting 1229 UniqueIDs and 1229 NObs.&lt;/DIV&gt;&lt;DIV&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="LOG for the last merge + SQL check" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/96204iE62F353B410F36DB/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2024-05-06 at 5.48.52 PM.png" alt="LOG for the last merge + SQL check" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;LOG for the last merge + SQL check&lt;/span&gt;&lt;/span&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 06 May 2024 21:53:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927237#M41616</guid>
      <dc:creator>Stanley3</dc:creator>
      <dc:date>2024-05-06T21:53:10Z</dc:date>
    </item>
    <item>
      <title>Re: Select distinct count not working</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927238#M41617</link>
      <description>&lt;P&gt;Show the code for the "final merge" and we might have some ideas.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Checking on "counts" does not in any way guarantee that the same values are present in each step.&lt;/P&gt;</description>
      <pubDate>Mon, 06 May 2024 21:27:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927238#M41617</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-05-06T21:27:03Z</dc:date>
    </item>
    <item>
      <title>Re: Select distinct count not working</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927239#M41618</link>
      <description>Thanks! Sorry, do you mean the merging code or the SQL code?</description>
      <pubDate>Mon, 06 May 2024 21:31:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927239#M41618</guid>
      <dc:creator>Stanley3</dc:creator>
      <dc:date>2024-05-06T21:31:05Z</dc:date>
    </item>
    <item>
      <title>Re: Select distinct count not working</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927240#M41619</link>
      <description>&lt;P&gt;Probably both would be a good idea.&lt;/P&gt;
&lt;P&gt;If you have a LOG from running the code that would be a bit better as well.&lt;/P&gt;</description>
      <pubDate>Mon, 06 May 2024 21:38:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927240#M41619</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-05-06T21:38:19Z</dc:date>
    </item>
    <item>
      <title>Re: Select distinct count not working</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927244#M41621</link>
      <description>Thanks! I edited to add both</description>
      <pubDate>Mon, 06 May 2024 21:53:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927244#M41621</guid>
      <dc:creator>Stanley3</dc:creator>
      <dc:date>2024-05-06T21:53:40Z</dc:date>
    </item>
    <item>
      <title>Re: Select distinct count not working</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927245#M41622</link>
      <description>&lt;P&gt;Are you intending to do a one-to-one merge, i.e. have unique values for ID in both datasets?&amp;nbsp; If so, the easiest way to check for duplicates is to add a logical assertion your MERGE step, e.g.:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data merged_dataset; 
      merge dataset1 dataset2;
      by ID;
      if NOT (first.ID and last.ID) then put "ERROR: found a duplicate! " ID= ;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Of course if you have a lot of duplicates, that will flood your log with errors.&amp;nbsp; I wrote a little %ASSERT macro that has a limit on the number of errors it throws.&amp;nbsp; You would use it like:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data merged_dataset; 
      merge dataset1 dataset2;
      by ID;
      %assert(first.ID and last.ID) 
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;%Assert is in this paper &lt;A href="https://www.lexjansen.com/nesug/nesug12/cc/cc31.pdf" target="_blank"&gt;https://www.lexjansen.com/nesug/nesug12/cc/cc31.pdf&lt;/A&gt; , and there is a %DupCk&amp;nbsp; macro&amp;nbsp; in related paper (&lt;A href="https://www.lexjansen.com/sesug/2022/SESUG2022_Paper_187_Final_PDF.pdf" target="_blank"&gt;https://www.lexjansen.com/sesug/2022/SESUG2022_Paper_187_Final_PDF.pdf&lt;/A&gt;), which I use for checking a dataset for duplicates.&amp;nbsp; So if I do a one:many merge, I will often check that the first dataset really is unique, and would code it like:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%dupck(dataset1, by=ID)

data merged_dataset; 
      merge dataset1 dataset2;
      by ID;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 06 May 2024 22:09:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927245#M41622</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2024-05-06T22:09:31Z</dc:date>
    </item>
    <item>
      <title>Re: Select distinct count not working</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927249#M41623</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/465761"&gt;@Stanley3&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;Thanks! I edited to add both&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I don't see any result from the Proc sql that indicates there are or are not duplicates.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I also don't under stand that comment about "no variables overwritten". From the code shown and the previous summary information of the Maestra_antro3 data set with the same number of observations and variables the only way you don't get anything overwritten is one of these:&lt;/P&gt;
&lt;P&gt;1) Data set Crp has only Folio as a variable&lt;/P&gt;
&lt;P&gt;2) All the variables in Crp other than folio duplicate values already in Maestra_antro3 for that value of Folio and you just can't see the overwriting that did happen.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Suggestion:&lt;/P&gt;
&lt;P&gt;Stop using code like this until you identify the problem step and cause.&lt;/P&gt;
&lt;PRE&gt;Data maestro_antro3;
   set maestro_antro3;
(especially not MERGE)&lt;/PRE&gt;
&lt;P&gt;That completely replaces the source data set and you can't be sure what actually did happen.&lt;/P&gt;
&lt;P&gt;If you make new data sets you can COMPARE them with Proc Compare or other code.&lt;/P&gt;
&lt;P&gt;If you routinely use the same set as the source and the output set you have no idea where errors, like duplicates, creep in.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 06 May 2024 22:48:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927249#M41623</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-05-06T22:48:57Z</dc:date>
    </item>
    <item>
      <title>Re: Select distinct count not working</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927261#M41625</link>
      <description>&lt;P&gt;You must already have duplicate ID's in at least one of your source tables. You could use below code for investigation.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data dataset1_dups;
  set dataset1;
  by id;
  if not first.id;
run;

data dataset2_dups;
  set dataset2;
  by id;
  if not first.id;
run;

data merged_dataset merged_dups; 
  merge dataset1 dataset2;
  by ID;
  if first.ID and last.ID then output merged_dataset;
  else output merged_dups;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 07 May 2024 03:13:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Select-distinct-count-not-working/m-p/927261#M41625</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2024-05-07T03:13:27Z</dc:date>
    </item>
  </channel>
</rss>

