<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Dropping duplicate rows in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Dropping-duplicate-rows/m-p/712027#M219432</link>
    <description>&lt;P&gt;I'm glad &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; Please remember to close the thread.&lt;/P&gt;</description>
    <pubDate>Mon, 18 Jan 2021 06:14:19 GMT</pubDate>
    <dc:creator>PeterClemmensen</dc:creator>
    <dc:date>2021-01-18T06:14:19Z</dc:date>
    <item>
      <title>Dropping duplicate rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dropping-duplicate-rows/m-p/712022#M219428</link>
      <description>&lt;P&gt;I am trying to do exploratory data analysis with SAS by following the steps laid out in the article below.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Article: &lt;A href="https://towardsdatascience.com/exploratory-data-analysis-in-python-c9a77dfa39ce" target="_blank" rel="noopener"&gt;https://towardsdatascience.com/exploratory-data-analysis-in-python-c9a77dfa39ce&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Dataset:&amp;nbsp;&lt;A href="https://www.kaggle.com/CooperUnion/cardataset" target="_blank" rel="noopener"&gt;https://www.kaggle.com/CooperUnion/cardataset&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Dropping &lt;STRIKE&gt;null values &lt;/STRIKE&gt;&amp;nbsp;duplicate rows with .drop_duplicates() in Python drops a total of 989 rows, while dropping duplicates with the NODUP, NODUPKEY or NODUPREC option of PROC SORT leaves substantially fewer rows (around 300~400).&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SORT DATA = PRACTICE.CARS NODUPKEY;
BY ENGINE_HP ENGINE_CYLINDERS;
RUN;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I'd very much appreciate some pointers on how to drop duplicates correctly.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;EDIT: I meant dropping duplicate rows&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2021 06:14:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dropping-duplicate-rows/m-p/712022#M219428</guid>
      <dc:creator>danielchoi626</dc:creator>
      <dc:date>2021-01-18T06:14:45Z</dc:date>
    </item>
    <item>
      <title>Re: Dropping duplicate rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dropping-duplicate-rows/m-p/712025#M219430</link>
      <description>&lt;P&gt;Dropping null (I take it you mean missing) values and dropping duplicates are two very different things.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I take it that you want to remove duplicate observations from your data set. I am not familiar with &lt;SPAN&gt;.drop_duplicates() in Python. However, I suspect that you want to remove an observation only when the entire observation is duplicated, not whenever the values of ENGINE_HP and ENGINE_CYLINDERS repeat. With NODUPKEY, PROC SORT compares only the variables listed in the BY statement, so try using the _ALL_ keyword in the BY statement.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You can see the difference in the small example below.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input x y;
datalines;
1 2
1 2
1 3
2 4
2 4
2 5
;

proc sort data=have nodupkey;
   by x;       /* 2 obs: one per value of x */
  *by _ALL_;   /* 4 obs: one per distinct (x, y) row */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
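&lt;P&gt;Since you are coming from Python, here is a minimal plain-Python sketch of the same distinction: de-duplicating on a few key columns versus on the whole row. It is only an illustration of the logic, not pandas code and not how PROC SORT is implemented. The function name dedupe and its keys parameter are made up for this example.&lt;/P&gt;

```python
# Plain-Python sketch of key-based vs whole-row de-duplication.
# dedupe() and its keys parameter are hypothetical names for illustration,
# not part of SAS or pandas.

def dedupe(rows, keys=None):
    """Return the first row seen for each distinct combination of values.

    rows: list of dicts that share the same column names.
    keys: column names to compare; None compares every column,
          which plays the role of BY _ALL_ with NODUPKEY.
    """
    seen = set()
    kept = []
    for row in rows:
        columns = keys if keys is not None else sorted(row)
        signature = tuple(row[c] for c in columns)
        if signature not in seen:
            seen.add(signature)
            kept.append(row)
    return kept


# Same six observations as the HAVE data set above.
have = [
    {"x": 1, "y": 2},
    {"x": 1, "y": 2},
    {"x": 1, "y": 3},
    {"x": 2, "y": 4},
    {"x": 2, "y": 4},
    {"x": 2, "y": 5},
]

by_x = dedupe(have, keys=["x"])   # one row per value of x
by_all = dedupe(have)             # one row per distinct (x, y) pair

print(len(by_x), len(by_all))     # prints: 2 4
```

&lt;P&gt;Comparing on x alone throws away rows that differ in y, which is why listing only ENGINE_HP and ENGINE_CYLINDERS removes far more rows than a whole-row comparison. Unlike PROC SORT, this sketch does not sort the rows; it keeps them in input order.&lt;/P&gt;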
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2021 06:01:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dropping-duplicate-rows/m-p/712025#M219430</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2021-01-18T06:01:24Z</dc:date>
    </item>
    <item>
      <title>Re: Dropping duplicate rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dropping-duplicate-rows/m-p/712026#M219431</link>
      <description>&lt;P&gt;Whoops, that was a typo. But yes, I was looking for a way to drop duplicates for the entire dataset and replicate the effects of .drop_duplicates() from Python in SAS.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your assistance! Your answer was exactly what I was looking for.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2021 06:12:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dropping-duplicate-rows/m-p/712026#M219431</guid>
      <dc:creator>danielchoi626</dc:creator>
      <dc:date>2021-01-18T06:12:45Z</dc:date>
    </item>
    <item>
      <title>Re: Dropping duplicate rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dropping-duplicate-rows/m-p/712027#M219432</link>
      <description>&lt;P&gt;I'm glad &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; Please remember to close the thread.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2021 06:14:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dropping-duplicate-rows/m-p/712027#M219432</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2021-01-18T06:14:19Z</dc:date>
    </item>
  </channel>
</rss>

