<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SCYP Training: Level 2 Practice: Sorting Data to Remove Duplicate Rows in Programming 1 and 2</title>
    <link>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/715312#M697</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/361239"&gt;@ncd&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Dear Cynthia, thank you so much.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have rerun the file that makes the data, and now I have 153 obs in the raw file. Somehow the de-duplicated file overwrote the original file while I was working on it, or some other glitch occurred. Now that I have set it up from the beginning, the original file has 153 obs. Thanks for your help.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;When you do not use the OUT= option, Proc Sort &lt;STRONG&gt;replaces&lt;/STRONG&gt; the input data set.&lt;/P&gt;
&lt;P&gt;It is quite typical for people to use&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;Proc sort data=somedataset;
   by thisvar thatvar;
run;&lt;/PRE&gt;
&lt;P&gt;This sorts in place, i.e. it replaces the original data set with a sorted one.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But if you use&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;Proc sort data=somedataset nodupkey;
   by thisvar thatvar;
run;&lt;/PRE&gt;
&lt;P&gt;Then it replaces the data set with a sorted one from which the observations with duplicate BY values have been removed.&lt;/P&gt;
&lt;P&gt;This is the designed behavior and not a "glitch".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You would not be the first person to unintentionally delete records. Ask me how I know &lt;span class="lia-unicode-emoji" title=":flushed_face:"&gt;😳&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 29 Jan 2021 15:13:10 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2021-01-29T15:13:10Z</dc:date>
    <item>
      <title>SCYP Training: Level 2 Practice: Sorting Data to Remove Duplicate Rows</title>
      <link>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/715119#M688</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am trying to do the practice on&amp;nbsp;Dashboard/&amp;nbsp;&lt;SPAN style="font-family: inherit;"&gt;My courses/&amp;nbsp;&lt;/SPAN&gt;SAS Programming 1: Essentials/&amp;nbsp;Lessons/&amp;nbsp;Lesson 3: Exploring and Validating Data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The code I wrote is:&lt;/P&gt;&lt;P&gt;proc sort data=PG1.np_largeparks nodupkey out=park_clean dupout=park_dups;&lt;BR /&gt;by _all_;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;and the code solution says:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;proc sort data=pg1.np_largeparks
		  out=park_clean
		  dupout=park_dups
		  nodupkey;
    by _all_;
run;&lt;/PRE&gt;&lt;DIV class="ml-auto d-flex"&gt;Unfortunately, neither of them works. I pasted the log below. I can't figure out why 0 observations appear. The solution says there should be 30 duplicates.&lt;/DIV&gt;&lt;DIV class="ml-auto d-flex"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2021-01-29 at 00.14.53.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/54048iD43F484FF89C95CD/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screen Shot 2021-01-29 at 00.14.53.png" alt="Screen Shot 2021-01-29 at 00.14.53.png" /&gt;&lt;/span&gt;&lt;P&gt; &lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Cagri&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 28 Jan 2021 21:19:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/715119#M688</guid>
      <dc:creator>ncd</dc:creator>
      <dc:date>2021-01-28T21:19:25Z</dc:date>
    </item>
    <item>
      <title>Re: SCYP Training: Level 2 Practice: Sorting Data to Remove Duplicate Rows</title>
      <link>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/715127#M689</link>
      <description>&lt;P&gt;How many records were in PG1.np_largeparks when it was created at the set up of the training data sets?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I suspect that an earlier Proc Sort with NODUPKEY but without OUT= sorted the data set in place (see the note in the log saying the data set is already sorted?) and already deleted the duplicate records, so there is nothing to remove now.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jan 2021 21:43:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/715127#M689</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2021-01-28T21:43:26Z</dc:date>
    </item>
    <item>
      <title>Re: SCYP Training: Level 2 Practice: Sorting Data to Remove Duplicate Rows</title>
      <link>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/715141#M690</link>
      <description>&lt;P&gt;Hi:&lt;BR /&gt;If you want to restore the data back to the start point of class, all you need to do is rerun the program that makes the data. If you rerun the program (as you did when you initially set up the data), the class files will be refreshed. &lt;BR /&gt;As you can see from my LOG, below:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Cynthia_sas_0-1611872907124.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/54050iB844FCB7D00168B7/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Cynthia_sas_0-1611872907124.png" alt="Cynthia_sas_0-1611872907124.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;after I make the data for class, you should start with 153 rows in PG1.NP_LARGEPARKS, 30 of which are duplicate rows. So it appears that you've already deleted the dups from the NP_LARGEPARKS data table.&lt;BR /&gt;Cynthia&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jan 2021 22:29:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/715141#M690</guid>
      <dc:creator>Cynthia_sas</dc:creator>
      <dc:date>2021-01-28T22:29:04Z</dc:date>
    </item>
    <item>
      <title>Re: SCYP Training: Level 2 Practice: Sorting Data to Remove Duplicate Rows</title>
      <link>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/715269#M694</link>
      <description>Interestingly enough, there were only 123 obs to begin with. Somehow the de-duplicated file had overwritten the original file. I have now set it up again from the beginning, and the original file has 153 obs. Thanks for the quick reply.</description>
      <pubDate>Fri, 29 Jan 2021 13:39:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/715269#M694</guid>
      <dc:creator>ncd</dc:creator>
      <dc:date>2021-01-29T13:39:02Z</dc:date>
    </item>
    <item>
      <title>Re: SCYP Training: Level 2 Practice: Sorting Data to Remove Duplicate Rows</title>
      <link>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/715270#M695</link>
      <description>&lt;P&gt;Dear Cynthia, thank you so much.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have rerun the file that makes the data, and now I have 153 obs in the raw file. Somehow the de-duplicated file overwrote the original file while I was working on it, or some other glitch occurred. Now that I have set it up from the beginning, the original file has 153 obs. Thanks for your help.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jan 2021 13:40:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/715270#M695</guid>
      <dc:creator>ncd</dc:creator>
      <dc:date>2021-01-29T13:40:48Z</dc:date>
    </item>
    <item>
      <title>Re: SCYP Training: Level 2 Practice: Sorting Data to Remove Duplicate Rows</title>
      <link>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/715312#M697</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/361239"&gt;@ncd&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Dear Cynthia, thank you so much.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have rerun the file that makes the data, and now I have 153 obs in the raw file. Somehow the de-duplicated file overwrote the original file while I was working on it, or some other glitch occurred. Now that I have set it up from the beginning, the original file has 153 obs. Thanks for your help.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;When you do not use the OUT= option, Proc Sort &lt;STRONG&gt;replaces&lt;/STRONG&gt; the input data set.&lt;/P&gt;
&lt;P&gt;It is quite typical for people to use&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;Proc sort data=somedataset;
   by thisvar thatvar;
run;&lt;/PRE&gt;
&lt;P&gt;This sorts in place, i.e. it replaces the original data set with a sorted one.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But if you use&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;Proc sort data=somedataset nodupkey;
   by thisvar thatvar;
run;&lt;/PRE&gt;
&lt;P&gt;Then it replaces the data set with a sorted one from which the observations with duplicate BY values have been removed.&lt;/P&gt;
&lt;P&gt;This is the designed behavior and not a "glitch".&lt;/P&gt;
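&lt;P&gt;A minimal sketch of the safer pattern, assuming the same placeholder names as above plus two hypothetical output names (sorted_nodups and removed_dups): adding OUT= writes the de-duplicated result to a new data set and leaves somedataset untouched, and DUPOUT= keeps the removed observations so you can review them.&lt;/P&gt;
&lt;PRE&gt;/* sorted_nodups and removed_dups are hypothetical names for illustration */
proc sort data=somedataset
          out=sorted_nodups      /* result goes here; somedataset is not replaced */
          dupout=removed_dups    /* observations dropped by NODUPKEY are saved here */
          nodupkey;
   by thisvar thatvar;
run;&lt;/PRE&gt;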
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You would not be the first person to unintentionally delete records. Ask me how I know &lt;span class="lia-unicode-emoji" title=":flushed_face:"&gt;😳&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jan 2021 15:13:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/715312#M697</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2021-01-29T15:13:10Z</dc:date>
    </item>
    <item>
      <title>Re: SCYP Training: Level 2 Practice: Sorting Data to Remove Duplicate Rows</title>
      <link>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/887332#M1406</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let me ask another question in this regard. Why was there neither an error nor a discrepancy in the output data when I put "nodupkey" before dupout=park_dups?&lt;/P&gt;
&lt;P&gt;How can I tell when the order of the options is strict and when I can be "creative"?&lt;/P&gt;
&lt;PRE&gt;1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         
 74         proc sort data=pg1.np_largeparks out=park_clean
 75         nodupkey dupout=park_dups;
 76         by _all_;
 77         run;
 
 NOTE: There were 153 observations read from the data set PG1.NP_LARGEPARKS.
 NOTE: 30 observations with duplicate key values were deleted.
 NOTE: The data set WORK.PARK_CLEAN has 123 observations and 5 variables.
 NOTE: The data set WORK.PARK_DUPS has 30 observations and 5 variables.
 NOTE: PROCEDURE SORT used (Total process time):
       real time           0.00 seconds
       user cpu time       0.01 seconds&lt;/PRE&gt;
&lt;P&gt;Thank you.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Aug 2023 14:57:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/887332#M1406</guid>
      <dc:creator>SASRB</dc:creator>
      <dc:date>2023-08-01T14:57:13Z</dc:date>
    </item>
    <item>
      <title>Re: SCYP Training: Level 2 Practice: Sorting Data to Remove Duplicate Rows</title>
      <link>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/887345#M1407</link>
      <description>&lt;P&gt;Hi:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; We recommend that you refer to the documentation to find whether an option for a procedure is required to be specified a certain way. Here are 3 different invocations of PROC SORT. Note that all 3 invocations work, even if the options like DATA=, OUT=, DUPOUT= and NODUPKEY are listed in a different order each time:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Cynthia_sas_0-1690908449508.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/86335i19D340E5F938FDD5/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Cynthia_sas_0-1690908449508.png" alt="Cynthia_sas_0-1690908449508.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; Generally, after the keyword PROC you must list the procedure name, and then the other options can usually be specified in any order. As a best practice, I always put the DATA= option and the OUT= option first when I code my PROC SORT, but even DATA= is optional: if you omit it, SAS uses the value of the automatic variable _LAST_, i.e. the most recently created data set.&lt;/P&gt;
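&lt;P&gt;As a rough sketch of what such invocations might look like (the screenshot itself is not reproduced here, so this is an illustration rather than the exact code shown), here are three orderings using the data set names from this thread. Because each one uses OUT=, PG1.NP_LARGEPARKS is left unchanged, and each should produce the same WORK.PARK_CLEAN and WORK.PARK_DUPS tables:&lt;/P&gt;
&lt;PRE&gt;/* Ordering 1: DATA= and OUT= first */
proc sort data=pg1.np_largeparks out=park_clean dupout=park_dups nodupkey;
   by _all_;
run;

/* Ordering 2: NODUPKEY first */
proc sort nodupkey data=pg1.np_largeparks dupout=park_dups out=park_clean;
   by _all_;
run;

/* Ordering 3: DUPOUT= first, DATA= last */
proc sort dupout=park_dups nodupkey out=park_clean data=pg1.np_largeparks;
   by _all_;
run;&lt;/PRE&gt;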
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cynthia&lt;/P&gt;</description>
      <pubDate>Tue, 01 Aug 2023 17:02:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/887345#M1407</guid>
      <dc:creator>Cynthia_sas</dc:creator>
      <dc:date>2023-08-01T17:02:42Z</dc:date>
    </item>
    <item>
      <title>Re: SCYP Training: Level 2 Practice: Sorting Data to Remove Duplicate Rows</title>
      <link>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/887649#M1408</link>
      <description>Thank you for your clarification.&lt;BR /&gt;It's good to know that it's possible to get the same outcome in slightly different ways.</description>
      <pubDate>Thu, 03 Aug 2023 10:30:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Programming-1-and-2/SCYP-Training-Level-2-Practice-Sorting-Data-to-Remove-Duplicate/m-p/887649#M1408</guid>
      <dc:creator>SASRB</dc:creator>
      <dc:date>2023-08-03T10:30:51Z</dc:date>
    </item>
  </channel>
</rss>

