<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Removing Duplicate Observations in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Removing-Duplicate-Observations/m-p/33973#M8246</link>
    <description>I agree with Scott: What's so bad about a first-last approach? It's not a lot of code and easy to understand and maintain.&lt;BR /&gt;
&lt;BR /&gt;
Of course: If your data is stored in a database and you want to alter the values in this database table then a SQL approach would be better.&lt;BR /&gt;
&lt;BR /&gt;
DATA Temp;&lt;BR /&gt;
INPUT date data;&lt;BR /&gt;
CARDS;&lt;BR /&gt;
1 5000&lt;BR /&gt;
1 6000&lt;BR /&gt;
1 2100&lt;BR /&gt;
2 1110&lt;BR /&gt;
3 5000&lt;BR /&gt;
3 5000&lt;BR /&gt;
;&lt;BR /&gt;
RUN;&lt;BR /&gt;
&lt;BR /&gt;
proc sql noprint;&lt;BR /&gt;
  create view SortedTemp as&lt;BR /&gt;
    select *&lt;BR /&gt;
      from temp&lt;BR /&gt;
    order by date&lt;BR /&gt;
  ;&lt;BR /&gt;
quit;&lt;BR /&gt;
&lt;BR /&gt;
data want;&lt;BR /&gt;
  set temp;&lt;BR /&gt;
  by date;&lt;BR /&gt;
  if not (first.date and last.date) then call missing(data);&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
proc print data=want;&lt;BR /&gt;
run;</description>
    <pubDate>Wed, 23 Dec 2009 15:08:19 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2009-12-23T15:08:19Z</dc:date>
    <item>
      <title>Removing Duplicate Observations</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Removing-Duplicate-Observations/m-p/33969#M8242</link>
      <description>Hi All:&lt;BR /&gt;
&lt;BR /&gt;
I'm trying to get my brain bent around this and I know I could write a mile of code, but I thought someone could find an easier solution.&lt;BR /&gt;
&lt;BR /&gt;
I have a ton of records (nearly 2 million) that are sequenced by date.  For some dates there is a single record, for some there are 2 records.    There are never more than 2 for any date.&lt;BR /&gt;
&lt;BR /&gt;
When the 2 values for a selected date are the same I want to set the value for both to missing ("."), but I want to keep the records in the table.  I know I could sort and use "first." and "last.", or use a "lag", but I'm thinkin' there is a more elegant solution. Most solutions I come up with only get rid of the duplicate but I need to get rid of them both.&lt;BR /&gt;
&lt;BR /&gt;
Thanks in advance.</description>
      <pubDate>Tue, 22 Dec 2009 19:57:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Removing-Duplicate-Observations/m-p/33969#M8242</guid>
      <dc:creator>OS2Rules</dc:creator>
      <dc:date>2009-12-22T19:57:03Z</dc:date>
    </item>
    <item>
      <title>Re: Removing Duplicate Observations</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Removing-Duplicate-Observations/m-p/33970#M8243</link>
      <description>One way would be to run proc summary against the data set (http://support.sas.com/kb/37/581.html) to find the duplicates.  Merge back to the original file to update the duplicate records to missing.</description>
      <pubDate>Tue, 22 Dec 2009 20:41:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Removing-Duplicate-Observations/m-p/33970#M8243</guid>
      <dc:creator>Bill</dc:creator>
      <dc:date>2009-12-22T20:41:01Z</dc:date>
    </item>
    <item>
      <title>Re: Removing Duplicate Observations</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Removing-Duplicate-Observations/m-p/33971#M8244</link>
      <description>Elegant?  After your PROC SORT step, you will have less than 10 SAS statements for the DATA step executino.  &lt;BR /&gt;
&lt;BR /&gt;
So, one DATA step pass and test for FIRST.&lt;VARNAME&gt; AND LAST.&lt;VARNAME&gt;  -- if it's not the condition, then set your &lt;VARNAME&gt; to missing values until you get to LAST.&lt;VARNAME&gt;.&lt;BR /&gt;
&lt;BR /&gt;
Scott Barry&lt;BR /&gt;
SBBWorks, Inc.&lt;/VARNAME&gt;&lt;/VARNAME&gt;&lt;/VARNAME&gt;&lt;/VARNAME&gt;</description>
      <pubDate>Tue, 22 Dec 2009 22:04:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Removing-Duplicate-Observations/m-p/33971#M8244</guid>
      <dc:creator>sbb</dc:creator>
      <dc:date>2009-12-22T22:04:26Z</dc:date>
    </item>
    <item>
      <title>Re: Removing Duplicate Observations</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Removing-Duplicate-Observations/m-p/33972#M8245</link>
      <description>Maybe this example might help, but take into account that you have a ton of records.&lt;BR /&gt;
Program finds duplicates, keep them in the table and sets values to missing.&lt;BR /&gt;
&lt;BR /&gt;
DATA Temp;&lt;BR /&gt;
INPUT date data;&lt;BR /&gt;
CARDS;&lt;BR /&gt;
1 5000&lt;BR /&gt;
1 6000&lt;BR /&gt;
1 2100&lt;BR /&gt;
2 1110&lt;BR /&gt;
2 5000&lt;BR /&gt;
2 5000&lt;BR /&gt;
3 5000&lt;BR /&gt;
;&lt;BR /&gt;
RUN;&lt;BR /&gt;
&lt;BR /&gt;
proc sql;&lt;BR /&gt;
 create table temp1&lt;BR /&gt;
 as&lt;BR /&gt;
 select a.date, &lt;BR /&gt;
 		a.data,&lt;BR /&gt;
        (CASE&lt;BR /&gt;
		  		when b.date is not null then .&lt;BR /&gt;
				else a.data&lt;BR /&gt;
 		END) as new_data_value&lt;BR /&gt;
 from temp a&lt;BR /&gt;
 left outer join (&lt;BR /&gt;
 					select date&lt;BR /&gt;
 					from temp &lt;BR /&gt;
 					group by date&lt;BR /&gt;
 					having count(*)&amp;gt;1) b on a.date=b.date&lt;BR /&gt;
 ;&lt;BR /&gt;
quit;</description>
      <pubDate>Wed, 23 Dec 2009 07:58:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Removing-Duplicate-Observations/m-p/33972#M8245</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2009-12-23T07:58:59Z</dc:date>
    </item>
    <item>
      <title>Re: Removing Duplicate Observations</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Removing-Duplicate-Observations/m-p/33973#M8246</link>
      <description>I agree with Scott: What's so bad about a first-last approach? It's not a lot of code and easy to understand and maintain.&lt;BR /&gt;
&lt;BR /&gt;
Of course: If your data is stored in a database and you want to alter the values in this database table then a SQL approach would be better.&lt;BR /&gt;
&lt;BR /&gt;
DATA Temp;&lt;BR /&gt;
INPUT date data;&lt;BR /&gt;
CARDS;&lt;BR /&gt;
1 5000&lt;BR /&gt;
1 6000&lt;BR /&gt;
1 2100&lt;BR /&gt;
2 1110&lt;BR /&gt;
3 5000&lt;BR /&gt;
3 5000&lt;BR /&gt;
;&lt;BR /&gt;
RUN;&lt;BR /&gt;
&lt;BR /&gt;
proc sql noprint;&lt;BR /&gt;
  create view SortedTemp as&lt;BR /&gt;
    select *&lt;BR /&gt;
      from temp&lt;BR /&gt;
    order by date&lt;BR /&gt;
  ;&lt;BR /&gt;
quit;&lt;BR /&gt;
&lt;BR /&gt;
data want;&lt;BR /&gt;
  set temp;&lt;BR /&gt;
  by date;&lt;BR /&gt;
  if not (first.date and last.date) then call missing(data);&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
proc print data=want;&lt;BR /&gt;
run;</description>
      <pubDate>Wed, 23 Dec 2009 15:08:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Removing-Duplicate-Observations/m-p/33973#M8246</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2009-12-23T15:08:19Z</dc:date>
    </item>
  </channel>
</rss>

