<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Find rows with duplicates in one column but not another? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Find-rows-with-duplicates-in-one-column-but-not-another/m-p/695342#M212177</link>
    <description>&lt;P&gt;You COULD try a cartesian join in SQL with an appropriate WHERE clause.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
   infile datalines dsd;
   ID+1;
   informat address $25.  city $10. state $2. zip;
   input address  city  state  zip;
datalines;
550 Jack Horner, Rhymes, NJ, 06430
123 My Road, Omaha,  NE, 23143
123 My Road, Auburn, NE, 23143
123 My Road, Omaha,  NE, 23144
123 My Road, Auburn, NE, 23144
123 My Road, Auburn, ME, 04210
311 Main St, Big Sur, CA, 90201
;

proc sql;
title "Complete table";
select * 
   from have
;
title "Problem Rows";
select * 
   from have as a
      , have as b
   where    a.zip=b.zip 
           and a.address=b.address 
           and   a.State=b.State 
           and   a.City &amp;lt;&amp;gt; b.City
;
quit;
title;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Thu, 29 Oct 2020 20:01:49 GMT</pubDate>
    <dc:creator>SASJedi</dc:creator>
    <dc:date>2020-10-29T20:01:49Z</dc:date>
    <item>
      <title>Find rows with duplicates in one column but not another?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-rows-with-duplicates-in-one-column-but-not-another/m-p/695316#M212167</link>
      <description>&lt;P&gt;Hello SAS community!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm kind of stuck here so would appreciate any and all advice. I've got a dataset with a list of building locations including ID, address, city, state, and zip. Is there a way to pull out only rows where it's the same address but different city, or rows with the same zip but different state? These are data quality errors that I'd like to identify.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 29 Oct 2020 19:19:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-rows-with-duplicates-in-one-column-but-not-another/m-p/695316#M212167</guid>
      <dc:creator>adventuresinsas</dc:creator>
      <dc:date>2020-10-29T19:19:06Z</dc:date>
    </item>
    <item>
      <title>Re: Find rows with duplicates in one column but not another?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-rows-with-duplicates-in-one-column-but-not-another/m-p/695342#M212177</link>
      <description>&lt;P&gt;You COULD try a cartesian join in SQL with an appropriate WHERE clause.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
   infile datalines dsd;
   ID+1;
   informat address $25.  city $10. state $2. zip;
   input address  city  state  zip;
datalines;
550 Jack Horner, Rhymes, NJ, 06430
123 My Road, Omaha,  NE, 23143
123 My Road, Auburn, NE, 23143
123 My Road, Omaha,  NE, 23144
123 My Road, Auburn, NE, 23144
123 My Road, Auburn, ME, 04210
311 Main St, Big Sur, CA, 90201
;

proc sql;
title "Complete table";
select * 
   from have
;
title "Problem Rows";
select * 
   from have as a
      , have as b
   where    a.zip=b.zip 
           and a.address=b.address 
           and   a.State=b.State 
           and   a.City &amp;lt;&amp;gt; b.City
;
quit;
title;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 29 Oct 2020 20:01:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-rows-with-duplicates-in-one-column-but-not-another/m-p/695342#M212177</guid>
      <dc:creator>SASJedi</dc:creator>
      <dc:date>2020-10-29T20:01:49Z</dc:date>
    </item>
    <item>
      <title>Re: Find rows with duplicates in one column but not another?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-rows-with-duplicates-in-one-column-but-not-another/m-p/695346#M212179</link>
      <description>&lt;P&gt;If the idea is to check for data quality you might consider some of the ZIP related functions.&lt;/P&gt;
&lt;P&gt;Such as ZIPSTATE, ZIPNAME or ZIPNAMEL (depending on the way you have state in the data).&lt;/P&gt;
&lt;P&gt;If you have 2 letter state codes you could do something like:&lt;/P&gt;
&lt;PRE&gt;Data zipstateprob;
   set have;
   if zipname(zipcode) ne Statecode then output;
end;&lt;/PRE&gt;
&lt;P&gt;to get a list where the zip appears to disagree with the state.&lt;/P&gt;
&lt;P&gt;There is also a ZIPCITY that returns the name of a city and the 2 letter state code.&lt;/P&gt;
&lt;P&gt;These do find "mismatches" for some areas because of local spelliing, like NY, NY or ABQ, NM or CdA, ID&lt;/P&gt;
&lt;P&gt;but may be helpful.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm not sure I would start with "same address, different city" as there are just too many streets with common numbers. Quick, how far are you from a street with a name like Washington, Jefferson, Franklin, Lincoln, or other president or state names? Not to mention "Main St" or "Frontage Rd".&lt;/P&gt;</description>
      <pubDate>Thu, 29 Oct 2020 20:21:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-rows-with-duplicates-in-one-column-but-not-another/m-p/695346#M212179</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-10-29T20:21:06Z</dc:date>
    </item>
    <item>
      <title>Re: Find rows with duplicates in one column but not another?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-rows-with-duplicates-in-one-column-but-not-another/m-p/695445#M212227</link>
      <description>Code Not Tested:&lt;BR /&gt;&lt;BR /&gt;proc sql;&lt;BR /&gt;create table want as&lt;BR /&gt;select * from have&lt;BR /&gt; group by address&lt;BR /&gt;  having count(distinct city) &amp;gt; 1;&lt;BR /&gt;quit;</description>
      <pubDate>Fri, 30 Oct 2020 12:22:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-rows-with-duplicates-in-one-column-but-not-another/m-p/695445#M212227</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2020-10-30T12:22:37Z</dc:date>
    </item>
  </channel>
</rss>

