<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Identify duplicate records in a dataset without specifying all variables in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768327#M243699</link>
    <description>Unfortunately there is no equivalent to SQL. In general, SQL doesn't work well with shortcut references or formats. &lt;BR /&gt;&lt;BR /&gt;You could create a macro variable list from sashelp but _all_ is faster and more dynamic. &lt;BR /&gt;</description>
    <pubDate>Fri, 17 Sep 2021 15:57:49 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2021-09-17T15:57:49Z</dc:date>
    <item>
      <title>Identify duplicate records in a dataset without specifying all variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768319#M243692</link>
      <description>&lt;P&gt;Hello&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a number of datasets with different variables; I need to return the duplicate records for each dataset (if they exist).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'd like to do this by not having to specify all the variables (so I can wrap it in a macro which will go through all the datasets).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My example below: in the dataset TEST, Nick's row is entered twice.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I can use PROC SORT nouniquekeys to identify that, but in the BY&amp;nbsp; I need to specify&amp;nbsp;all the variables ( Name Age Car).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I cannot find how to use, say, *, without having to list all the variables?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data TEST;                                                                                                                            
   input Name $ Age Car $;                                                                                                                 
   datalines;                                                                                                                           
Mike 40 Volvo
Nick 35 Nissan
Susan 51 BMW
Bill 60 Volvo
Tom 35 Ford
Nick 35 Nissan
Nadia 49 Nissan
;                                                                                                                                       
run;                    

proc sort data=WORK.TEST nouniquekeys out=duplicates;
    by  Name Age Car;
run;

&lt;/PRE&gt;
&lt;P&gt;many thanks&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 17 Sep 2021 15:24:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768319#M243692</guid>
      <dc:creator>MART1</dc:creator>
      <dc:date>2021-09-17T15:24:29Z</dc:date>
    </item>
    <item>
      <title>Re: Identify duplicate records in a dataset without specifying all variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768320#M243693</link>
      <description>&lt;P&gt;Use the keyword _ALL_ instead which is a shortcut for all variables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=WORK.TEST nouniquekeys out=duplicates;
    by  _all_;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Here is a reference that illustrates how to refer to variables and datasets in a short cut list:&lt;BR /&gt;&lt;A href="https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html" target="_blank"&gt;https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/321375"&gt;@MART1&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Hello&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a number of datasets with different variables; I need to return the duplicate records for each dataset (if they exist).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'd like to do this by not having to specify all the variables (so I can wrap it in a macro which will go through all the datasets).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My example below: in the dataset TEST, Nick's row is entered twice.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I can use PROC SORT nouniquekeys to identify that, but in the BY&amp;nbsp; I need to specify&amp;nbsp;all the variables ( Name Age Car).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I cannot find how to use, say, *, without having to list all the variables?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data TEST;                                                                                                                            
   input Name $ Age Car $;                                                                                                                 
   datalines;                                                                                                                           
Mike 40 Volvo
Nick 35 Nissan
Susan 51 BMW
Bill 60 Volvo
Tom 35 Ford
Nick 35 Nissan
Nadia 49 Nissan
;                                                                                                                                       
run;                    

proc sort data=WORK.TEST nouniquekeys out=duplicates;
    by  Name Age Car;
run;

&lt;/PRE&gt;
&lt;P&gt;many thanks&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 17 Sep 2021 15:29:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768320#M243693</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-09-17T15:29:04Z</dc:date>
    </item>
    <item>
      <title>Re: Identify duplicate records in a dataset without specifying all variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768323#M243696</link>
      <description>&lt;P&gt;That's great thanks so much&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Very interesting paper&amp;nbsp; - just one question (not strictly related to the topic I appreciate) ; the _ALL_ cannot be used in PROC SQL, for example&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;PROC SQL;
	CREATE TABLE DUPS AS 
		SELECT COUNT(DISTINCT _ALL_) FROM TEST;
QUIT;&lt;/PRE&gt;
&lt;P&gt;Do you know if there's an _ALL_ equivalent for SQL? (* can be used, but not with DISTINCT, or in a GROUP BY)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 17 Sep 2021 15:48:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768323#M243696</guid>
      <dc:creator>MART1</dc:creator>
      <dc:date>2021-09-17T15:48:24Z</dc:date>
    </item>
    <item>
      <title>Re: Identify duplicate records in a dataset without specifying all variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768327#M243699</link>
      <description>Unfortunately there is no equivalent to SQL. In general, SQL doesn't work well with shortcut references or formats. &lt;BR /&gt;&lt;BR /&gt;You could create a macro variable list from sashelp but _all_ is faster and more dynamic. &lt;BR /&gt;</description>
      <pubDate>Fri, 17 Sep 2021 15:57:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768327#M243699</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-09-17T15:57:49Z</dc:date>
    </item>
    <item>
      <title>Re: Identify duplicate records in a dataset without specifying all variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768332#M243701</link>
      <description>&lt;P&gt;The SQL equivalent of _ALL_ is *.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So this code will reduce HAVE from 19 observations to 11 distinct observations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  set sashelp.class;
  keep age sex ;
run;

proc sql;
create table distinct as select distinct * from have ;
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 17 Sep 2021 16:40:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768332#M243701</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2021-09-17T16:40:39Z</dc:date>
    </item>
    <item>
      <title>Re: Identify duplicate records in a dataset without specifying all variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768333#M243702</link>
      <description>&lt;PRE&gt; 69         proc sql;
 70         create table test as select count(distinct *) from sashelp.class ;
 ERROR: * used in an illegal position.
 NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
 71         quit;&lt;/PRE&gt;
&lt;P&gt;Cannot be used for counts, but could be used to de-duplicate&lt;/P&gt;</description>
      <pubDate>Fri, 17 Sep 2021 16:43:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768333#M243702</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-09-17T16:43:12Z</dc:date>
    </item>
    <item>
      <title>Re: Identify duplicate records in a dataset without specifying all variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768341#M243710</link>
      <description>&lt;P&gt;But that is just the same invalid syntax you sent before.&lt;/P&gt;
&lt;P&gt;If you want to count the number of distinct records use:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;select count(*) as number_distinct from (select distinct * from have);&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 17 Sep 2021 17:54:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Identify-duplicate-records-in-a-dataset-without-specifying-all/m-p/768341#M243710</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2021-09-17T17:54:04Z</dc:date>
    </item>
  </channel>
</rss>

