<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Check for duplicates based on changing, multiple criteria in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787579#M251671</link>
    <description>&lt;P&gt;OK, moving in that direction ... the form of the program you need isn't 100% clear.&amp;nbsp; Here's something similar that will probably give you some ideas about how to proceed.&amp;nbsp; Feel free to come back and ask questions for more detail.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Macro language can get you a set of tables.&amp;nbsp; It types out the statements for you, instead of making you do it.&amp;nbsp; For example, here are 40 PROC FREQ tables:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro tables40;
   %local i j;
   %do i=1 %to 40;
      /* One TABLES statement per key: var1, var1*var2, ..., var1*...*var40. */
      /* The WHERE= option keeps only key combinations seen more than once.  */
      tables var1
      %do j=2 %to &amp;amp;i;
         * var&amp;amp;j
      %end;
      / out=freq&amp;amp;i (drop=percent where=(count &amp;gt; 1));
   %end;
%mend tables40;

proc freq data=have noprint;
   %tables40
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;It's untested code, but looks about right.&amp;nbsp; There could be memory problems with 40 variables going into the same table, so we may have to cross bridges as we come to them.&lt;/P&gt;</description>
    <pubDate>Wed, 29 Dec 2021 04:54:22 GMT</pubDate>
    <dc:creator>Astounding</dc:creator>
    <dc:date>2021-12-29T04:54:22Z</dc:date>
    <item>
      <title>Check for duplicates based on changing, multiple criteria</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787550#M251659</link>
      <description>&lt;P&gt;I have a very large program with approximately 40 variables. For the sake of argument, let's call them var1-var40.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I need to know how many duplicates there are for var1, var1 and var2, var1 var2 and var3, var1 var2 var3 and var4, ..., var1 var2 var3 ... var40.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there a way to create this array without having to manually type it, and then recursively run it through a proc sort or proc SQL command so that I can see how many duplicates there are side by side for each of the combinations above?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Currently, I am running 40 proc SQL statements, followed by 40 proc freq commands on the individual outputs, and manually checking them against each other.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I feel there must be a way to incorporate an array with each variable combination into a do loop that automatically generates and stores the number of detected duplicates.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Bonus points if the above code could show me which observations are dropping at each point as the number of variables in the combination increases.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance,&lt;/P&gt;&lt;P&gt;Zach&lt;/P&gt;</description>
      <pubDate>Tue, 28 Dec 2021 20:11:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787550#M251659</guid>
      <dc:creator>ZachLandone</dc:creator>
      <dc:date>2021-12-28T20:11:12Z</dc:date>
    </item>
    <item>
      <title>Re: Check for duplicates based on changing, multiple criteria</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787558#M251661</link>
      <description>What about other combinations such as var2 and var4, or var5 and var10 and var15?  You may not care about those but if you do, you may not live long enough to inspect all the output.  There would be over a trillion combinations.</description>
      <pubDate>Tue, 28 Dec 2021 22:38:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787558#M251661</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2021-12-28T22:38:30Z</dc:date>
    </item>
    <item>
      <title>Re: Check for duplicates based on changing, multiple criteria</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787571#M251667</link>
      <description>&lt;P&gt;Yeah, I'm not interested in all possible combinations. Just the 40 combinations that describe that pattern.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Dec 2021 01:38:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787571#M251667</guid>
      <dc:creator>ZachLandone</dc:creator>
      <dc:date>2021-12-29T01:38:03Z</dc:date>
    </item>
    <item>
      <title>Re: Check for duplicates based on changing, multiple criteria</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787579#M251671</link>
      <description>&lt;P&gt;OK, moving in that direction ... the form of the program you need isn't 100% clear.&amp;nbsp; Here's something similar that will probably give you some ideas about how to proceed.&amp;nbsp; Feel free to come back and ask questions for more detail.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Macro language can get you a set of tables.&amp;nbsp; It types out the statements for you, instead of making you do it.&amp;nbsp; For example, here are 40 PROC FREQ tables:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro tables40;
   %local i j;
   %do i=1 %to 40;
      /* One TABLES statement per key: var1, var1*var2, ..., var1*...*var40. */
      /* The WHERE= option keeps only key combinations seen more than once.  */
      tables var1
      %do j=2 %to &amp;amp;i;
         * var&amp;amp;j
      %end;
      / out=freq&amp;amp;i (drop=percent where=(count &amp;gt; 1));
   %end;
%mend tables40;

proc freq data=have noprint;
   %tables40
run;&lt;/CODE&gt;&lt;/PRE&gt;
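&lt;P&gt;To see what the macro types out, it may help to look at a shortened case. The block below is a hand-written illustration, not captured log output: it assumes the outer loop stopped at 3 instead of 40. Turning on OPTIONS MPRINT should also echo the generated statements to the log when the real macro runs.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options mprint;   /* echo macro-generated statements to the log */

/* Hand expansion of the loop for i = 1, 2, 3 (illustrative only) */
proc freq data=have noprint;
   tables var1 / out=freq1 (drop=percent where=(count &amp;gt; 1));
   tables var1 * var2 / out=freq2 (drop=percent where=(count &amp;gt; 1));
   tables var1 * var2 * var3 / out=freq3 (drop=percent where=(count &amp;gt; 1));
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Each FREQn data set would then hold one row per key combination that occurs more than once, with COUNT giving the size of that group.&lt;/P&gt;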
&lt;P&gt;It's untested code, but looks about right.&amp;nbsp; There could be memory problems with 40 variables going into the same table, so we may have to cross bridges as we come to them.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Dec 2021 04:54:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787579#M251671</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2021-12-29T04:54:22Z</dc:date>
    </item>
    <item>
      <title>Re: Check for duplicates based on changing, multiple criteria</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787587#M251675</link>
      <description>&lt;P&gt;Not clear what you expect as result or what the existing code actually does, so please post data in usable form, so that we see what you have. Also add the expected result for that data and the code (just one proc sql + freq) you have.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Dec 2021 07:19:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787587#M251675</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2021-12-29T07:19:20Z</dc:date>
    </item>
    <item>
      <title>Re: Check for duplicates based on changing, multiple criteria</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787607#M251682</link>
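      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sample data to test with */
data have;
 set sashelp.heart;
run;

/* Walk the variables of WORK.HAVE; on each row extend the key list by one name,
   then generate a PROC SORT that writes the duplicates for that key to _dup_N */
data _null_;
 set sashelp.vcolumn(keep=libname memname name where=(libname='WORK' and memname='HAVE'));
 length list_vars $ 4000;
 retain list_vars;
 list_vars=catx(' ',list_vars,name); putlog list_vars= ;
 call execute(cat('proc sort data=have out=dummy dupout=_dup_',_n_,' nodupkey; by ',list_vars,';run;'));
 /* Tag each duplicates data set with the key list that produced it */
 call execute(catt('data _dup_',_n_,'; set _dup_',_n_,';length from_vars $ 2000; from_vars="',list_vars,'";run;'));
run;

/* Stack every _dup_N data set */
data all ;
 set _dup_:;
run;

/* One row per key list with its number of duplicate observations */
proc sql;
create table want as
select from_vars,count(*) as n_duplicate
 from all
  group by from_vars;
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>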
      <pubDate>Wed, 29 Dec 2021 12:38:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787607#M251682</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2021-12-29T12:38:22Z</dc:date>
    </item>
    <item>
      <title>Re: Check for duplicates based on changing, multiple criteria</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787622#M251691</link>
      <description>&lt;P&gt;OK, your objective is becoming clearer.&amp;nbsp; Let's start with sorting the data set once instead of 40 times:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have;
   by var1-var40;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;That step lets you use any of these BY statements later:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;by var1;

by var1 var2;

by var1 var2 var3;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;How you proceed after sorting ... well I'll show the simpler variation that outputs all the duplicates.&amp;nbsp; Counting the duplicates is not much more difficult but let's start with subsetting:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data dup_01 dup_02 dup_03;
   set have;
   by var1-var3;
   if first.var1=0 or last.var1=0 then output dup_01;
   if first.var2=0 or last.var2=0 then output dup_02;
   if first.var3=0 or last.var3=0 then output dup_03;
run;&lt;/CODE&gt;&lt;/PRE&gt;
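&lt;P&gt;For the counting variation, here is one possible sketch rather than a definitive solution. It assumes HAVE is already sorted by var1-var40 as above, and the names count_dups, dup_counts, tally, n_vars and n_dup_obs are made up for illustration. Like the subsetting step, it counts every observation that shares its key with at least one other observation. It is untested.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: HAVE must already be sorted by var1-var40 */
%macro count_dups(n=40);
   %local i;
   data dup_counts (keep=n_vars n_dup_obs);
      set have end=done;
      by var1-var&amp;amp;n;
      /* One running total per key length; temporary arrays are retained */
      array tally {&amp;amp;n} _temporary_ (&amp;amp;n*0);
      %do i=1 %to &amp;amp;n;
         /* Same test as the subsetting step, generated once per key length */
         if first.var&amp;amp;i=0 or last.var&amp;amp;i=0 then tally{&amp;amp;i} + 1;
      %end;
      /* After the last observation, write one row per key length */
      if done then do n_vars=1 to &amp;amp;n;
         n_dup_obs = tally{n_vars};
         output;
      end;
   run;
%mend count_dups;

%count_dups(n=40)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The result would be a single 40-row data set, so the counts for var1, var1-var2, and so on sit side by side.&lt;/P&gt;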
&lt;P&gt;And you get all of this by processing your data set twice (one sort plus one DATA step), not 80 times.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Dec 2021 14:57:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787622#M251691</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2021-12-29T14:57:34Z</dc:date>
    </item>
    <item>
      <title>Re: Check for duplicates based on changing, multiple criteria</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787630#M251697</link>
      <description>Works like a charm! Thanks</description>
      <pubDate>Wed, 29 Dec 2021 16:06:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-for-duplicates-based-on-changing-multiple-criteria/m-p/787630#M251697</guid>
      <dc:creator>ZachLandone</dc:creator>
      <dc:date>2021-12-29T16:06:51Z</dc:date>
    </item>
  </channel>
</rss>

