Macro-loop to look for duplicates in a dataset based by costumer and d...

melhaf · Posted 05-31-2024 04:49 AM

Hi,

I try to generate a code that looks for duplicates in a dataset based by costumer and dates, here is a random test-data I created:

data testdata;
    do year = 2014 to 2023;

        date = input(cats(put(year, 4.), '1231'), yymmdd8.);

        do i = 1 to 25;
            customer = cats('Customer', (year - 2014) * 25 + i);
            output;
            if i <= 10 then do;
                customer = cats('Customer', (year - 2014) * 25 + i);
                output;
            end;
        end;
    end;
    format date yymmddn8.;
    drop i year;
run;

But my orginal data have 300,000 obs,

and I want to look for duplicates between year=20201231,20211231,20221231 and 20231231, only. I tried this one:.. But it takes ages since it loop around for every costumer. I know fo sure that it is easy to just use nodupkey (can't remember) but I want it to be a macro-loop.

macro check_duplicates(date_list, dataset);

    %do i = 1 %to %sysfunc(countw(&date_list));
        %let date = %scan(&date_list, &i);
 
        %if %sysfunc(inputn(&date, yymmdd8.)) ne . %then %do;
 
            data _duplicates_;
                set &dataset;
                where date_id = &date;
                by date_id;
                retain count;
                if first.date_id then count = 0;
                count + 1;
                if count > 1 then do;
                    put coustumer;
                    put date_id;
                    output; /* Output observation if duplicate */
                end;
                if last then put "No dup found for &date.";
            run;
 
            proc print data=_duplicates_ noobs;
                title "Dup found for date &date";
            run;
 
        %end;
        %else %put No date found for &date.;
 
    %end;
%mend;
 
%check_duplicates(20201231 20221231 202331, lib.dataset);

JB1_DK · Posted 05-31-2024 05:29 AM

Here is a suggestion, but not sure what the point is of the dataset _duplicates_ in your example, since it gets overridden for each date in the list.

You could try this code, which works from the testdata dataset you create. If you want the customers who are duplicates, you can find them in the _AllDups_ table:

%macro check_duplicates(date_list, dataset);

   proc sort data=&dataset.; by date customer; run;

   data _AllDups_;
      set &dataset.; 
      by date customer; 
      if first.customer then c=0;
      c+1;
      if last.customer and c>1 then output;
   run;

    %do i = 1 %to %sysfunc(countw(&date_list));
        %let date = %scan(&date_list, &i);
 
        %if %sysfunc(inputn(&date, yymmdd8.)) ne . %then %do;

        data _null_;
           dt=input("&date.", yymmdd8.);
           put dt= date9.;
           if _N_ = 1 then do;
               length date 8;
               declare hash stats(dataset:"AllDups");
               stats.defineKey('date');
               stats.defineDone();
            end;
            rc=stats.Check(key:dt);
            if rc=0 then put "Dup found for &date.";
            else put "No dup found for &date.";
         run;

         %end;
   %end;

%mend;

%check_duplicates(20201231 20221231 20231231, testdata);

PaigeMiller · Posted 05-31-2024 06:17 AM

I know fo sure that it is easy to just use nodupkey (can't remember) but I want it to be a macro-loop.

Why?

I find it almost impossible to imagine that a macro could work faster than the NODUPKEY option in PROC SORT.

--
Paige Miller

Macro-loop to look for duplicates in a dataset based by costumer and dates

Re: Macro-loop to look for duplicates in a dataset based by costumer and dates

Re: Macro-loop to look for duplicates in a dataset based by costumer and dates

Catch up on SAS Innovate 2026

Macro-loop to look for duplicates in a dataset based by costumer and dates

Re: Macro-loop to look for duplicates in a dataset based by costumer and dates

Re: Macro-loop to look for duplicates in a dataset based by costumer and dates

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away