Solved: Re: SAS cumulative count by unique ID and date

Pradeepbanu · Posted 06-03-2020 10:38 PM

Hello All,

I have a dataset like below

Customer_ID	Vistited_Date
1234	7-Feb-20
4567	7-Feb-20
9870	7-Feb-20
1234	14-Feb-20
7654	14-Feb-20
3421	14-Feb-20

I am trying find the cumulative unique count of customers by date, assuming my output will be like below

Cust_count	Vistited_Date
3	7-Feb-20
2	14-Feb-20

7-Feb-2020 has 3 unique customers, whereas 14-Feb-2020 has only 2 hence customer 1234 has visited already.

Anyone knows how I could develop a data set in these conditions?

Sorry if my question is not clear enough, and I am available to give more details if necessary.

Thanks!

ChrisNZ · Posted 06-03-2020 11:42 PM

Perfect use for a hash table.

data WANT;
  if _N_=1 then do;
   dcl hash H(dataset:'HAVE(obs=0)');
   H.definekey('ID');
   H.definedone();
  end;
  set HAVE;
  by DATE;    
  if first.DATE then SUM=0;
  if H.check() then do;
    SUM+1;
    H.add();
  end;
  if last.DATE then output;
  keep DATE SUM;
run;

DATE	SUM
07FEB20	3
14FEB20	2

High-Performance SAS Coding - Third Edition

View solution in original post

ballardw · Posted 06-03-2020 10:51 PM

One classic approach is a double proc freq

proc freq data=have noprint;
   tables date*id/out=work.count1;
run;

proc freq data=work.count1 noprint;
   tables date / out=work.want;
run;

Misread the requirement.

ChrisNZ · Posted 06-03-2020 11:43 PM

@ballardw Your code does not count customers once only.

High-Performance SAS Coding - Third Edition

ChrisNZ · Posted 06-03-2020 11:42 PM

Perfect use for a hash table.

data WANT;
  if _N_=1 then do;
   dcl hash H(dataset:'HAVE(obs=0)');
   H.definekey('ID');
   H.definedone();
  end;
  set HAVE;
  by DATE;    
  if first.DATE then SUM=0;
  if H.check() then do;
    SUM+1;
    H.add();
  end;
  if last.DATE then output;
  keep DATE SUM;
run;

DATE	SUM
07FEB20	3
14FEB20	2

High-Performance SAS Coding - Third Edition

Patrick · Posted 06-04-2020 01:15 AM

Not as neat as the hash approach but works as well.

data have;
  infile datalines dsd;
  input Customer_ID Visited_Date :anydtdte.;
  format Visited_Date date9.;
  datalines;
1234,7-Feb-20
4567,7-Feb-20
9870,7-Feb-20
1234,14-Feb-20
7654,14-Feb-20
3421,14-Feb-20
;

proc sort data=have out=inter;
  by Customer_ID Visited_Date;
run;

proc sort data=inter out=inter nodupkey;
  by Customer_ID;
run;

proc sql;
/*  create table want as*/
    select 
      count(customer_id) as cust_count, 
      Visited_Date
    from inter
    group by Visited_Date
    ;
quit;

Pradeepbanu · Posted 06-25-2020 09:56 AM

Thanks for the quick solution!!

s_lassen · Posted 06-04-2020 01:52 AM

A single (although nested) SQL query should suffice:

proc sql;
  create table want as select Visited_date,count(distinct Customer_ID) as Cust_Count
    from(select * from have group by Customer_ID having Visited_date=min(Visited_date))
    group by Visited_Date;
quit;

PeterClemmensen · Posted 06-04-2020 03:33 AM

data have;
input Customer_ID Vistited_Date :anydtdte12.;
format Vistited_Date date9.;
datalines;
1234 7-Feb-2020
4567 7-Feb-2020
9870 7-Feb-2020
1234 14-Feb-2020
7654 14-Feb-2020
3421 14-Feb-2020
;

data want (drop=Customer_ID);
    if _N_=1 then do;
        declare hash h ();
        h.definekey ('Customer_ID');
        h.definedone ();
    end;

    do until (last.Vistited_Date);
        set have;
        by Vistited_Date;
        if h.add() = 0 then Count = sum(Count, 1);
    end;
run;

The DATA to DATA Step Macro
Blog: SASnrd

ed_sas_member · Posted 06-04-2020 03:33 AM

proc sort data=have out=have_nodup;
	by Visited_Date Customer_ID;
run;

proc sort data=have_nodup nodupkey;
	by Customer_ID;
run;

proc sql;
	select Visited_Date, count(Customer_ID) as Cust_count
	from have_nodup
	group by Visited_Date;
run;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away