Hello All,
I have a dataset like below
Customer_ID | Vistited_Date |
1234 | 7-Feb-20 |
4567 | 7-Feb-20 |
9870 | 7-Feb-20 |
1234 | 14-Feb-20 |
7654 | 14-Feb-20 |
3421 | 14-Feb-20 |
I am trying find the cumulative unique count of customers by date, assuming my output will be like below
Cust_count | Vistited_Date |
3 | 7-Feb-20 |
2 | 14-Feb-20 |
7-Feb-2020 has 3 unique customers, whereas 14-Feb-2020 has only 2 hence customer 1234 has visited already.
Anyone knows how I could develop a data set in these conditions?
Sorry if my question is not clear enough, and I am available to give more details if necessary.
Thanks!
Perfect use for a hash table.
data WANT;
if _N_=1 then do;
dcl hash H(dataset:'HAVE(obs=0)');
H.definekey('ID');
H.definedone();
end;
set HAVE;
by DATE;
if first.DATE then SUM=0;
if H.check() then do;
SUM+1;
H.add();
end;
if last.DATE then output;
keep DATE SUM;
run;
DATE | SUM |
---|---|
07FEB20 | 3 |
14FEB20 | 2 |
One classic approach is a double proc freq
proc freq data=have noprint; tables date*id/out=work.count1; run; proc freq data=work.count1 noprint; tables date / out=work.want; run;
Misread the requirement.
@ballardw Your code does not count customers once only.
Perfect use for a hash table.
data WANT;
if _N_=1 then do;
dcl hash H(dataset:'HAVE(obs=0)');
H.definekey('ID');
H.definedone();
end;
set HAVE;
by DATE;
if first.DATE then SUM=0;
if H.check() then do;
SUM+1;
H.add();
end;
if last.DATE then output;
keep DATE SUM;
run;
DATE | SUM |
---|---|
07FEB20 | 3 |
14FEB20 | 2 |
Not as neat as the hash approach but works as well.
data have;
infile datalines dsd;
input Customer_ID Visited_Date :anydtdte.;
format Visited_Date date9.;
datalines;
1234,7-Feb-20
4567,7-Feb-20
9870,7-Feb-20
1234,14-Feb-20
7654,14-Feb-20
3421,14-Feb-20
;
proc sort data=have out=inter;
by Customer_ID Visited_Date;
run;
proc sort data=inter out=inter nodupkey;
by Customer_ID;
run;
proc sql;
/* create table want as*/
select
count(customer_id) as cust_count,
Visited_Date
from inter
group by Visited_Date
;
quit;
Thanks for the quick solution!!
A single (although nested) SQL query should suffice:
proc sql;
create table want as select Visited_date,count(distinct Customer_ID) as Cust_Count
from(select * from have group by Customer_ID having Visited_date=min(Visited_date))
group by Visited_Date;
quit;
data have;
input Customer_ID Vistited_Date :anydtdte12.;
format Vistited_Date date9.;
datalines;
1234 7-Feb-2020
4567 7-Feb-2020
9870 7-Feb-2020
1234 14-Feb-2020
7654 14-Feb-2020
3421 14-Feb-2020
;
data want (drop=Customer_ID);
if _N_=1 then do;
declare hash h ();
h.definekey ('Customer_ID');
h.definedone ();
end;
do until (last.Vistited_Date);
set have;
by Vistited_Date;
if h.add() = 0 then Count = sum(Count, 1);
end;
run;
proc sort data=have out=have_nodup;
by Visited_Date Customer_ID;
run;
proc sort data=have_nodup nodupkey;
by Customer_ID;
run;
proc sql;
select Visited_Date, count(Customer_ID) as Cust_count
from have_nodup
group by Visited_Date;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.