Hello All,
I have a dataset like below
Customer_ID | Vistited_Date |
1234 | 7-Feb-20 |
4567 | 7-Feb-20 |
9870 | 7-Feb-20 |
1234 | 14-Feb-20 |
7654 | 14-Feb-20 |
3421 | 14-Feb-20 |
I am trying find the cumulative unique count of customers by date, assuming my output will be like below
Cust_count | Vistited_Date |
3 | 7-Feb-20 |
2 | 14-Feb-20 |
7-Feb-2020 has 3 unique customers, whereas 14-Feb-2020 has only 2 hence customer 1234 has visited already.
Anyone knows how I could develop a data set in these conditions?
Sorry if my question is not clear enough, and I am available to give more details if necessary.
Thanks!
Perfect use for a hash table.
data WANT;
if _N_=1 then do;
dcl hash H(dataset:'HAVE(obs=0)');
H.definekey('ID');
H.definedone();
end;
set HAVE;
by DATE;
if first.DATE then SUM=0;
if H.check() then do;
SUM+1;
H.add();
end;
if last.DATE then output;
keep DATE SUM;
run;
DATE | SUM |
---|---|
07FEB20 | 3 |
14FEB20 | 2 |
One classic approach is a double proc freq
proc freq data=have noprint; tables date*id/out=work.count1; run; proc freq data=work.count1 noprint; tables date / out=work.want; run;
Misread the requirement.
@ballardw Your code does not count customers once only.
Perfect use for a hash table.
data WANT;
if _N_=1 then do;
dcl hash H(dataset:'HAVE(obs=0)');
H.definekey('ID');
H.definedone();
end;
set HAVE;
by DATE;
if first.DATE then SUM=0;
if H.check() then do;
SUM+1;
H.add();
end;
if last.DATE then output;
keep DATE SUM;
run;
DATE | SUM |
---|---|
07FEB20 | 3 |
14FEB20 | 2 |
Not as neat as the hash approach but works as well.
data have;
infile datalines dsd;
input Customer_ID Visited_Date :anydtdte.;
format Visited_Date date9.;
datalines;
1234,7-Feb-20
4567,7-Feb-20
9870,7-Feb-20
1234,14-Feb-20
7654,14-Feb-20
3421,14-Feb-20
;
proc sort data=have out=inter;
by Customer_ID Visited_Date;
run;
proc sort data=inter out=inter nodupkey;
by Customer_ID;
run;
proc sql;
/* create table want as*/
select
count(customer_id) as cust_count,
Visited_Date
from inter
group by Visited_Date
;
quit;
Thanks for the quick solution!!
A single (although nested) SQL query should suffice:
proc sql;
create table want as select Visited_date,count(distinct Customer_ID) as Cust_Count
from(select * from have group by Customer_ID having Visited_date=min(Visited_date))
group by Visited_Date;
quit;
data have;
input Customer_ID Vistited_Date :anydtdte12.;
format Vistited_Date date9.;
datalines;
1234 7-Feb-2020
4567 7-Feb-2020
9870 7-Feb-2020
1234 14-Feb-2020
7654 14-Feb-2020
3421 14-Feb-2020
;
data want (drop=Customer_ID);
if _N_=1 then do;
declare hash h ();
h.definekey ('Customer_ID');
h.definedone ();
end;
do until (last.Vistited_Date);
set have;
by Vistited_Date;
if h.add() = 0 then Count = sum(Count, 1);
end;
run;
proc sort data=have out=have_nodup;
by Visited_Date Customer_ID;
run;
proc sort data=have_nodup nodupkey;
by Customer_ID;
run;
proc sql;
select Visited_Date, count(Customer_ID) as Cust_count
from have_nodup
group by Visited_Date;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: