Hi SAS Community,
I had a question for you- I tried to look through to forums to see if something similar has been answered before but couldn't find anything- I apologize if it has! I'm completely new to SAS and have just been figuring things out as I go along and have run into a bit of a road-bump. I've posted the details below.
So I have a dataset that looks like the following:
Customer ID | Year | Supplier |
---|---|---|
12014 | 2001 | 88888 |
12014 | 2001 | 88888 |
22222 | 2002 | 88888 |
11111 | 2001 | 99999 |
41243 | 2003 | 99999 |
11111 | 2004 | 99999 |
41531 | 2004 | 88888 |
What I want to do is sum the number of unique customers that each supplier has by year, to get a database that looks like the following:
Supplier | Year | # of Unique Customers |
---|---|---|
88888 | 2001 | 1 |
88888 | 2002 | 1 |
88888 | 2003 | 0 |
88888 | 2004 | 1 |
99999 | 2001 | 1 |
99999 | 2002 | 0 |
99999 | 2003 | 1 |
99999 | 2004 | 1 |
If a customer comes one year and again the next, they would count as unique. If they make multiple purchases in one year, they are not considered unique. I have really no idea how to get from the data I have to the data I want. Any advice would be much appreciated! Thank you.
You don't say if you want a dataset or output. This will create output, prefix with a "create table <name> as" before the first select. "Have" represents the name of your current dataset.
proc sql;
select supplier, year, count(*) as customers
from (select distinct supplier,year, customerid from have)
group by supplier, year;
quit;
You don't say if you want a dataset or output. This will create output, prefix with a "create table <name> as" before the first select. "Have" represents the name of your current dataset.
proc sql;
select supplier, year, count(*) as customers
from (select distinct supplier,year, customerid from have)
group by supplier, year;
quit;
data have;
input CustomerID Year Supplier;
datalines;
12014 2001 88888
12014 2001 88888
22222 2002 88888
11111 2001 99999
41243 2003 99999
11111 2004 99999
41531 2004 88888
;
data base;
do supplier=88888 to 99999 by 11111;
do year=2001 to 2004;
output;
end;
end;
run;
proc sql;
create table want as
select a.supplier,a.year,count(distinct b.customerid) as unique_customer from base a
left join have b
on a.supplier=b.supplier
and a.year=b.year
group by a.supplier,a.year;
quit;
The only problem for solving this with SQL is getting zeros for suppliers/years without data. It can be done this way:
proc sql;
create table want as
select s.supplier, y.year, coalesce(c.nbCust, 0) as nbCust
from
(select distinct supplier from have) as s cross join
(select distinct year from have) as y left join
(select supplier, year, count(distinct customerid) as nbCust
from have
group by supplier, year) as c
on s.supplier=c.supplier and y.year=c.year;
select * from want;
quit;
PG
Ok, the more, the merrier. Freshly learned from ,
data have;
input Customer_ID Year Supplier;
cards;
12014 2001 88888
12014 2001 88888
22222 2002 88888
11111 2001 99999
41243 2003 99999
11111 2004 99999
41531 2004 88888
;
proc sql;
create table want as
select a.supplier, a.year, count(distinct b.customer_id) as ct from
(select distinct * from have (keep=supplier) , have (keep=year) ) a
natural left join have b
group by a.supplier,a.year
order by a.supplier,a.year
;
quit;
Haikuo
Good Point, PG!
Update: if anyone is interested, PG is saying the above statement is better off replace by the following:
Select * from (select distinct supplier from have), (select distinct year from have)
I've always been a fan of the double proc freq, just because its easy to understand
proc freq data=have noprint;
table supplier*year*customer_id/out=temp;
run;
proc freq data=temp noprint;
table supplier*year/out=want sparse;
run;
Nice!
Thanks for sharing, Reeza!
Thank you all so much- all your input has been incredibly helpful and I greatly appreciate it!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.