BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jackie32
Calcite | Level 5

Hi SAS Community,

I had a question for you- I tried to look through to forums to see if something similar has been answered before but couldn't find anything- I apologize if it has! I'm completely new to SAS and have just been figuring things out as I go along and have run into a bit of a road-bump. I've posted the details below.

So I have a dataset that looks like the following:

Customer IDYearSupplier

12014

200188888
12014200188888
22222200288888
11111200199999
41243200399999
11111200499999
41531200488888

What I want to do is sum the number of unique customers that each supplier has by year, to get a database that looks like the following:

SupplierYear# of Unique Customers
8888820011
8888820021
8888820030
8888820041
9999920011
9999920020
9999920031
9999920041

If a customer comes one year and again the next, they would count as unique. If they make multiple purchases in one year, they are not considered unique. I have really no idea how to get from the data I have to the data I want. Any advice would be much appreciated! Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

You don't say if you want a dataset or output. This will create output, prefix with a "create table <name> as" before the first select. "Have" represents the name of your current dataset.

proc sql;

     select supplier, year, count(*) as customers

     from (select distinct supplier,year, customerid from have)

     group by supplier, year;

quit;

View solution in original post

9 REPLIES 9
ballardw
Super User

You don't say if you want a dataset or output. This will create output, prefix with a "create table <name> as" before the first select. "Have" represents the name of your current dataset.

proc sql;

     select supplier, year, count(*) as customers

     from (select distinct supplier,year, customerid from have)

     group by supplier, year;

quit;

stat_sas
Ammonite | Level 13

data have;
input CustomerID Year Supplier;
datalines;
12014 2001 88888
12014 2001 88888
22222 2002 88888
11111 2001 99999
41243 2003 99999
11111 2004 99999
41531 2004 88888
;

data base;
do supplier=88888 to 99999 by 11111;
do year=2001 to 2004;
output;
end;
end;
run;

proc sql;
create table want as
select a.supplier,a.year,count(distinct b.customerid) as unique_customer from base a
left join have b
on a.supplier=b.supplier
and a.year=b.year
group by a.supplier,a.year;
quit;

PGStats
Opal | Level 21

The only problem for solving this with SQL is getting zeros for suppliers/years without data. It can be done this way:

proc sql;

create table want as

select s.supplier, y.year, coalesce(c.nbCust, 0) as nbCust

from

(select distinct supplier from have) as s cross join

(select distinct year from have) as y left join

(select supplier, year, count(distinct customerid) as nbCust

from have

group by supplier, year) as c

    on s.supplier=c.supplier and y.year=c.year;

select * from want;

quit;

PG

PG
Haikuo
Onyx | Level 15

Ok, the more, the merrier. Freshly learned from ,

data have;

     input Customer_ID    Year Supplier;

     cards;

12014 2001 88888

12014 2001 88888

22222 2002 88888

11111 2001 99999

41243 2003 99999

11111 2004 99999

41531 2004 88888

;

proc sql;

     create table want as

           select a.supplier, a.year, count(distinct b.customer_id) as ct from

           (select distinct * from have (keep=supplier) , have (keep=year) ) a

                natural left join have b

           group by a.supplier,a.year

                order by  a.supplier,a.year

     ;

quit;

Haikuo

PGStats
Opal | Level 21

Hi , the crossproduct join

select distinct * from have (keep=supplier) , have (keep=year)


might generate a large table before being shrunk by the distinct clause (unless it is somehow optimized out). I prefer to apply distinct before doing the crossproduct.

PG

PG
Haikuo
Onyx | Level 15

Good Point, PG!

Update: if anyone is interested, PG is saying the above statement is better off replace by the following:

Select * from (select distinct supplier from have), (select distinct year from have)

Reeza
Super User

I've always been a fan of the double proc freq, just because its easy to understand Smiley Happy

proc freq data=have noprint;

table supplier*year*customer_id/out=temp;

run;

proc freq data=temp noprint;

table supplier*year/out=want sparse;

run;

Haikuo
Onyx | Level 15

Nice!

Thanks for sharing, Reeza!

jackie32
Calcite | Level 5

Thank you all so much- all your input has been incredibly helpful and I greatly appreciate it!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2252 views
  • 4 likes
  • 6 in conversation