Help using Base SAS procedures

Need Help with Summing Distinct in a Database

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Need Help with Summing Distinct in a Database

Hi SAS Community,

I had a question for you- I tried to look through to forums to see if something similar has been answered before but couldn't find anything- I apologize if it has! I'm completely new to SAS and have just been figuring things out as I go along and have run into a bit of a road-bump. I've posted the details below.

So I have a dataset that looks like the following:

Customer IDYearSupplier

12014

200188888
12014200188888
22222200288888
11111200199999
41243200399999
11111200499999
41531200488888

What I want to do is sum the number of unique customers that each supplier has by year, to get a database that looks like the following:

SupplierYear# of Unique Customers
8888820011
8888820021
8888820030
8888820041
9999920011
9999920020
9999920031
9999920041

If a customer comes one year and again the next, they would count as unique. If they make multiple purchases in one year, they are not considered unique. I have really no idea how to get from the data I have to the data I want. Any advice would be much appreciated! Thank you.


Accepted Solutions
Solution
‎09-18-2014 02:31 PM
Super User
Posts: 10,466

Re: Need Help with Summing Distinct in a Database

You don't say if you want a dataset or output. This will create output, prefix with a "create table <name> as" before the first select. "Have" represents the name of your current dataset.

proc sql;

     select supplier, year, count(*) as customers

     from (select distinct supplier,year, customerid from have)

     group by supplier, year;

quit;

View solution in original post


All Replies
Solution
‎09-18-2014 02:31 PM
Super User
Posts: 10,466

Re: Need Help with Summing Distinct in a Database

You don't say if you want a dataset or output. This will create output, prefix with a "create table <name> as" before the first select. "Have" represents the name of your current dataset.

proc sql;

     select supplier, year, count(*) as customers

     from (select distinct supplier,year, customerid from have)

     group by supplier, year;

quit;

Trusted Advisor
Posts: 1,204

Re: Need Help with Summing Distinct in a Database

data have;
input CustomerID Year Supplier;
datalines;
12014 2001 88888
12014 2001 88888
22222 2002 88888
11111 2001 99999
41243 2003 99999
11111 2004 99999
41531 2004 88888
;

data base;
do supplier=88888 to 99999 by 11111;
do year=2001 to 2004;
output;
end;
end;
run;

proc sql;
create table want as
select a.supplier,a.year,count(distinct b.customerid) as unique_customer from base a
left join have b
on a.supplier=b.supplier
and a.year=b.year
group by a.supplier,a.year;
quit;

Respected Advisor
Posts: 4,640

Re: Need Help with Summing Distinct in a Database

The only problem for solving this with SQL is getting zeros for suppliers/years without data. It can be done this way:

proc sql;

create table want as

select s.supplier, y.year, coalesce(c.nbCust, 0) as nbCust

from

(select distinct supplier from have) as s cross join

(select distinct year from have) as y left join

(select supplier, year, count(distinct customerid) as nbCust

from have

group by supplier, year) as c

    on s.supplier=c.supplier and y.year=c.year;

select * from want;

quit;

PG

PG
Respected Advisor
Posts: 3,124

Re: Need Help with Summing Distinct in a Database

Ok, the more, the merrier. Freshly learned from ,

data have;

     input Customer_ID    Year Supplier;

     cards;

12014 2001 88888

12014 2001 88888

22222 2002 88888

11111 2001 99999

41243 2003 99999

11111 2004 99999

41531 2004 88888

;

proc sql;

     create table want as

           select a.supplier, a.year, count(distinct b.customer_id) as ct from

           (select distinct * from have (keep=supplier) , have (keep=year) ) a

                natural left join have b

           group by a.supplier,a.year

                order by  a.supplier,a.year

     ;

quit;

Haikuo

Respected Advisor
Posts: 4,640

Re: Need Help with Summing Distinct in a Database

Hi , the crossproduct join

select distinct * from have (keep=supplier) , have (keep=year)


might generate a large table before being shrunk by the distinct clause (unless it is somehow optimized out). I prefer to apply distinct before doing the crossproduct.

PG

PG
Respected Advisor
Posts: 3,124

Re: Need Help with Summing Distinct in a Database

Good Point, PG!

Update: if anyone is interested, PG is saying the above statement is better off replace by the following:

Select * from (select distinct supplier from have), (select distinct year from have)

Super User
Posts: 17,748

Re: Need Help with Summing Distinct in a Database

I've always been a fan of the double proc freq, just because its easy to understand Smiley Happy

proc freq data=have noprint;

table supplier*year*customer_id/out=temp;

run;

proc freq data=temp noprint;

table supplier*year/out=want sparse;

run;

Respected Advisor
Posts: 3,124

Re: Need Help with Summing Distinct in a Database

Nice!

Thanks for sharing, Reeza!

Occasional Contributor
Posts: 6

Re: Need Help with Summing Distinct in a Database

Thank you all so much- all your input has been incredibly helpful and I greatly appreciate it!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 256 views
  • 4 likes
  • 6 in conversation