BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jackie32
Calcite | Level 5

Hi SAS Community,

I had a question for you- I tried to look through to forums to see if something similar has been answered before but couldn't find anything- I apologize if it has! I'm completely new to SAS and have just been figuring things out as I go along and have run into a bit of a road-bump. I've posted the details below.

So I have a dataset that looks like the following:

Customer IDYearSupplier

12014

200188888
12014200188888
22222200288888
11111200199999
41243200399999
11111200499999
41531200488888

What I want to do is sum the number of unique customers that each supplier has by year, to get a database that looks like the following:

SupplierYear# of Unique Customers
8888820011
8888820021
8888820030
8888820041
9999920011
9999920020
9999920031
9999920041

If a customer comes one year and again the next, they would count as unique. If they make multiple purchases in one year, they are not considered unique. I have really no idea how to get from the data I have to the data I want. Any advice would be much appreciated! Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

You don't say if you want a dataset or output. This will create output, prefix with a "create table <name> as" before the first select. "Have" represents the name of your current dataset.

proc sql;

     select supplier, year, count(*) as customers

     from (select distinct supplier,year, customerid from have)

     group by supplier, year;

quit;

View solution in original post

9 REPLIES 9
ballardw
Super User

You don't say if you want a dataset or output. This will create output, prefix with a "create table <name> as" before the first select. "Have" represents the name of your current dataset.

proc sql;

     select supplier, year, count(*) as customers

     from (select distinct supplier,year, customerid from have)

     group by supplier, year;

quit;

stat_sas
Ammonite | Level 13

data have;
input CustomerID Year Supplier;
datalines;
12014 2001 88888
12014 2001 88888
22222 2002 88888
11111 2001 99999
41243 2003 99999
11111 2004 99999
41531 2004 88888
;

data base;
do supplier=88888 to 99999 by 11111;
do year=2001 to 2004;
output;
end;
end;
run;

proc sql;
create table want as
select a.supplier,a.year,count(distinct b.customerid) as unique_customer from base a
left join have b
on a.supplier=b.supplier
and a.year=b.year
group by a.supplier,a.year;
quit;

PGStats
Opal | Level 21

The only problem for solving this with SQL is getting zeros for suppliers/years without data. It can be done this way:

proc sql;

create table want as

select s.supplier, y.year, coalesce(c.nbCust, 0) as nbCust

from

(select distinct supplier from have) as s cross join

(select distinct year from have) as y left join

(select supplier, year, count(distinct customerid) as nbCust

from have

group by supplier, year) as c

    on s.supplier=c.supplier and y.year=c.year;

select * from want;

quit;

PG

PG
Haikuo
Onyx | Level 15

Ok, the more, the merrier. Freshly learned from ,

data have;

     input Customer_ID    Year Supplier;

     cards;

12014 2001 88888

12014 2001 88888

22222 2002 88888

11111 2001 99999

41243 2003 99999

11111 2004 99999

41531 2004 88888

;

proc sql;

     create table want as

           select a.supplier, a.year, count(distinct b.customer_id) as ct from

           (select distinct * from have (keep=supplier) , have (keep=year) ) a

                natural left join have b

           group by a.supplier,a.year

                order by  a.supplier,a.year

     ;

quit;

Haikuo

PGStats
Opal | Level 21

Hi , the crossproduct join

select distinct * from have (keep=supplier) , have (keep=year)


might generate a large table before being shrunk by the distinct clause (unless it is somehow optimized out). I prefer to apply distinct before doing the crossproduct.

PG

PG
Haikuo
Onyx | Level 15

Good Point, PG!

Update: if anyone is interested, PG is saying the above statement is better off replace by the following:

Select * from (select distinct supplier from have), (select distinct year from have)

Reeza
Super User

I've always been a fan of the double proc freq, just because its easy to understand Smiley Happy

proc freq data=have noprint;

table supplier*year*customer_id/out=temp;

run;

proc freq data=temp noprint;

table supplier*year/out=want sparse;

run;

Haikuo
Onyx | Level 15

Nice!

Thanks for sharing, Reeza!

jackie32
Calcite | Level 5

Thank you all so much- all your input has been incredibly helpful and I greatly appreciate it!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 902 views
  • 4 likes
  • 6 in conversation