BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
qqian
Calcite | Level 5

Hi,

I have a sample dataset follows:

********************************************

data sample;
infile datalines delimiter=',';
input Email $ Product_Type Quantity;
datalines;
E1,1,2
E1,1,2
E1,1,2
E1,1,2
E1,1,2
E1,3,5
E2,1,1
E2,1,1
E2,1,1
E2,1,1
E2,2,3
;
run;

********************************************


I divided users into different groups based of number of orders they made:

 

********************************************

proc format;
value Orders
1 = '1 order'
2 = '2 orders'
3 - high = '3+ orders'
. = 'No purchase';
run;

********************************************

 

Then I used proc sql to first group data by email and product type, then put them in a subquery to group it again based on product_type and formatted range.

 

********************************************

proc sql;
select product_type, put(number_of_orders, Orders.) as number_of_orders, count(*) as number_of_email_accounts
from (select email, product_type, count(*) as number_of_orders
from sample
group by email, product_type) a
group by number_of_orders, a.product_type;
quit;

********************************************

 

Both E1 and E2 have more than 3 orders for Product_Type 1. Output of subquery:

Email

Product_Type

number_of_orders

E1

1

5

E1

3

1

E2

1

4

E2

2

1



Final Output:

********************************************

Product_Type

number_of_orders

number_of_email_accounts

1

3+ orders

1

1

3+ orders

1

2

1 order

1

3

1 order

1


********************************************             

It does give the correct group name, but why the group by clause is not working? Can I get a result like

Product_Type

number_of_orders

number_of_email_accounts

1

3+ orders

2

2

1 order

1

3

1 order

1

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi Thank you @qqian  Much better. Here you go, the following should meet your need

 





data sample;
infile datalines delimiter=',';
input Email $ Product_Type Quantity;
datalines;
E1,1,2
E1,1,2
E1,1,2
E1,1,2
E1,1,2
E1,3,5
E2,1,1
E2,1,1
E2,1,1
E2,1,1
E2,2,3
;
run;



proc format;
value Orders
1 = '1 order'
2 = '2 orders'
3 - high = 'More than 3 orders'
. = 'No purchase';
run;


proc sql;
create table want as
select product_type,number_of_orders,count(*) as number_of_email_accounts
from 
(select email, product_type, put(count(*), Orders.) as number_of_orders 
from sample
group by email, product_type)
group by number_of_orders,product_type
order by product_type;
quit;

EDITED: To include formatted value

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

The data set you show is NOT the dataset used in the sub-query

 

(select email, product_type, count(*) as number_of_orders
from db1
group by email, product_type) 

since you don't show a variable named email. Thus, we can't replicate the result you are getting or explain why it is happening.

 

So please, give us (a portion of) the actual data used by this code in the data set DB1, and not some "close facsimile" of the data set. Please use these instructions to provide the data set as a SAS data step. DO NOT USE ANY OTHER METHOD TO PROVIDE THE DATA.

 

Also, I don't see any reason for you to turn the variable QUANTITY to character, unnecessary effort to turn a numeric variable into character here.

--
Paige Miller
qqian
Calcite | Level 5

Hi Paige,

Sorry for the confusing information. I modified my message and provided the dataset I created, the code and also the output.

Thanks,

Quan

novinosrin
Tourmaline | Level 20

Hi @qqian   Breaking your SQL into steps  makes me wonder how do you expect what you are looking for

 


data have;
input Email $	Product_Type	Quantity;
cards;
E1	1	2
E2	1	1
E2	2	3
E3	4	24
E4	3	5
E5	2	6
E5	2	8
E5	3	2
;

proc format;
value Orders
1 = '1 order'
2 = '2 orders'
3 - high = 'More than 3 orders'
. = 'No purchase';
run;

proc sql;
create table temp as
select email, product_type, count(*) as number_of_orders
from have
group by email, product_type;
quit;


proc sql;
create table w as
select *, put(number_of_orders, Orders.) as number_of_orders1
from temp
group by number_of_orders1, product_type;
quit;

Please take a look at Temp and W, and clarify

qqian
Calcite | Level 5

Hi @novinosrin,

 

Thanks for your response!! I modified my message above and provided the output I got and the ideal output I want to have.

 

Thanks,

Quan

novinosrin
Tourmaline | Level 20

Hi Thank you @qqian  Much better. Here you go, the following should meet your need

 





data sample;
infile datalines delimiter=',';
input Email $ Product_Type Quantity;
datalines;
E1,1,2
E1,1,2
E1,1,2
E1,1,2
E1,1,2
E1,3,5
E2,1,1
E2,1,1
E2,1,1
E2,1,1
E2,2,3
;
run;



proc format;
value Orders
1 = '1 order'
2 = '2 orders'
3 - high = 'More than 3 orders'
. = 'No purchase';
run;


proc sql;
create table want as
select product_type,number_of_orders,count(*) as number_of_email_accounts
from 
(select email, product_type, put(count(*), Orders.) as number_of_orders 
from sample
group by email, product_type)
group by number_of_orders,product_type
order by product_type;
quit;

EDITED: To include formatted value

qqian
Calcite | Level 5

Hi @novinosrin,

 

I'd like to add number_of_orders in the want table, then it works! I don't know it should be solved by creating a new table.

proc sql;
create table want as
select product_type, number_of_orders, count(*) as number_of_email_accounts
from 
(select email, product_type, put(count(*), Orders.) as number_of_orders 
from sample
group by email, product_type)
group by number_of_orders,product_type;
quit;

Thank you so much!!

novinosrin
Tourmaline | Level 20

@qqian  No need to create a new table WANT. That has become a habit for us in the coommunity to use HAVE and WANT everywhere lol

 

Feel free to remove the create table statement as i tested below

 


proc sql;
select product_type,number_of_orders,count(*) as number_of_email_accounts
from 
(select email, product_type, put(count(*), Orders.) as number_of_orders 
from sample
group by email, product_type)
group by number_of_orders,product_type
order by product_type;
quit;
Product_Type number_of_orders number_of_email_accounts
1 More than 3 orders 2
2 1 order 1
3 1 order 1
qqian
Calcite | Level 5

gotcha, so the key is to move 'put' in the subquery. Learned a lot.!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 2172 views
  • 0 likes
  • 3 in conversation