Recently, I am working on prediction for one dataset(for a store). I want to use group_by function to conclude some attributes. But my goal is a little complicated, because I want to separate the data into April and May.
For each specific customer(CUSTOMER_ID):
Profit, AGE, GENDER for each customer in April
QUANTITY purchased by each customer in April
I count TRANSACTION_DATE to find the how many times each customers come to store in April
SUB_NUM means whether the customers purchase certain product, I use 1 represent the customer purchased it, 0 represents they don't. Here, my goal is: for each customer, did they purchase it on April, and did they purchase it on May, so basically, it's two columns. Here is the most confused point.
The my desired outcome is like this, for each specific customer, I can conclude:
customer_id
gender
age
sub_num on April
sub_num on May
Quantity purchased on April
sum of "transaction_date" on April
1
2
3
here is the coding I have
proc sql;
SELECT customer_ID, sum (Profit) as totalProfit, count (QUANTITY) as totalQuantity, count (TRANSACTION_DATE) as frequency,
From cus_tran
where TRANSACTION_DATE < "01MAY2014"d
Group By customer_ID;
quit;
I am confused, because I want to group each specific customers, and conclude several data only on April, one only on May, and combine those dataset in one table. I think where clause is not available here, if only focus on April data, the data is likely to become incomplete, thus I cannot got data for each customers. I spend a so hard time on it, can you please help me fix it? Thanks so much!
... View more