Hello,
I am a newbie trying to learn and had a query.
I am trying to identify new products being ordered by customers. I want to check if a customer has placed a order(say in 2016 and 2017) ,did he/she order the same product in 2014-2015 or is it something new to him/her. If it is new then I want to have the count of new ordered products(how many products new to the customer were ordered), the count of customers ordering new product and sales man id associated with order.
Every item purchased has a unique ID Prod_id, Customer is represented by Customer_id, date of order is Order_date and sales man is represented by sales_id.
Any help will be highly appreciated.
Help us help you. Show what the incoming data looks like. Show what you want the result data to look like.
incoming data is sales data for a store from 2014-2017.
Cust_id, order_date and sales id are character variables. Prod_id is numeric variable.
I want the results in a tabular form and to only include products that were new to customer(not ordered in the previous two years)
for example
Cust_id prod_id sales_id order_date
A345 74 S231 09/09/2017
I am sorry, I am new to all this, is this what was expected ??
you could do this using Proc SQL and a union statement for customer Id and product.
if you would like a sample please post sample data that represents your case with more than one sample. While we can create a lot of sample data using your example is best because the solutions will fit your case study.
Salesman_name | Prod_id | Order_date | Cust_name | Sales_id | Cust_id | Paymode_ID |
john | 233 | 3/27/2017 | Bill.N | S1922 | J231 | 1 |
john | 235 | 3/27/2017 | Bill.N | S1922 | J231 | 1 |
john | 244 | 3/27/2017 | Bill.N | S1922 | J231 | 1 |
john | 245 | 3/27/2017 | Bill.N | S1922 | J231 | 1 |
Bail | 252 | 3/27/2017 | Susan | S1750 | S19 | 1 |
Bail | 257 | 3/27/2017 | Susan | S1750 | S19 | 1 |
Chuck | 262 | 3/27/2017 | Areeb | S2011 | A116 | 1 |
Chuck | 266 | 3/27/2017 | Areeb | S2011 | A116 | 1 |
Chuck | 419 | 3/27/2017 | Areeb | S2011 | A116 | 1 |
Chuck | 420 | 3/27/2017 | Areeb | S2011 | A116 | 1 |
phoung | 428 | 3/28/2017 | Michelle | S1872 | M103 | 3 |
Chuck | 431 | 3/28/2017 | Liz | S2011 | L103 | 2 |
Chuck | 434 | 3/28/2017 | Liz | S2011 | L103 | 2 |
Christy | 443 | 3/28/2017 | Laxmi | S1881 | L72 | 2 |
Christy | 446 | 3/28/2017 | Laxmi | S1881 | L72 | 2 |
john | 455 | 3/29/2017 | James | S1922 | J24 | 3 |
john | 457 | 3/29/2017 | James | S1922 | J24 | 3 |
john | 460 | 3/29/2017 | James | S1922 | J24 | 3 |
john | 596 | 3/29/2017 | James | S1922 | J24 | 3 |
phoung | 603 | 3/29/2017 | Rana | S1872 | R189 | 1 |
Couldn't copy the whole things, the data set has 53 variables and around 300,000 rows for 4 years.
You did not supply data that would meet the selection criteria you have listed.
I created extra records to confirm this works. But you will need to create the order_year our of the order_date for the logic to work.
proc sql;
create table want as
select a.*, b.order_date as prior_order_date
from have2 as a, have2 as b
where a.salesman_name = b.salesman_name
and a.prod_id = b.prod_id
and b.order_year in (2014,2015)
and a.order_year in (2016,2017);
quit;
Note this in not using a union but works for your case study.
Thanks,
but when I run the code it says work.have created with 0 rows and 8 columns
please I am also attaching a sample data set if you can have a look
Post the code and log from what you tried as well.
95 proc sql;
96 create table want as
97 select a.*, b.order_date as prior_order_date
98 from work.a as a, work.a as b
99 where a.Cust_id = b.Cust_id
100 and a.Prod_id = b.Prod_id
101 and b.order_date in (2014,2015)
102 and a.order_date in (2016,2017);
NOTE: Table WORK.WANT created, with 0 rows and 8 columns.
103 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
If the data are sorted by date, or by date within cust_id then this works:
data want;
set have;
if _n_=1 then do;
declare hash h ();
h.definekey('cust_id','prod_id');
h.definedone();
end;
set have;
if date<'01jan2016'd then h.add();
if date>='01jan2016'd and h.find()^=0;
run;
At the first incoming observation, this program instantiates a hash object (a "lookup table" in this case) keyed on cust_id/prod_id. The object only is populated (one record at a time) by instances in which date < '01jan2016'.
Then when later dates are encountered, the h.find method looks to see if the incoming cust_id/prod_id is already in the object (indicating a pre-2016 purchase). If it is not, the find method returns a non-zero. These are the cases kept for output.
I repeat, this depends on records being sorted by date, or by date within cust_id.
If the data is not sorted, then either it has to be sorted and submitted to the code above, or more complicated coding could be written for a single data step.
When I try it, it gives me the observations which took place after 2016. Irrespective of it being repeated in the previous year.
I am trying to exclude products ordered by a customers more than once.
Say for example
in the attached excel sheet there are 10 products(highlighted in yellow) new to the customer(not ordered in their previous encounters). I want those rows to be shown in the results
sort you data by customer Id and purchase date, prior to doing the proc sql process;
then those sql selections will work.
you need to obtain the year part from your date field. your dates are in the format of mmddyy, where you need to test the year not the month a day parts of the date.
Converted dates into year4. format, still no luck. I am sorry, I am naive at this doing it for the first time.
379 data year;
380 set work.a;
381 format order_date year4.;
382 run;
NOTE: There were 108 observations read from the data set WORK.A.
NOTE: The data set WORK.YEAR has 108 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
383 proc sort data=work.year;
384 by order_date cust_id;run;
NOTE: There were 108 observations read from the data set WORK.YEAR.
NOTE: The data set WORK.YEAR has 108 observations and 7 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.03 seconds
385 proc sql;
386 create table japan as
387 select a.*, b.order_date as prior_order_date
388 from work.year as a, work.year as b
389 where a.Cust_id = b.Cust_id
390 and a.Prod_id = b.Prod_id
391 and b.order_date in (2015)
392 and a.order_date in (2017);
NOTE: Table WORK.JAPAN created, with 0 rows and 8 columns.
393 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.