Hello!
I would sincerely appreciate any insight on the following question below. Basically I need a distinct count of the cust_ids to identify item codes that were ordered on the same day by the same person. I have to complete this analysis by tomorrow. I am grateful for your time and help!
There is a better solution with hash object, I'm sure. But assuming I understand you correctly...
data codes;
infile datalines dlm=',';
length date 8 item_code 8 item_desc $ 10 cust_id 8;
format date date9.;
informat date mmddyy.;
input date Item_Code item_desc cust_id;
datalines;
08/18/2016,1245,Pizza,5555
09/17/2014,1245,Pizza,4357
08/18/2016,3344,Chicken,5555
09/17/2014,6742,Bread,4357
10/10/2015,3217,Salad,3218
08/18/2016,1245,Pizza,9999
09/17/2014,1245,Pizza,4357
08/18/2016,3344,Chicken,9999
;
run;
proc sort data=codes;
by date cust_id Item_Code;
run;
/* assemble combos per day, per customer */
data sameday_combos(keep=date combo cust_id);
length combo $ 40;
retain combo prev_item;
set codes;
by date cust_id Item_Code ;
if prev_item ^= item_code then do;
combo = catx(',',item_code,combo);
prev_item=item_code;
end;
if last.cust_id then do;
output; call missing (combo,prev_item) ;
end;
run;
/* sum the # customers per combo, per day */
proc sql;
create table cust_totals as
select date, combo, count(cust_id) as Customers
from sameday_combos
group by date, combo;
quit;
Output:
You might have wanted each item code to be in its own variable (item_code1, item_code2, etc.) per day. That's more of a transpose operation, and you may be able to adopt the above to get that...
Chris
Please let me know if I need to clariy my question. Thanks!
There is a better solution with hash object, I'm sure. But assuming I understand you correctly...
data codes;
infile datalines dlm=',';
length date 8 item_code 8 item_desc $ 10 cust_id 8;
format date date9.;
informat date mmddyy.;
input date Item_Code item_desc cust_id;
datalines;
08/18/2016,1245,Pizza,5555
09/17/2014,1245,Pizza,4357
08/18/2016,3344,Chicken,5555
09/17/2014,6742,Bread,4357
10/10/2015,3217,Salad,3218
08/18/2016,1245,Pizza,9999
09/17/2014,1245,Pizza,4357
08/18/2016,3344,Chicken,9999
;
run;
proc sort data=codes;
by date cust_id Item_Code;
run;
/* assemble combos per day, per customer */
data sameday_combos(keep=date combo cust_id);
length combo $ 40;
retain combo prev_item;
set codes;
by date cust_id Item_Code ;
if prev_item ^= item_code then do;
combo = catx(',',item_code,combo);
prev_item=item_code;
end;
if last.cust_id then do;
output; call missing (combo,prev_item) ;
end;
run;
/* sum the # customers per combo, per day */
proc sql;
create table cust_totals as
select date, combo, count(cust_id) as Customers
from sameday_combos
group by date, combo;
quit;
Output:
You might have wanted each item code to be in its own variable (item_code1, item_code2, etc.) per day. That's more of a transpose operation, and you may be able to adopt the above to get that...
Chris
That is perfect Chris! I wish I could thank you a million times. I researched several white papers but couldn't find what I was looking for. I really appreciate your time.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.