Hi SAS Users,
I have a road block in conduct a project. The problem is how to compare observations within groups. My data set has a column named STORE_ID, which is the ID number of each store in a city. Another column is customer_id, obveriously it is ID of each consumed customer. I want to know the list of Customer_id that consumed not only in store A, but also in store B. I think correlated sub-query might be a possible solution, but don't know how to develop code. Any suggestions and sample codes are highly appreciated.
Thanks,
Harry
Hi, how about INTERSECT in SQL (works for 2 stores , works for 'n' stores) ...
data x;
input store_id :$1. customer_id @@;
datalines;
A 123 B 234 A 789 B 123 A 999 A 234 B 765 C 123
;
title 'CUSTOMERS COMMON TO STORES A & B';
proc sql;
select customer_id from x where store_id = 'A'
intersect
select customer_id from x where store_id = 'B';
quit;
title 'CUSTOMERS COMMON TO STORES A & B & C;
proc sql;
select customer_id from x where store_id = 'A'
intersect
select customer_id from x where store_id = 'B'
intersect
select customer_id from x where store_id = 'C';
quit;
CUSTOMERS COMMON TO STORES A & B
customer_id
123
234
CUSTOMERS COMMON TO STORES A & B & C
customer_id
123
Post some test data (in the form of a datastep) and required output.
If you are only concerned with those two stores, here's a way:
proc sort data=have;
by customer_id;
run;
data a_only b_only both;
merge have (where=(store_id='A') in=shopped_at_a)
have (where=(store_id='B') in=shopped_at_b);
by customer_id;
if shopped_at_a=0 then output b_only;
else if shopped_at_b=0 then output a_only;
else output both;
run;
Note that this doesn't account for the possibility that a customer shopped more than one time at a store. There are ways to handle that, but we'd need a better picture of what is in your data and what you are trying to achieve.
Good luck.
Hi, how about INTERSECT in SQL (works for 2 stores , works for 'n' stores) ...
data x;
input store_id :$1. customer_id @@;
datalines;
A 123 B 234 A 789 B 123 A 999 A 234 B 765 C 123
;
title 'CUSTOMERS COMMON TO STORES A & B';
proc sql;
select customer_id from x where store_id = 'A'
intersect
select customer_id from x where store_id = 'B';
quit;
title 'CUSTOMERS COMMON TO STORES A & B & C;
proc sql;
select customer_id from x where store_id = 'A'
intersect
select customer_id from x where store_id = 'B'
intersect
select customer_id from x where store_id = 'C';
quit;
CUSTOMERS COMMON TO STORES A & B
customer_id
123
234
CUSTOMERS COMMON TO STORES A & B & C
customer_id
123
Try this.
data have;
input Store_id $ Customer_ID;
datalines;
CAL 11542
CAL 11526
CHI 10069
CHI 11433
CHI 11542
GFC 11493
GFC 11526
GFC 10068
KAN 11482
KAN 11493
;
proc sql;
create table store as
select * from have group by customer_id
having count(distinct store_id)>1;
quit;
proc transpose data=store out=want(drop=_Name_);
by customer_id;
var store_id;
run;
proc print data=want;
run;
A more general approach might be just what you need.
proc sort data=have (keep=customer_id store_id) out=full_list nodupkey;
by customer_id store_id;
run;
data full_list;
set full_list;
shopped_there='Y';
run;
proc transpose data=full_list out=want;
by customer_id;
id store_id;
var shopped_there;
run;
It assumes that the store names form valid variable names in SAS, and gives you a "Y" for every store where a customer shopped. You can use that for any combination of subsetting going forward.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.