BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
HarryLiu
Obsidian | Level 7

Hi SAS Users,

I have a road block in conduct a project. The problem is how to compare observations within groups. My data set has a column named STORE_ID, which is the ID number of each store in a city. Another column is customer_id, obveriously it is ID of each consumed customer. I want to know the list of Customer_id that consumed not only in store A, but also in store B. I think correlated sub-query might be a possible solution, but don't know how to develop code. Any suggestions and sample codes are highly appreciated.

Thanks,

Harry 

 

1 ACCEPTED SOLUTION

Accepted Solutions
MikeZdeb
Rhodochrosite | Level 12

Hi, how about INTERSECT in SQL (works for 2 stores , works for 'n' stores) ...

 

data x;
input store_id :$1. customer_id @@;
datalines;
A 123 B 234 A 789 B 123 A 999 A 234 B 765 C 123
;

 

title 'CUSTOMERS COMMON TO STORES A & B';
proc sql;
select customer_id from x where store_id = 'A'
intersect
select customer_id from x where store_id = 'B';
quit;

 

title 'CUSTOMERS COMMON TO STORES A & B & C;
proc sql;
select customer_id from x where store_id = 'A'
intersect
select customer_id from x where store_id = 'B'
intersect
select customer_id from x where store_id = 'C';
quit;


CUSTOMERS COMMON TO STORES A & B

customer_id
        123
        234

 

CUSTOMERS COMMON TO STORES A & B & C

customer_id
        123

View solution in original post

9 REPLIES 9
LinusH
Tourmaline | Level 20
I think that a group by with a having count(*) =2 clause would work, given that the store and customer the combined key.
Data never sleeps
Reeza
Super User
Do you only have two stores or multiple stores?
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post some test data (in the form of a datastep) and required output.

Astounding
PROC Star

If you are only concerned with those two stores, here's a way:

 

proc sort data=have;

by customer_id;

run;

 

data a_only b_only both;

   merge have (where=(store_id='A') in=shopped_at_a)

              have (where=(store_id='B') in=shopped_at_b);

   by customer_id;

   if shopped_at_a=0 then output b_only;

   else if shopped_at_b=0 then output a_only;

   else output both;

run;

 

Note that this doesn't account for the possibility that a customer shopped more than one time at a store.  There are ways to handle that, but we'd need a better picture of what is in your data and what you are trying to achieve.

 

Good luck.

HarryLiu
Obsidian | Level 7
Thanks so much for everybody's suggestion. I have hundreds of stores and need to compare any interested stores (usually less than 4 stores). Attached is a sample of data set. Astounding's code looks good, but will be time consuming, since I have hundred stores.
Here is my data set,
Store_id Customer_ID
CAL 11542
CAL 11526
CHI 10069
CHI 11433
CHI 11542
GFC 11493
GFC 11526
GFC 10068
KAN 11482
KAN 11493

If I compare CAL with CHI, the result should be 11542. IF between CAL and GFC, the result should be 11526. If between GFC and KAN, the result should be 11493. If between CAL and KAN, result should be nothing.
Thanks,
Harry
MikeZdeb
Rhodochrosite | Level 12

Hi, how about INTERSECT in SQL (works for 2 stores , works for 'n' stores) ...

 

data x;
input store_id :$1. customer_id @@;
datalines;
A 123 B 234 A 789 B 123 A 999 A 234 B 765 C 123
;

 

title 'CUSTOMERS COMMON TO STORES A & B';
proc sql;
select customer_id from x where store_id = 'A'
intersect
select customer_id from x where store_id = 'B';
quit;

 

title 'CUSTOMERS COMMON TO STORES A & B & C;
proc sql;
select customer_id from x where store_id = 'A'
intersect
select customer_id from x where store_id = 'B'
intersect
select customer_id from x where store_id = 'C';
quit;


CUSTOMERS COMMON TO STORES A & B

customer_id
        123
        234

 

CUSTOMERS COMMON TO STORES A & B & C

customer_id
        123

HarryLiu
Obsidian | Level 7
Thanks everybody for various suggestions and helps. I found all code work very well for my project but this one is the most efficient. The only reason is that my data set is complex and there will be tons of comparison.
I appreciate everybody's time.
Thanks,
Harry
stat_sas
Ammonite | Level 13

Try this.

 

data have;
input Store_id $ Customer_ID;
datalines;
CAL 11542
CAL 11526
CHI 10069
CHI 11433
CHI 11542
GFC 11493
GFC 11526
GFC 10068
KAN 11482
KAN 11493
;

proc sql;
create table store as
select * from have group by customer_id
having count(distinct store_id)>1;
quit;

 

proc transpose data=store out=want(drop=_Name_);
by customer_id;
var store_id;
run;

 

proc print data=want;
run;

Astounding
PROC Star

A more general approach might be just what you need.

 

proc sort data=have (keep=customer_id store_id) out=full_list nodupkey;

   by customer_id store_id;

run;

 

data full_list;

   set full_list;

   shopped_there='Y';

run;

 

proc transpose data=full_list out=want;

   by customer_id;

   id store_id;

   var shopped_there;

run;

 

It assumes that the store names form valid variable names in SAS, and gives you a "Y" for every store where a customer shopped.  You can use that for any combination of subsetting going forward.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1938 views
  • 2 likes
  • 7 in conversation