Solved: Re: Compare observations within groups

HarryLiu · Posted 10-29-2015 10:19 AM

Hi SAS Users,

I have a road block in conduct a project. The problem is how to compare observations within groups. My data set has a column named STORE_ID, which is the ID number of each store in a city. Another column is customer_id, obveriously it is ID of each consumed customer. I want to know the list of Customer_id that consumed not only in store A, but also in store B. I think correlated sub-query might be a possible solution, but don't know how to develop code. Any suggestions and sample codes are highly appreciated.

Thanks,

Harry

MikeZdeb · Posted 10-29-2015 11:25 AM

Hi, how about INTERSECT in SQL (works for 2 stores , works for 'n' stores) ...

data x;
input store_id :$1. customer_id @@;
datalines;
A 123 B 234 A 789 B 123 A 999 A 234 B 765 C 123
;

title 'CUSTOMERS COMMON TO STORES A & B';
proc sql;
select customer_id from x where store_id = 'A'
intersect
select customer_id from x where store_id = 'B';
quit;

title 'CUSTOMERS COMMON TO STORES A & B & C;
proc sql;
select customer_id from x where store_id = 'A'
intersect
select customer_id from x where store_id = 'B'
intersect
select customer_id from x where store_id = 'C';
quit;

CUSTOMERS COMMON TO STORES A & B

customer_id
123
234

CUSTOMERS COMMON TO STORES A & B & C

customer_id
123

View solution in original post

LinusH · Posted 10-29-2015 10:37 AM

I think that a group by with a having count(*) =2 clause would work, given that the store and customer the combined key.

Data never sleeps

Reeza · Posted 10-29-2015 10:48 AM

Do you only have two stores or multiple stores?

RW9 · Posted 10-29-2015 10:51 AM

Post some test data (in the form of a datastep) and required output.

Astounding · Posted 10-29-2015 10:56 AM

If you are only concerned with those two stores, here's a way:

proc sort data=have;

by customer_id;

run;

data a_only b_only both;

merge have (where=(store_id='A') in=shopped_at_a)

have (where=(store_id='B') in=shopped_at_b);

by customer_id;

if shopped_at_a=0 then output b_only;

else if shopped_at_b=0 then output a_only;

else output both;

run;

Note that this doesn't account for the possibility that a customer shopped more than one time at a store. There are ways to handle that, but we'd need a better picture of what is in your data and what you are trying to achieve.

Good luck.

HarryLiu · Posted 10-29-2015 11:12 AM

Thanks so much for everybody's suggestion. I have hundreds of stores and need to compare any interested stores (usually less than 4 stores). Attached is a sample of data set. Astounding's code looks good, but will be time consuming, since I have hundred stores.
Here is my data set,
Store_id Customer_ID
CAL 11542
CAL 11526
CHI 10069
CHI 11433
CHI 11542
GFC 11493
GFC 11526
GFC 10068
KAN 11482
KAN 11493

If I compare CAL with CHI, the result should be 11542. IF between CAL and GFC, the result should be 11526. If between GFC and KAN, the result should be 11493. If between CAL and KAN, result should be nothing.
Thanks,
Harry

MikeZdeb · Posted 10-29-2015 11:25 AM

Hi, how about INTERSECT in SQL (works for 2 stores , works for 'n' stores) ...

data x;
input store_id :$1. customer_id @@;
datalines;
A 123 B 234 A 789 B 123 A 999 A 234 B 765 C 123
;

title 'CUSTOMERS COMMON TO STORES A & B';
proc sql;
select customer_id from x where store_id = 'A'
intersect
select customer_id from x where store_id = 'B';
quit;

title 'CUSTOMERS COMMON TO STORES A & B & C;
proc sql;
select customer_id from x where store_id = 'A'
intersect
select customer_id from x where store_id = 'B'
intersect
select customer_id from x where store_id = 'C';
quit;

CUSTOMERS COMMON TO STORES A & B

customer_id
123
234

CUSTOMERS COMMON TO STORES A & B & C

customer_id
123

HarryLiu · Posted 10-29-2015 02:36 PM

Thanks everybody for various suggestions and helps. I found all code work very well for my project but this one is the most efficient. The only reason is that my data set is complex and there will be tons of comparison.
I appreciate everybody's time.
Thanks,
Harry

stat_sas · Posted 10-29-2015 11:58 AM

Try this.

data have;
input Store_id $ Customer_ID;
datalines;
CAL 11542
CAL 11526
CHI 10069
CHI 11433
CHI 11542
GFC 11493
GFC 11526
GFC 10068
KAN 11482
KAN 11493
;

proc sql;
create table store as
select * from have group by customer_id
having count(distinct store_id)>1;
quit;

proc transpose data=store out=want(drop=_Name_);
by customer_id;
var store_id;
run;

proc print data=want;
run;

Astounding · Posted 10-29-2015 12:07 PM

A more general approach might be just what you need.

proc sort data=have (keep=customer_id store_id) out=full_list nodupkey;

by customer_id store_id;

run;

data full_list;

set full_list;

shopped_there='Y';

run;

proc transpose data=full_list out=want;

by customer_id;

id store_id;

var shopped_there;

run;

It assumes that the store names form valid variable names in SAS, and gives you a "Y" for every store where a customer shopped. You can use that for any combination of subsetting going forward.

Registration is open

SAS Training: Just a Click Away