I have a dataset that contains purchases by customer by store. Customer #s & Store #s can be listed multiple times as customers make multiple purchases. Transaction # are all unique.
Store # Customer # Transaction #
1 10 7171
2 21 8389
3 40 5118
3 40 8971
4 40 9008
5 88 6673
Some of the stores are in close proximity to one another & have been grouped in pairs. I want to know which customers shopped at both stores. For example - Store # 3 & 4 have been identified as a pair. In the example above I need to write code that would identify that customer 40 shopped at stores 3 & 4 & my code would need to keep all three observations for customer 40.
I am a VERY novice SAS user & my approach would be to create a dataset for customers shopping at store 3 & one for customers shopping at store 4 - merge them to determine which customer #s were in both & then take those customer #s & bump them up against the dataset above to get all the transactions for customers that shopped at both stores. My problem is I have about 20 pairs of stores to check & my dataset has over 6K stores & about 10 million transactions.
I am wondering if there is a much easier way (just using the dataset above) to tell SAS to pull all transactions for customers that shopped at store # 3 & 4. In my example above I would be left with 3 obs for customer # 40.
Thank you for any suggestions.
Question: 1. How does your table that defines store-pairs structured? 2. Please confirm that you only are interested in those customers who shop in both stores of a pairs? If not, please lay out the wanted outcome.
Haikuo
Yes - I am only interested in the customers that shopped at both store pairs - and I need to include all the transactions those customers made at both store pairs.
Right now I don't have a table with the store pairs - I just have a written list. I suppose if I were to create a table I would just have 2 variables store1 & store2 & have the pairs on the same row:
store1 store2
3 4
* store 3 & 4 are considered a "pair"
Thanks for the response!
This could be a little bit complex (for a novice at least).
Spontaneously, I think data step programming with hash tables would be suitable way to solve this. Unfortunately, this is usually a task for the experienced programmer.
Another way to solve this, in a few steps, would to store your stora pairs table using a pair id (and one row for each store).
Then join with the transaction table (which could have been reduced to distinct values of customer and store on beforehand). Then you'll keep only customer/stores that has two records within the same store pair id (by using GROUP BY and HAVING constructs in SQL).
Here is how I would approach ... somehow you will have to tell sas the pairs you are interested in either by manual coding the 20 pairs or creating a separate excel and reading it in. Remember you need to link stores to pairs so have a row per store in you "pair table" seems like a better approach.
Here is what I did using your sample data (I just read in the data you had and made a pair table with the stores in the pair and A pair id that you could make into whatever you need):
data trans;
infile cards dlm=',';
input store customer transaction;
cards;
1, 10, 7171
2, 21, 8389
3, 40, 5118
3, 40, 8971
4, 40, 9008
5, 88, 6673
;
run;
data storepairs;
infile cards dlm=',';
input store pair $;
cards;
3, P1
4, P1
;
run;
proc sql;
create table transpair as
select a.*, b.pair
from trans as a , storepairs as b
where a.store = b.store
;
quit;
Hope this helps!
EJ
The first post just leaves you with the transactions of the pairs and I not completely sure what the rest of the need is. So if more is needed reply back and we can continue the solution.
EJ
Thank you for the responses. In my actual data I have 100s of customers that made multiple transactions at only store 3 or only store 4. I am only interested in those customers that purchased at both. I tried the code above, but it leaves me with not only customers that purchased at both stores, but also those customers that only purhcased at only 1 of the stores in the pair. The code below gives me the results I need, but I thought there may be an easier (less manual) way to get the same results. Thanks!
data store_3;
set trans;
where store=3; run;
data store_4;
set trans;
where store=4; run;
proc sql;
create table both as
select a.customer
from store_3 a
inner join store_4 b
on a.customer=b.customer; quit;
proc sort nodupkey; by customer; run;
proc sql;
create table Final as
select *
from trans a
inner join both b
on a.customer=b.customer
where store in (3,4); quit;
Results of Final:
Obs Store Customer Transaction
1 3 40 5118
2 3 40 8971
3 4 40 9008
Tweaked on Eric's code:
proc sql;
create table Final as
select a.*, b.pair
from trans as a , storepairs as b
where a.store = b.store
group by pair, customer
having count(distinct store)=2
;
quit;
Great idea ... I came up with the data step approach below but I think this works even better! I expanded the sample data to add a few more test cases in it.
data trans;
infile cards dlm=',';
input store customer transaction;
cards;
1, 10, 7171
2, 21, 8389
3, 40, 5118
3, 40, 8971
4, 40, 9008
4, 30, 10000
5, 88, 6673
5, 40,3333
6, 33,2222
3, 22,4555
7,22,5555
8,33,1111
9,33,8888
;
run;
data storepairs;
infile cards dlm=',';
input store pair $;
cards;
3, P1
4, P1
6, P2
8, P2
;
run;
proc sql;
create table transpair as
select a.*, b.pair
from trans as a , storepairs as b
where a.store = b.store
/*group by pair, customer
having count(distinct store)=2*/
order by b.pair, a.customer, a.store
;
quit;
data transpair2;
set transpair;
by pair customer store;
if first.customer then do;
match = store;
both = 0;
end;
if match ne store then both = 1;
retain match both;
if both = 1 then output;
run;
proc sql;
create table final as
select a.*
from transpair as a , transpair2 as b
where a.pair = b.pair and a.customer = b.customer
;
quit;
At least the two approaches should be able to give so insight into alternatives.
EJ
Thanks for your help guys. This is exactly what I needed!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.