Extracting data based on criteria in multiple rows

Reply
New Contributor
Posts: 4

Extracting data based on criteria in multiple rows

I have a dataset that contains purchases by customer by store.  Customer #s & Store #s can be listed multiple times as customers make multiple purchases.  Transaction # are all unique. 

Store #      Customer #      Transaction #

1               10                   7171

2               21                   8389

3               40                   5118

3               40                   8971

4               40                   9008

5               88                   6673

Some of the stores are in close proximity to one another & have been grouped in pairs.  I want to know which customers shopped at both stores.  For example - Store # 3 & 4 have been identified as a pair.  In the example above I need to write code that would identify that customer 40 shopped at stores 3 & 4 & my code would need to keep all three observations for customer 40. 

I am a VERY novice SAS user & my approach would be to create a dataset for customers shopping at store 3 & one for customers shopping at store 4 - merge them to determine which customer #s were in both & then take those customer #s & bump them up against the dataset above to get all the transactions for customers that shopped at both stores.  My problem is I have about 20 pairs of stores to check & my dataset has over 6K stores & about 10 million transactions. 

I am wondering if there is a much easier way (just using the dataset above) to tell SAS to pull all transactions for customers that shopped at store # 3 & 4.  In my example above I would be left with 3 obs for customer # 40. 

Thank you for any suggestions. 

Respected Advisor
Posts: 3,124

Re: Extracting data based on criteria in multiple rows

Question: 1. How does your table that defines store-pairs structured? 2. Please confirm that you only are interested in those customers who shop in both stores of a pairs? If not, please lay out the wanted outcome.

Haikuo

New Contributor
Posts: 4

Re: Extracting data based on criteria in multiple rows

Yes - I am only interested in the customers that shopped at both store pairs - and I need to include all the transactions those customers made at both store pairs.

Right now I don't have a table with the store pairs - I just have a written list.  I suppose if I were to create a table I would just have 2 variables store1 & store2 & have the pairs on the same row:

store1  store2

3          4

* store 3 & 4 are considered a "pair"

Thanks for the response!

Super User
Posts: 5,256

Re: Extracting data based on criteria in multiple rows

This could be a little bit complex (for a novice at least).

Spontaneously, I think data step programming with hash tables would be suitable way to solve this. Unfortunately, this is usually a task for the experienced programmer.

Another way to solve this, in a few steps, would to store your stora pairs table using a pair id (and one row for each store).

Then join with the transaction table (which could have been reduced to distinct values of customer and store on beforehand). Then you'll keep only customer/stores that has two records within the same store pair id (by using GROUP BY and HAVING constructs in SQL).

Data never sleeps
Super Contributor
Posts: 333

Re: Extracting data based on criteria in multiple rows

Here is how I would approach ... somehow you will have to tell sas the pairs you are interested in either by manual coding the 20 pairs or creating a separate excel and reading it in.  Remember you need to link stores to pairs so have a row per store in you "pair table" seems like a better approach.

Here is what I did using your sample data (I just read in the data you had and made a pair table with the stores in the pair and A pair id that you could make into whatever you need):

data trans;

  infile cards dlm=',';

  input store customer transaction;

  cards;

1,               10,                   7171

2,              21,                   8389

3,               40,                   5118

3,               40,                   8971

4,               40,                   9008

5,               88,                   6673

;

run;

data storepairs;

  infile cards dlm=',';

  input store pair $;

  cards;

3, P1

4, P1

;

run;

proc sql;

  create table transpair as

  select a.*, b.pair

  from trans as a , storepairs as b

  where a.store = b.store

  ;

quit;

Hope this helps!

EJ

Super Contributor
Posts: 333

Re: Extracting data based on criteria in multiple rows

The first post just leaves you with the transactions of the pairs and I not completely sure what the rest of the need is. So if more is needed reply back and we can continue the solution.

EJ

New Contributor
Posts: 4

Re: Extracting data based on criteria in multiple rows

Thank you for the responses.  In my actual data I have 100s of customers that made multiple transactions at only store 3 or only store 4.  I am only interested in those customers that purchased at both.  I tried the code above, but it leaves me with not only customers that purchased at both stores, but also those customers that only purhcased at only 1 of the stores in the pair.  The code below gives me the results I need, but I thought there may be an easier (less manual) way to get the same results.  Thanks! 

data store_3;

set trans;

where store=3; run;

data store_4;

set trans;

where store=4; run;

proc sql;

create table both as

select a.customer

from store_3 a

inner join store_4 b

on a.customer=b.customer; quit;

proc sort nodupkey; by customer; run;

proc sql;

create table Final as

select *

from trans a

inner join both b

on a.customer=b.customer

where store in (3,4); quit;

Results of Final:

Obs       Store    Customer    Transaction

  1            3          40        5118

  2            3          40        8971

  3            4          40        9008

Respected Advisor
Posts: 3,124

Re: Extracting data based on criteria in multiple rows

Tweaked on Eric's code:

proc sql;

  create table Final as

  select a.*, b.pair

  from trans as a , storepairs as b

  where a.store = b.store

  group by pair, customer

  having count(distinct store)=2

  ;

quit;

Super Contributor
Posts: 333

Re: Extracting data based on criteria in multiple rows

Great idea ... I came up with the data step approach below but I think this works even better! I expanded the sample data to add a few more test cases in it.

data trans;

  infile cards dlm=',';

  input store customer transaction;

  cards;

1,               10,                   7171

2,              21,                   8389

3,               40,                   5118

3,               40,                   8971

4,               40,                   9008

4, 30, 10000

5,               88,                   6673

5, 40,3333

6, 33,2222

3, 22,4555

7,22,5555

8,33,1111

9,33,8888

;

run;

data storepairs;

  infile cards dlm=',';

  input store pair $;

  cards;

3, P1

4, P1

6, P2

8, P2

;

run;

proc sql;

  create table transpair as

  select a.*, b.pair

  from trans as a , storepairs as b

  where a.store = b.store

  /*group by pair, customer

  having count(distinct store)=2*/

  order by b.pair, a.customer, a.store

  ;

quit;

data transpair2;

  set transpair;

  by pair customer store;

  if first.customer then do;

  match = store;

  both = 0;

  end;

  if match ne store then both = 1;

  retain match both;

  if both = 1 then output;

run;

proc sql;

  create table final as

  select a.*

  from transpair as a , transpair2 as b

  where a.pair = b.pair and a.customer = b.customer

  ;

quit;

At least the two approaches should be able to give so insight into alternatives.

EJ

New Contributor
Posts: 4

Re: Extracting data based on criteria in multiple rows

Thanks for your help guys.  This is exactly what I needed! 

Ask a Question
Discussion stats
  • 9 replies
  • 669 views
  • 3 likes
  • 4 in conversation