BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mairam2345
Fluorite | Level 6

Hello all, 

 

 

I have a transaction data set with two different columns. One is Customer_ID and the other ist Good_ID. What I would like to do is a comparison of those two attributes group by every single customer. 

 

For example: 

 

Cust_ID = 1 and Good_ID = 1

Cust_ID = 1 and Good_ID = 1

Cust_ID = 1 and Good_ID = 1

Cust_ID = 1 and Good_ID = 2

 

Cust_ID = 2 and Good_ID = 3

Cust_ID = 2 and Good_ID = 3

Cust_ID = 2 and Good_ID = 4

 

I want to create a comparison for every single customer where each Good_ID is compare to other Good_ID's of the Customer

 

While searching for a solution I find the compare function in SAS. Is the compare function in this case applicable? 

 

Thanks and regards,

Mariam

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
TomKari
Onyx | Level 15

Here's what I hope will work for you.

 

The first step is the query that I described in my first post. getting the count by region and product. Let's say the result dataset is named "Have1", and the count field is named "Count_Field".

 

Now create a query based on Have1, and pull region into the "Select Data" columns, then pull it in a second time. On the second column, again, set the Summary to "count", and the summary groups box should show that you're summarizing on region. Let's say the result dataset is named "Have2", and the count field is "Count_Field1". Next, on the Filter tab, drag the "computed column", which is your Count_Field1, to the "Filter the summarized data" area, and set the filter to select cases "Greater than" and 1. This will identify the region values that have more than one value of product.

 

Now a third query, on Have2. Add a table, your orignal dataset. Click the "Join Tables" button, and change the join from an inner join to a left or right join, whichever your original dataset is (so if it's on the right, make it a right join).

 

Pull all the variables from "Have" onto the "Select Data" tab. Now do "Computed Columns" > "New" > "Advanced Expression", and use a "Case" function to set a variable (I'll call it "Flag") to 0 if the region field in the Have2 dataset is null, or 1 otherwise.

 

This will identify the records that were joined, which are the ones that had more than one value of product.

 

Tom

View solution in original post

11 REPLIES 11
ballardw
Super User

Proc compare is likely not what you want if I understand what you are asking.

Best would be to provide short examples of the two data sets with the variables of concern and a desired result.

 

 

Mairam2345
Fluorite | Level 6

Hi, 

 

thanks for your quick reply. 

 

The transaction data set looks like this: 

 

Column 1: Customer_ID | Column 2: Good_ID 

 

CustomerID_1 | Good_Banana

CustomerID_1 | Good_Banana

 

CustomerID_2 | Good_Banana

CustomerID_2 | Good_Banana

CustomerID_2 | Good_Banana

CustomerID_2 | Good_Orange

 

In this case I have 2 different customers. I have 2 transactions for customer 1 where the customer buys bananas. For customer 2 I have 4 transaction record. This customer 2 usually buys bananas but in one transaction he buys oranges. I want to find those transactions which have different goods than the usual transactions for one customer. 

 

Is that understandable? 

 

Thanks

Mar

 

 

 

ballardw
Super User

I might try something like:

proc freq data=have order=freq;
tables customer_id*good_id/list  nopercent ;
run;

Which would create ordered output by decreasing frequency.

 

I am not going to guess at a rule for "different goods than the usual" because I don't know your customers or what typical orders might look like.

 

You might send the output to a data set with the OUT= option on the tables statement for your further processing.

Reeza
Super User

Defining what's different is actually quite hard at scale but easy for a single example. If I regularly buy 6 bananas and 6 apples but buy 10 is that different? If I buy 3 socks is that different?

 

At any rate, it sounds like Market Basket Analysis is what you're doing. There's a SAS macro or if you have SAS EM that helps to enable that type of analysis.  

 

 

Mairam2345
Fluorite | Level 6

Hi,

 

 

in my transaction data one customer usually buys the same product for all transactions.  So the scale is usually based on the one product that the customer has bought in the past. If customer1 buys bananas in 3 different transactions then he will also buys in further transactions. I want to find those outliers where customer1 suddenly buys oranges instead of bananas. 

 

I'm using the SAS enterprise guide. 

Reeza
Super User

So, if customer primarily purchases a single item, I'd probably subset my data on those that purchased multiple items. 

You can use the HAVING clause in a query to select customers or records where they have multiple products.

 

 

proc sql;
create table want as
select *
group by customer_ID
having count(distinct productID) > 1;
quit;

There's some other examples here of counting distinct by groups and then you can filter based on those who have more than one group. 

 

https://github.com/statgeek/SAS-Tutorials/blob/master/count_distinct_by_group.sas

Mairam2345
Fluorite | Level 6

Hi, 

 

 

is there no possibility to compare the the Good_IDs of one customer to other Good_Ids of the same customer?? 

 

Something like: If Good_ID of transaction1 is not equal to Good_ID of transaction 2 then new column = 1 ??


It's about comparing the results between different rows in one column...

 

 

Reeza
Super User

@Mairam2345 Sure, there's lots of ways.  

But it's still not clear what you want. At this point its probably easiest if you post sample data and the exact output in the format you want. 

 

 

TomKari
Onyx | Level 15

Here's what I hope will work for you.

 

The first step is the query that I described in my first post. getting the count by region and product. Let's say the result dataset is named "Have1", and the count field is named "Count_Field".

 

Now create a query based on Have1, and pull region into the "Select Data" columns, then pull it in a second time. On the second column, again, set the Summary to "count", and the summary groups box should show that you're summarizing on region. Let's say the result dataset is named "Have2", and the count field is "Count_Field1". Next, on the Filter tab, drag the "computed column", which is your Count_Field1, to the "Filter the summarized data" area, and set the filter to select cases "Greater than" and 1. This will identify the region values that have more than one value of product.

 

Now a third query, on Have2. Add a table, your orignal dataset. Click the "Join Tables" button, and change the join from an inner join to a left or right join, whichever your original dataset is (so if it's on the right, make it a right join).

 

Pull all the variables from "Have" onto the "Select Data" tab. Now do "Computed Columns" > "New" > "Advanced Expression", and use a "Case" function to set a variable (I'll call it "Flag") to 0 if the region field in the Have2 dataset is null, or 1 otherwise.

 

This will identify the records that were joined, which are the ones that had more than one value of product.

 

Tom

Reeza
Super User

No, compare is not designed for what you're trying to do here. I'm not sure there is a defined procedure here in fact. 

 

Could you provide a data example of what you're trying to do, ie sample data and sample expected output? A small example is all that's needed.It's not exactly clear what output you'd want that would be the following:

 

I want to create a comparison for every single customer where each Good_ID is compare to other Good_ID's of the Customer

And, are you using the GUI or programming?

 


@Mairam2345 wrote:

Hello all, 

 

 

I have a transaction data set with two different columns. One is Customer_ID and the other ist Good_ID. What I would like to do is a comparison of those two attributes group by every single customer. 

 

For example: 

 

Cust_ID = 1 and Good_ID = 1

Cust_ID = 1 and Good_ID = 1

Cust_ID = 1 and Good_ID = 1

Cust_ID = 1 and Good_ID = 2

 

Cust_ID = 2 and Good_ID = 3

Cust_ID = 2 and Good_ID = 3

Cust_ID = 2 and Good_ID = 4

 

I want to create a comparison for every single customer where each Good_ID is compare to other Good_ID's of the Customer

 

While searching for a solution I find the compare function in SAS. Is the compare function in this case applicable? 

 

Thanks and regards,

Mariam

 

 

 


 

TomKari
Onyx | Level 15

I don't find it completely clear what you're after, but here are a couple of queries that might produce some useful data for you.

 

1. Add the counts to your original data

 

Set up a new query, with the results of the query I described in the previous post. Add your original table, and do an inner join between Customer_ID and Good_ID. Include all of the columns from the result table, and set the sort order to Customer_ID, the count field descending, and Good_ID. Run the query.

 

You should see that the results are grouped by customer, with the ID with the highest counts at the start.

 

2. How many different products per customer?

 

Set up a new query, with the results of the query I described in the previous post. Pull Customer_ID and Good_ID into the "Select Data" columns, and set the Summary to "count" for Good_ID.

 

The result will have one record per customer, with the count of how many different fruits they had.

 

Tom

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1425 views
  • 0 likes
  • 4 in conversation