Hello all,
I have a transaction data set with two different columns. One is Customer_ID and the other ist Good_ID. What I would like to do is a comparison of those two attributes group by every single customer.
For example:
Cust_ID = 1 and Good_ID = 1
Cust_ID = 1 and Good_ID = 1
Cust_ID = 1 and Good_ID = 1
Cust_ID = 1 and Good_ID = 2
Cust_ID = 2 and Good_ID = 3
Cust_ID = 2 and Good_ID = 3
Cust_ID = 2 and Good_ID = 4
I want to create a comparison for every single customer where each Good_ID is compare to other Good_ID's of the Customer
While searching for a solution I find the compare function in SAS. Is the compare function in this case applicable?
Thanks and regards,
Mariam
Here's what I hope will work for you.
The first step is the query that I described in my first post. getting the count by region and product. Let's say the result dataset is named "Have1", and the count field is named "Count_Field".
Now create a query based on Have1, and pull region into the "Select Data" columns, then pull it in a second time. On the second column, again, set the Summary to "count", and the summary groups box should show that you're summarizing on region. Let's say the result dataset is named "Have2", and the count field is "Count_Field1". Next, on the Filter tab, drag the "computed column", which is your Count_Field1, to the "Filter the summarized data" area, and set the filter to select cases "Greater than" and 1. This will identify the region values that have more than one value of product.
Now a third query, on Have2. Add a table, your orignal dataset. Click the "Join Tables" button, and change the join from an inner join to a left or right join, whichever your original dataset is (so if it's on the right, make it a right join).
Pull all the variables from "Have" onto the "Select Data" tab. Now do "Computed Columns" > "New" > "Advanced Expression", and use a "Case" function to set a variable (I'll call it "Flag") to 0 if the region field in the Have2 dataset is null, or 1 otherwise.
This will identify the records that were joined, which are the ones that had more than one value of product.
Tom
Proc compare is likely not what you want if I understand what you are asking.
Best would be to provide short examples of the two data sets with the variables of concern and a desired result.
Hi,
thanks for your quick reply.
The transaction data set looks like this:
Column 1: Customer_ID | Column 2: Good_ID
CustomerID_1 | Good_Banana
CustomerID_1 | Good_Banana
CustomerID_2 | Good_Banana
CustomerID_2 | Good_Banana
CustomerID_2 | Good_Banana
CustomerID_2 | Good_Orange
In this case I have 2 different customers. I have 2 transactions for customer 1 where the customer buys bananas. For customer 2 I have 4 transaction record. This customer 2 usually buys bananas but in one transaction he buys oranges. I want to find those transactions which have different goods than the usual transactions for one customer.
Is that understandable?
Thanks
Mar
I might try something like:
proc freq data=have order=freq; tables customer_id*good_id/list nopercent ; run;
Which would create ordered output by decreasing frequency.
I am not going to guess at a rule for "different goods than the usual" because I don't know your customers or what typical orders might look like.
You might send the output to a data set with the OUT= option on the tables statement for your further processing.
Defining what's different is actually quite hard at scale but easy for a single example. If I regularly buy 6 bananas and 6 apples but buy 10 is that different? If I buy 3 socks is that different?
At any rate, it sounds like Market Basket Analysis is what you're doing. There's a SAS macro or if you have SAS EM that helps to enable that type of analysis.
Hi,
in my transaction data one customer usually buys the same product for all transactions. So the scale is usually based on the one product that the customer has bought in the past. If customer1 buys bananas in 3 different transactions then he will also buys in further transactions. I want to find those outliers where customer1 suddenly buys oranges instead of bananas.
I'm using the SAS enterprise guide.
So, if customer primarily purchases a single item, I'd probably subset my data on those that purchased multiple items.
You can use the HAVING clause in a query to select customers or records where they have multiple products.
proc sql;
create table want as
select *
group by customer_ID
having count(distinct productID) > 1;
quit;
There's some other examples here of counting distinct by groups and then you can filter based on those who have more than one group.
https://github.com/statgeek/SAS-Tutorials/blob/master/count_distinct_by_group.sas
Hi,
is there no possibility to compare the the Good_IDs of one customer to other Good_Ids of the same customer??
Something like: If Good_ID of transaction1 is not equal to Good_ID of transaction 2 then new column = 1 ??
It's about comparing the results between different rows in one column...
@Mairam2345 Sure, there's lots of ways.
But it's still not clear what you want. At this point its probably easiest if you post sample data and the exact output in the format you want.
Here's what I hope will work for you.
The first step is the query that I described in my first post. getting the count by region and product. Let's say the result dataset is named "Have1", and the count field is named "Count_Field".
Now create a query based on Have1, and pull region into the "Select Data" columns, then pull it in a second time. On the second column, again, set the Summary to "count", and the summary groups box should show that you're summarizing on region. Let's say the result dataset is named "Have2", and the count field is "Count_Field1". Next, on the Filter tab, drag the "computed column", which is your Count_Field1, to the "Filter the summarized data" area, and set the filter to select cases "Greater than" and 1. This will identify the region values that have more than one value of product.
Now a third query, on Have2. Add a table, your orignal dataset. Click the "Join Tables" button, and change the join from an inner join to a left or right join, whichever your original dataset is (so if it's on the right, make it a right join).
Pull all the variables from "Have" onto the "Select Data" tab. Now do "Computed Columns" > "New" > "Advanced Expression", and use a "Case" function to set a variable (I'll call it "Flag") to 0 if the region field in the Have2 dataset is null, or 1 otherwise.
This will identify the records that were joined, which are the ones that had more than one value of product.
Tom
No, compare is not designed for what you're trying to do here. I'm not sure there is a defined procedure here in fact.
Could you provide a data example of what you're trying to do, ie sample data and sample expected output? A small example is all that's needed.It's not exactly clear what output you'd want that would be the following:
I want to create a comparison for every single customer where each Good_ID is compare to other Good_ID's of the Customer
And, are you using the GUI or programming?
@Mairam2345 wrote:
Hello all,
I have a transaction data set with two different columns. One is Customer_ID and the other ist Good_ID. What I would like to do is a comparison of those two attributes group by every single customer.
For example:
Cust_ID = 1 and Good_ID = 1
Cust_ID = 1 and Good_ID = 1
Cust_ID = 1 and Good_ID = 1
Cust_ID = 1 and Good_ID = 2
Cust_ID = 2 and Good_ID = 3
Cust_ID = 2 and Good_ID = 3
Cust_ID = 2 and Good_ID = 4
I want to create a comparison for every single customer where each Good_ID is compare to other Good_ID's of the Customer
While searching for a solution I find the compare function in SAS. Is the compare function in this case applicable?
Thanks and regards,
Mariam
I don't find it completely clear what you're after, but here are a couple of queries that might produce some useful data for you.
1. Add the counts to your original data
Set up a new query, with the results of the query I described in the previous post. Add your original table, and do an inner join between Customer_ID and Good_ID. Include all of the columns from the result table, and set the sort order to Customer_ID, the count field descending, and Good_ID. Run the query.
You should see that the results are grouped by customer, with the ID with the highest counts at the start.
2. How many different products per customer?
Set up a new query, with the results of the query I described in the previous post. Pull Customer_ID and Good_ID into the "Select Data" columns, and set the Summary to "count" for Good_ID.
The result will have one record per customer, with the count of how many different fruits they had.
Tom
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.