Desktop productivity for business analysts and programmers

Proc compare function?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Proc compare function?

Hello all, 

 

 

I have a transaction data set with two different columns. One is Customer_ID and the other ist Good_ID. What I would like to do is a comparison of those two attributes group by every single customer. 

 

For example: 

 

Cust_ID = 1 and Good_ID = 1

Cust_ID = 1 and Good_ID = 1

Cust_ID = 1 and Good_ID = 1

Cust_ID = 1 and Good_ID = 2

 

Cust_ID = 2 and Good_ID = 3

Cust_ID = 2 and Good_ID = 3

Cust_ID = 2 and Good_ID = 4

 

I want to create a comparison for every single customer where each Good_ID is compare to other Good_ID's of the Customer

 

While searching for a solution I find the compare function in SAS. Is the compare function in this case applicable? 

 

Thanks and regards,

Mariam

 

 

 


Accepted Solutions
Solution
‎05-03-2018 03:40 PM
PROC Star
Posts: 1,332

Re: Proc compare function?

Posted in reply to Mairam2345

Here's what I hope will work for you.

 

The first step is the query that I described in my first post. getting the count by region and product. Let's say the result dataset is named "Have1", and the count field is named "Count_Field".

 

Now create a query based on Have1, and pull region into the "Select Data" columns, then pull it in a second time. On the second column, again, set the Summary to "count", and the summary groups box should show that you're summarizing on region. Let's say the result dataset is named "Have2", and the count field is "Count_Field1". Next, on the Filter tab, drag the "computed column", which is your Count_Field1, to the "Filter the summarized data" area, and set the filter to select cases "Greater than" and 1. This will identify the region values that have more than one value of product.

 

Now a third query, on Have2. Add a table, your orignal dataset. Click the "Join Tables" button, and change the join from an inner join to a left or right join, whichever your original dataset is (so if it's on the right, make it a right join).

 

Pull all the variables from "Have" onto the "Select Data" tab. Now do "Computed Columns" > "New" > "Advanced Expression", and use a "Case" function to set a variable (I'll call it "Flag") to 0 if the region field in the Have2 dataset is null, or 1 otherwise.

 

This will identify the records that were joined, which are the ones that had more than one value of product.

 

Tom

View solution in original post


All Replies
Super User
Posts: 13,874

Re: Proc compare function?

Posted in reply to Mairam2345

Proc compare is likely not what you want if I understand what you are asking.

Best would be to provide short examples of the two data sets with the variables of concern and a desired result.

 

 

Occasional Contributor
Posts: 9

Re: Proc compare function?

Hi, 

 

thanks for your quick reply. 

 

The transaction data set looks like this: 

 

Column 1: Customer_ID | Column 2: Good_ID 

 

CustomerID_1 | Good_Banana

CustomerID_1 | Good_Banana

 

CustomerID_2 | Good_Banana

CustomerID_2 | Good_Banana

CustomerID_2 | Good_Banana

CustomerID_2 | Good_Orange

 

In this case I have 2 different customers. I have 2 transactions for customer 1 where the customer buys bananas. For customer 2 I have 4 transaction record. This customer 2 usually buys bananas but in one transaction he buys oranges. I want to find those transactions which have different goods than the usual transactions for one customer. 

 

Is that understandable? 

 

Thanks

Mar

 

 

 

Super User
Posts: 13,874

Re: Proc compare function?

Posted in reply to Mairam2345

I might try something like:

proc freq data=have order=freq;
tables customer_id*good_id/list  nopercent ;
run;

Which would create ordered output by decreasing frequency.

 

I am not going to guess at a rule for "different goods than the usual" because I don't know your customers or what typical orders might look like.

 

You might send the output to a data set with the OUT= option on the tables statement for your further processing.

Super User
Posts: 23,932

Re: Proc compare function?

Posted in reply to Mairam2345

Defining what's different is actually quite hard at scale but easy for a single example. If I regularly buy 6 bananas and 6 apples but buy 10 is that different? If I buy 3 socks is that different?

 

At any rate, it sounds like Market Basket Analysis is what you're doing. There's a SAS macro or if you have SAS EM that helps to enable that type of analysis.  

 

 

Occasional Contributor
Posts: 9

Re: Proc compare function?

Hi,

 

 

in my transaction data one customer usually buys the same product for all transactions.  So the scale is usually based on the one product that the customer has bought in the past. If customer1 buys bananas in 3 different transactions then he will also buys in further transactions. I want to find those outliers where customer1 suddenly buys oranges instead of bananas. 

 

I'm using the SAS enterprise guide. 

Super User
Posts: 23,932

Re: Proc compare function?

Posted in reply to Mairam2345

So, if customer primarily purchases a single item, I'd probably subset my data on those that purchased multiple items. 

You can use the HAVING clause in a query to select customers or records where they have multiple products.

 

 

proc sql;
create table want as
select *
group by customer_ID
having count(distinct productID) > 1;
quit;

There's some other examples here of counting distinct by groups and then you can filter based on those who have more than one group. 

 

https://github.com/statgeek/SAS-Tutorials/blob/master/count_distinct_by_group.sas

Occasional Contributor
Posts: 9

Re: Proc compare function?

Hi, 

 

 

is there no possibility to compare the the Good_IDs of one customer to other Good_Ids of the same customer?? 

 

Something like: If Good_ID of transaction1 is not equal to Good_ID of transaction 2 then new column = 1 ??


It's about comparing the results between different rows in one column...

 

 

Super User
Posts: 23,932

Re: Proc compare function?

Posted in reply to Mairam2345

@Mairam2345 Sure, there's lots of ways.  

But it's still not clear what you want. At this point its probably easiest if you post sample data and the exact output in the format you want. 

 

 

Solution
‎05-03-2018 03:40 PM
PROC Star
Posts: 1,332

Re: Proc compare function?

Posted in reply to Mairam2345

Here's what I hope will work for you.

 

The first step is the query that I described in my first post. getting the count by region and product. Let's say the result dataset is named "Have1", and the count field is named "Count_Field".

 

Now create a query based on Have1, and pull region into the "Select Data" columns, then pull it in a second time. On the second column, again, set the Summary to "count", and the summary groups box should show that you're summarizing on region. Let's say the result dataset is named "Have2", and the count field is "Count_Field1". Next, on the Filter tab, drag the "computed column", which is your Count_Field1, to the "Filter the summarized data" area, and set the filter to select cases "Greater than" and 1. This will identify the region values that have more than one value of product.

 

Now a third query, on Have2. Add a table, your orignal dataset. Click the "Join Tables" button, and change the join from an inner join to a left or right join, whichever your original dataset is (so if it's on the right, make it a right join).

 

Pull all the variables from "Have" onto the "Select Data" tab. Now do "Computed Columns" > "New" > "Advanced Expression", and use a "Case" function to set a variable (I'll call it "Flag") to 0 if the region field in the Have2 dataset is null, or 1 otherwise.

 

This will identify the records that were joined, which are the ones that had more than one value of product.

 

Tom

Super User
Posts: 23,932

Re: Proc compare function?

Posted in reply to Mairam2345

No, compare is not designed for what you're trying to do here. I'm not sure there is a defined procedure here in fact. 

 

Could you provide a data example of what you're trying to do, ie sample data and sample expected output? A small example is all that's needed.It's not exactly clear what output you'd want that would be the following:

 

I want to create a comparison for every single customer where each Good_ID is compare to other Good_ID's of the Customer

And, are you using the GUI or programming?

 


@Mairam2345 wrote:

Hello all, 

 

 

I have a transaction data set with two different columns. One is Customer_ID and the other ist Good_ID. What I would like to do is a comparison of those two attributes group by every single customer. 

 

For example: 

 

Cust_ID = 1 and Good_ID = 1

Cust_ID = 1 and Good_ID = 1

Cust_ID = 1 and Good_ID = 1

Cust_ID = 1 and Good_ID = 2

 

Cust_ID = 2 and Good_ID = 3

Cust_ID = 2 and Good_ID = 3

Cust_ID = 2 and Good_ID = 4

 

I want to create a comparison for every single customer where each Good_ID is compare to other Good_ID's of the Customer

 

While searching for a solution I find the compare function in SAS. Is the compare function in this case applicable? 

 

Thanks and regards,

Mariam

 

 

 


 

PROC Star
Posts: 1,332

Re: Proc compare function?

I don't find it completely clear what you're after, but here are a couple of queries that might produce some useful data for you.

 

1. Add the counts to your original data

 

Set up a new query, with the results of the query I described in the previous post. Add your original table, and do an inner join between Customer_ID and Good_ID. Include all of the columns from the result table, and set the sort order to Customer_ID, the count field descending, and Good_ID. Run the query.

 

You should see that the results are grouped by customer, with the ID with the highest counts at the start.

 

2. How many different products per customer?

 

Set up a new query, with the results of the query I described in the previous post. Pull Customer_ID and Good_ID into the "Select Data" columns, and set the Summary to "count" for Good_ID.

 

The result will have one record per customer, with the count of how many different fruits they had.

 

Tom

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 294 views
  • 0 likes
  • 4 in conversation