Desktop productivity for business analysts and programmers

Comparing two variables/observations within a dataset

Occasional Contributor
Posts: 5

Comparing two variables/observations within a dataset

Hey, I have a dataset that has a couple of variables, with two particuliar variables possibly being different.  For example I have Location, Product, Item Number and Price and Year (price of that product for that year).   I have another record for each product for a different year.  So my data looks somewhat like this:

Location          Product                    Item #                    Price               Year    

Northeast        Window Cleaner      12345                    $1.99               2011

Northeast        Window Cleaner      12345                    $1.89               2010

What I need out of the thousands of records is if the price is the SAME for a particuliar product comparing those two years (2010 vs. 2011), then I want to extract those records ONLY.  I don't care if the prices were different.

I simply cannot find a way to do this from within EG.  I am new to SAS and do have some (very little!) programming experience, but not enough to be able to figure it out in the code.

Can anyone help?  Any help would be greatly appreciated and thanks in advance!


Super User
Posts: 23,932

Re: Comparing two variables/observations within a dataset

Posted in reply to SASintheSouth

Add the columns you want in query builder and then two calculated columns in your query builder - max of price and min of price.

In your filter say that they should be the same for the summarized data.

Super User
Posts: 6,899

Re: Comparing two variables/observations within a dataset

Posted in reply to SASintheSouth

Well, since EG lets you create a customized node, here's how you might do it by writing your own logic:

proc sort data=have;

   by location item_num year;


data want;

   set have;

*** If desired, add here:  where year in (2010, 2011);

   by location item_num price notsorted;

   if first.price=0 or last.price=0;


This code is a little more generalized than you asked for ... it will work if you have multiple years in the data.  It also assumes that you want to compare prices within the same location only.

Good luck.

Posts: 1,332

Re: Comparing two variables/observations within a dataset

Posted in reply to SASintheSouth

This is quite straightforward in Enterprise Guide:

1. If your dataset has more records than you're interested in, get rid of the unwanted ones with a Query Builder step. It's not clear from your description, but if you want ONLY 2010 and 2011 data, and have other years in the dataset, extract out only 2010 and 2011 into a temporary dataset.

2. Create a Query Builder step, and add item_number and price into the "Select Data" tab. (This assumes that item_number is the only thing you care about for duplicates. If you need to use location and product, add them as well.

3. Add price a second time (the resulting column name will probably be price1. Change that to count). In the Summary box, select "COUNT". That should turn on summary groups, and automatically add all of the other columns to the grouping.

4. Go to the "Filter Data" tab. In the "Filter the summarized data" section, set a filter to count > 1.

5. Run the query. What is should do is group the data by item_number and price (and location and product if you added them), and calculate the count of records. Then the "having" clause will select out the result records with a count > 1.

This dataset identifies all of the itemno / price combinations that you want, but it's only one record per result. (Note that there are a number of other ways to do this, but this should get you started). Now you need to extract all of your original records that match.

This is really easy.

1. Create a Query Builder step on the result dataset you just created.

2. Click the "Add Tables" button, and add your original dataset. Click the "Join Tables" button, and make sure that your two tables are joined with inner joins on itemno and price.

3. Move all of the columns from your original dataset to the "Select Data" tab, and run the query. This should be the results you want.


Ask a Question
Discussion stats
  • 3 replies
  • 4 in conversation