10-16-2012 01:24 PM
Hey, I have a dataset that has a couple of variables, with two particuliar variables possibly being different. For example I have Location, Product, Item Number and Price and Year (price of that product for that year). I have another record for each product for a different year. So my data looks somewhat like this:
Location Product Item # Price Year
Northeast Window Cleaner 12345 $1.99 2011
Northeast Window Cleaner 12345 $1.89 2010
What I need out of the thousands of records is if the price is the SAME for a particuliar product comparing those two years (2010 vs. 2011), then I want to extract those records ONLY. I don't care if the prices were different.
I simply cannot find a way to do this from within EG. I am new to SAS and do have some (very little!) programming experience, but not enough to be able to figure it out in the code.
Can anyone help? Any help would be greatly appreciated and thanks in advance!
10-16-2012 01:40 PM
Add the columns you want in query builder and then two calculated columns in your query builder - max of price and min of price.
In your filter say that they should be the same for the summarized data.
10-16-2012 02:02 PM
Well, since EG lets you create a customized node, here's how you might do it by writing your own logic:
proc sort data=have;
by location item_num year;
*** If desired, add here: where year in (2010, 2011);
by location item_num price notsorted;
if first.price=0 or last.price=0;
This code is a little more generalized than you asked for ... it will work if you have multiple years in the data. It also assumes that you want to compare prices within the same location only.
10-16-2012 02:50 PM
This is quite straightforward in Enterprise Guide:
1. If your dataset has more records than you're interested in, get rid of the unwanted ones with a Query Builder step. It's not clear from your description, but if you want ONLY 2010 and 2011 data, and have other years in the dataset, extract out only 2010 and 2011 into a temporary dataset.
2. Create a Query Builder step, and add item_number and price into the "Select Data" tab. (This assumes that item_number is the only thing you care about for duplicates. If you need to use location and product, add them as well.
3. Add price a second time (the resulting column name will probably be price1. Change that to count). In the Summary box, select "COUNT". That should turn on summary groups, and automatically add all of the other columns to the grouping.
4. Go to the "Filter Data" tab. In the "Filter the summarized data" section, set a filter to count > 1.
5. Run the query. What is should do is group the data by item_number and price (and location and product if you added them), and calculate the count of records. Then the "having" clause will select out the result records with a count > 1.
This dataset identifies all of the itemno / price combinations that you want, but it's only one record per result. (Note that there are a number of other ways to do this, but this should get you started). Now you need to extract all of your original records that match.
This is really easy.
1. Create a Query Builder step on the result dataset you just created.
2. Click the "Add Tables" button, and add your original dataset. Click the "Join Tables" button, and make sure that your two tables are joined with inner joins on itemno and price.
3. Move all of the columns from your original dataset to the "Select Data" tab, and run the query. This should be the results you want.