Using SAS EG 5.1
I have two data sets. Data A is a very large data pull (2Million + rows of data and 60+ columns). An example of some of these 60+ variables include Contract # , City, State, Volume, etc. Data B is a small excel file (~1000 rows and 2 columns). These two columns include Contract number and BCO.
What I am aiming to do is take Data A, add a new variable called "BCO 1". "BCO 1" will be the Data B BCO that associates with the corresponding Contract number. Verbadum what you would do when doing a vlookup in excel.
I have attempted to do this via the join tables query using a left join matching Contract # to Contract #. However, when I do this I end up with more rows than I started with. Because I am looking at volume numbers, the extra rows skew my numbers incorrectly. I would like to simply vlookup a value into the BCO 1 column, while keeping the same amount of rows I began with.
Everything looks correct.
I would double check your assumption that the lookup table doesn't have duplicate contract numbers.
You can use the Task for this:
TASK>DATA>Sort Data
Add contract number to the Sort By field
Click Options in left hand menu
Click "Keep Only the first record...."
Check how the results differ.
This would be a LEFT (or RIGHT) JOIN in your query, depending on table order.
The issue is, if you have multiple (many to many) matches you'll still end up with duplicates.
Are there any duplicate keys in your 'lookup' table?
Otherwise, with such a small dataset I would recommend a PROC FORMAT. See Example 4 here:
http://www2.sas.com/proceedings/sugi30/001-30.pdf
Along with the last examples that show how to create a format using a dataset.
FYI - 2 million rows + 60 columns is still small data 😉
In my lookup table (Data B) the Contract #'s are unique but the BCO has duplicates. For example one BCO could have 3 Contract #'s.
I am fairly new to SAS EG. So to understand the PROC Format - I have my two Excel Files imported into SAS EG. Then I should click new > program to bring up the code window. How does the Program code change in terms of my data? Or is there a way to do PROC Format using the query builder?
I would try the LEFT JOIN first, since that's likely easier in EG - especially if you're using the point and click tools, which it sounds like you are.
If you can post the code generated and a screen shot of the query we can help tailor it for what you need.
Original data has 2,207,926 rows. After join data has 2,208,343 rows.
I have limited down the number of variables just so everything would fit in the screenshots. Left join and query snapshot in attachment. Code below.
%_eg_conditional_dropds(WORK.QUERY_FOR_APPEND_TABLE_0001_0001);
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_APPEND_TABLE_0001_0001(label="QUERY_FOR_APPEND_TABLE_0001") AS
SELECT t1.Volume,
t1.'Org City'n,
t1.'Org St'n,
t1.'Full Shipper Name'n,
t1.'Fiscal Year'n,
t1.'BO Name'n,
t1.'Freight Payor Contract #'n,
t2.BCO AS 'BCO 1'n
FROM WORK.QUERY_FOR_APPEND_TABLE_0001_0000 t1
LEFT JOIN WORK.MATCH_DATA t2 ON (t1.'Freight Payor Contract #'n = t2.'AGRT NUMBER'n);
QUIT;
Everything looks correct.
I would double check your assumption that the lookup table doesn't have duplicate contract numbers.
You can use the Task for this:
TASK>DATA>Sort Data
Add contract number to the Sort By field
Click Options in left hand menu
Click "Keep Only the first record...."
Check how the results differ.
Wow! That was my problem. The rows are matching up perfectly now. I did have some duplicates in my agreement number afterall.
Thank you!!!!!!!
You should verify why there are duplicates and how you need to deal with them.
This will pull the duplicates out for you. How to deal with them is more a business problem than a coding problem.
proc sort data=have;
by contractNo;
run;
data dups;
set have;
by contractNo;
if not (first.contractNo and last.contractNo);
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.