BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Amber_Nicole94
Obsidian | Level 7

Using SAS EG 5.1

 

I have two data sets. Data A is a very large data pull (2Million + rows of data and 60+ columns). An example of some of these 60+ variables include Contract # , City, State, Volume, etc. Data B is a small excel file (~1000 rows and 2 columns). These two columns include Contract number and BCO.

 

What I am aiming to do is take Data A, add a new variable called "BCO 1". "BCO 1" will be the Data B BCO that associates with the corresponding Contract number. Verbadum what you would do when doing a vlookup in excel.

 

I have attempted to do this via the join tables query using a left join matching Contract # to Contract #. However, when I do this I end up with more rows than I started with. Because I am looking at volume numbers, the extra rows skew my numbers incorrectly. I would like to simply vlookup a value into the BCO 1 column, while keeping the same amount of rows I began with.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Everything looks correct. 

 

I would double check your assumption that the lookup table doesn't have duplicate contract numbers. 

You can use the Task for this:

 

TASK>DATA>Sort Data

Add contract number to the Sort By field

Click Options in left hand menu

Click "Keep Only the first record...."

 

Check how the results differ. 

View solution in original post

7 REPLIES 7
Reeza
Super User

This would be a LEFT (or RIGHT) JOIN in your query, depending on table order. 

 

The issue is, if you have multiple (many to many) matches you'll still end up with duplicates. 

 

Are there any duplicate keys in your 'lookup' table?

 

Otherwise, with such a small dataset I would recommend a PROC FORMAT. See Example 4 here:

http://www2.sas.com/proceedings/sugi30/001-30.pdf

 

Along with the last examples that show how to create a format using a dataset.

 

FYI - 2 million rows + 60 columns is still small data 😉

Amber_Nicole94
Obsidian | Level 7

In my lookup table (Data B) the Contract #'s are unique but the BCO has duplicates. For example one BCO could have 3 Contract #'s.

 

I am fairly new to SAS EG. So to understand the PROC Format - I have my two Excel Files imported into SAS EG. Then I should click new > program to bring up the code window. How does the Program code change in terms of my data? Or is there a way to do PROC Format using the query builder?

 

Reeza
Super User

I would try the LEFT JOIN first, since that's likely easier in EG - especially if you're using the point and click tools, which it sounds like you are. 

 

If you can post the code generated and a screen shot of the query we can help tailor it for what you need.

Amber_Nicole94
Obsidian | Level 7

Original data has 2,207,926 rows. After join data has 2,208,343 rows.

 

I have limited down the number of variables just so everything would fit in the screenshots. Left join and query snapshot in attachment. Code below.

 

%_eg_conditional_dropds(WORK.QUERY_FOR_APPEND_TABLE_0001_0001);

PROC SQL;
   CREATE TABLE WORK.QUERY_FOR_APPEND_TABLE_0001_0001(label="QUERY_FOR_APPEND_TABLE_0001") AS 
   SELECT t1.Volume, 
          t1.'Org City'n, 
          t1.'Org St'n, 
          t1.'Full Shipper Name'n, 
          t1.'Fiscal Year'n, 
          t1.'BO Name'n, 
          t1.'Freight Payor Contract #'n, 
          t2.BCO AS 'BCO 1'n
      FROM WORK.QUERY_FOR_APPEND_TABLE_0001_0000 t1
           LEFT JOIN WORK.MATCH_DATA t2 ON (t1.'Freight Payor Contract #'n = t2.'AGRT NUMBER'n);
QUIT;
Reeza
Super User

Everything looks correct. 

 

I would double check your assumption that the lookup table doesn't have duplicate contract numbers. 

You can use the Task for this:

 

TASK>DATA>Sort Data

Add contract number to the Sort By field

Click Options in left hand menu

Click "Keep Only the first record...."

 

Check how the results differ. 

Amber_Nicole94
Obsidian | Level 7

Wow! That was my problem. The rows are matching up perfectly now. I did have some duplicates in my agreement number afterall.

 

Thank you!!!!!!!

Reeza
Super User

You should verify why there are duplicates and how you need to deal with them.

This will pull the duplicates out for you. How to deal with them is more a business problem than a coding problem. 

 

proc sort data=have;
by contractNo;
run;

data dups;
set have;
by contractNo;

if not (first.contractNo and last.contractNo);

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 3035 views
  • 0 likes
  • 2 in conversation