BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
tehorne
Fluorite | Level 6

Hi-

 

   I have imported two different excel files:

1. The first excel file is the starting file.

2. The second excel file is the a spreadsheet that continously updates with new records.

 

I have set up an append query to join these two sheets. What I am needing help with is writing the program to get rid of the duplicates that sits in the SECOND excel sheet, because I need to use the records that are in the FIRST spreadsheet that are considered 'duplicates'. I just need the SECOND excel sheet to append all new records, seeing that it will be the one that is updated everyday.

 

I'm not sure of what all you need from me.. but just let me know.

 

FYI,, I am very new to the SAS language and process flows. So you may have to give me step by step examples. 

1 ACCEPTED SOLUTION

Accepted Solutions
tehorne
Fluorite | Level 6

You are absotutely right!

 

This helped me out a lot!

 

Thank you.

View solution in original post

4 REPLIES 4
Astounding
PROC Star

Do both files contain a variable such as ID that uniquely identifies an observation?  (It could be a set of variables rather than just one.)

 

Are you allowed to sort your data before combining the files?

 

If the second file contains a piece of data that was missing in the first file, do you want to use that data or ignore it?  For example, first file contains a record for ID=1, but WEIGHT is missing.  Second file also contains a record for ID=1, but WEIGHT is 150.  Use the 150, or discard it?

tehorne
Fluorite | Level 6

Yes, I have a unique ID called [Segment ID]

 

I just had to think about it. Maybe you can help me find the best resolution.

 

The FIRST file, is an export from my Access database that is missing information in from 5 different columns, but only missing on a couple of the records... because we added the 5 new columns later on in the project. So what had to do was, use a table out of teradata to vlookup the information for the records that were missing data in the 5 columns. Now all the records in the FIRST file have all of the data included. 

 

The SECOND file which is a direct connet to MC Access, still has the same records without data in the 5 columns, but also has the ones that do have data after we added the new columns. This is the file that will continously update everyday. Keep in mind, these records are passed through Sharepoint, so each record can still be updated at anytime. Which means it will be updated in my Access  Database.

 

So after we append the two files... we want to get rid of the duplicate records in the SECOND file that are currently in both, but still be able to capture changes made in database (these changes come Raters that go in Sharepoint to lablel each record). even though changes will NOT be made to those 5 new columns. 

 

Maybe i am thinking too hard about this.. but hopefully you can pont me in the right direction. 

Astounding
PROC Star

This is what I think you are after.  It requires that each file never contains 2+ observations per Segment_ID.

 

proc sort data=first;
   by segment_id;
run;

proc sort data=second;
   by segment_id;
run;

data want;
   update second first;
   by segment_id;
run;

It's a little unusual in that it appears to use SECOND as the master data set, and applies updates from FIRST.  However, the key is how the UPDATE statement handles information.  It uses any nonmissing values from FIRST, and replaces values from SECOND with those nonmissing values.  However, if FIRST contains a missing value, it leaves the value from SECOND in place as the final value.

 

UPDATE requires that SECOND contains no more than one observation per Segment_ID.  If that's an issue, we can program around that by adding a step.

 

UPDATE automatically outputs just one observation per SEGMENT_ID.  So you will definitely lose some data if FIRST contains multiple observations per SEGMENT_ID.  That would be an issue (also handle-able) to tackle before running this program.

 

 

tehorne
Fluorite | Level 6

You are absotutely right!

 

This helped me out a lot!

 

Thank you.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 964 views
  • 0 likes
  • 2 in conversation