New SAS User

tehorne · Posted 03-12-2019 10:52 AM

Hi-

I have imported two different excel files:

1. The first excel file is the starting file.

2. The second excel file is the a spreadsheet that continously updates with new records.

I have set up an append query to join these two sheets. What I am needing help with is writing the program to get rid of the duplicates that sits in the SECOND excel sheet, because I need to use the records that are in the FIRST spreadsheet that are considered 'duplicates'. I just need the SECOND excel sheet to append all new records, seeing that it will be the one that is updated everyday.

I'm not sure of what all you need from me.. but just let me know.

FYI,, I am very new to the SAS language and process flows. So you may have to give me step by step examples.

tehorne · Posted 03-12-2019 05:01 PM

You are absotutely right!

This helped me out a lot!

Thank you.

View solution in original post

Astounding · Posted 03-12-2019 11:23 AM

Do both files contain a variable such as ID that uniquely identifies an observation? (It could be a set of variables rather than just one.)

Are you allowed to sort your data before combining the files?

If the second file contains a piece of data that was missing in the first file, do you want to use that data or ignore it? For example, first file contains a record for ID=1, but WEIGHT is missing. Second file also contains a record for ID=1, but WEIGHT is 150. Use the 150, or discard it?

tehorne · Posted 03-12-2019 12:34 PM

Yes, I have a unique ID called [Segment ID]

I just had to think about it. Maybe you can help me find the best resolution.

The FIRST file, is an export from my Access database that is missing information in from 5 different columns, but only missing on a couple of the records... because we added the 5 new columns later on in the project. So what had to do was, use a table out of teradata to vlookup the information for the records that were missing data in the 5 columns. Now all the records in the FIRST file have all of the data included.

The SECOND file which is a direct connet to MC Access, still has the same records without data in the 5 columns, but also has the ones that do have data after we added the new columns. This is the file that will continously update everyday. Keep in mind, these records are passed through Sharepoint, so each record can still be updated at anytime. Which means it will be updated in my Access Database.

So after we append the two files... we want to get rid of the duplicate records in the SECOND file that are currently in both, but still be able to capture changes made in database (these changes come Raters that go in Sharepoint to lablel each record). even though changes will NOT be made to those 5 new columns.

Maybe i am thinking too hard about this.. but hopefully you can pont me in the right direction.

Astounding · Posted 03-12-2019 01:31 PM

This is what I think you are after. It requires that each file never contains 2+ observations per Segment_ID.

proc sort data=first;
   by segment_id;
run;

proc sort data=second;
   by segment_id;
run;

data want;
   update second first;
   by segment_id;
run;

It's a little unusual in that it appears to use SECOND as the master data set, and applies updates from FIRST. However, the key is how the UPDATE statement handles information. It uses any nonmissing values from FIRST, and replaces values from SECOND with those nonmissing values. However, if FIRST contains a missing value, it leaves the value from SECOND in place as the final value.

UPDATE requires that SECOND contains no more than one observation per Segment_ID. If that's an issue, we can program around that by adding a step.

UPDATE automatically outputs just one observation per SEGMENT_ID. So you will definitely lose some data if FIRST contains multiple observations per SEGMENT_ID. That would be an issue (also handle-able) to tackle before running this program.

tehorne · Posted 03-12-2019 05:01 PM

You are absotutely right!

This helped me out a lot!

Thank you.

New SAS User

Getting rid of duplicates on Append query, and still append new records

Re: Getting rid of duplicates on Append query, and still append new records

Re: Getting rid of duplicates on Append query, and still append new records

Re: Getting rid of duplicates on Append query, and still append new records

Re: Getting rid of duplicates on Append query, and still append new records

Re: Getting rid of duplicates on Append query, and still append new records

Proc Append Duplicates

PROC APPEND Alternatives

Appending data to the lead of a record

Manipulating Data in SAS Studio Flows Part 1: Appending Data

proc append resulting in abnormally large dataset

Follow Us

What is...

New SAS User

Our biggest data and AI event of the year.

Follow Us

What is...