Re: Append and overwrite a second dataset to the base

ywon111 · Posted 09-20-2021 08:48 PM

Hi, I receive a new dataset each month and hoping to append this to the base dataset and overwrite the latest three months. What is the best way to code this? Thanks

Base:

Month	Values
Jan	100
Feb	100
Mar	100
Apr	100
May	100
Jun	100
Jul	100
Aug	100
Sep	100
Oct	100
Nov	100
Dec	100

Second dataset:

Month	Values
Oct	200
Nov	200
Dec	200

This is what I want

Month	Values
Jan	100
Feb	100
Mar	100
Apr	100
May	100
Jun	100
Jul	100
Aug	100
Sep	100
Oct	200
Nov	200
Dec	200

Astounding · Posted 09-20-2021 11:46 PM

Your question stretches the imagination in a few ways:

You always start with 12 observations and always want to replace the last 3 ??
Your data has no identifiable order to it? No year variable? No ordered value for month like a value from 1 to 12?

The problem you state is easy:

data want;
   set first (obs=9) second;
run;

But I sincerely believe the answer will change when I get more information about the actual form to the data.

ywon111 · Posted 09-20-2021 11:57 PM

You always start with 12 observations and always want to replace the last 3 ?
No. There are more month/year data further up the table but will receive last three months so need to keep the historical data while append & overwrite the latest three months.

Your data has no identifiable order to it? No year variable? No ordered value for month like a value from 1 to 12?
The data is currently order by month.

I guess it's something more like this
Month Values
Aug-19 100
Sep-19 100
Oct-19 100
Nov-19 100
Dec-19 100
Jan-20 100
Feb-20 100
Mar-20 100
Apr-20 100
May-20 100
Jun-20 100
Jul-20 100
Aug-20 100
Sep-20 100
Oct-20 200
Nov-20 200
Dec-20 200

Astounding · Posted 09-21-2021 12:43 AM

I'm still not convinced I have the entire picture. But here's a way to address what is known so far:

data want;
   set first obs=totalobs;
   if _n_ >= totalobs - 2 then set second;
run;

One "feature" of this program: it still works if FIRST contains 100 variables, but SECOND only contains 2 variables.

ywon111 · Posted 09-21-2021 02:45 AM

I get records of three quarters' worth of data (e.g. Jun20, Sep20 and Dec20) that needs to be appended and overwrite the base dataset which has historical data tracking from Jun96 plus other quarters except latest Dec20. The latest three quarters need to overwrite existing ones in base as the values there gets updated.

Month Value
Jun19 100
Jun19 100
Sep19 100
Sep19 100
Sep19 100
Dec19 100
Dec19 100
Dec19 100
Dec19 100
Mar20 100
Mar20 100
Jun20 100
Sep20 100
Sep20 100
Sep20 100
Dec20 100

Patrick · Posted 09-21-2021 02:53 AM

You need to be very precise here!

1. Do these latest 3 quarters already exist in your base table? If YES then you want to UPDATE them with the new values, if NO then you want to INSERT (append) them into your base table.

2. What does REPLACE mean? Are you asking to DELETE the oldest 3 months from your base table? Or is it something else?

Does your data look like in your last sample with only one row per month? Or is your data more granular - and if so what's the key that uniquely identifies exactly one row of data?

And last but not least: What are the data volumes? How many rows do you have in your real base table and your real transaction table?

Kurt_Bremser · Posted 09-21-2021 04:17 AM

I guess you also have some kind of id value in the datasets.

I would do this:

proc sort data=have;
by id month;
run;

proc sort data=trans;
by id month;
run;

data want;
merge
  have
  trans (
    in=t
    rename=(value=_value)
  )
;
by id month;
if t then value = _value;
drop _value;
run;

New ID's and/or months will be added, existing ones updated.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ywon111 · Posted 09-21-2021 08:48 PM

Thanks it ran fine but it's missing one part. I receive data quarterly which contains the latest three quarters (e.g. Dec 2019, Mar 2020, Jun 2020), these numbers gets updated so it needs to be appended to the base and overwrite as it contains data from June 1999 to March 2020 (the Dec 2019 and Mar 2020 in base will need to be removed and overwritten with the newly updated data). Hope this makes sense...

Kurt_Bremser · Posted 09-22-2021 03:00 AM

Applying my code to your original data (and omitting ID) like this:

data have;
infile datalines dlm="09"x;
input month :monyy7. values;
format month yymmd7.;
datalines;
Jan2020	100
Feb2020	100
Mar2020	100
Apr2020	100
May2020	100
Jun2020	100
Jul2020	100
Aug2020	100
Sep2020	100
Oct2020	100
Nov2020	100
Dec2020	100
;

data trans;
infile datalines dlm="09"x;
input month :monyy7. values;
format month yymmd7.;
datalines;
Oct2020	200
Nov2020	200
Dec2020	200
;

proc sort data=have;
by month;
run;

proc sort data=trans;
by month;
run;

data want;
merge
  have
  trans (
    in=t
    rename=(values=_values)
  )
;
by month;
if t then values = _values;
drop _values;
run;

creates your intended result from your original question.

If your requirement is different, post your data in usable form (data steps with datalines, see above) and show the expected result out of this.

Use the "little running man" right next to this

for posting your codes.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Registration is open

SAS Training: Just a Click Away