BookmarkSubscribeRSS Feed
ywon111
Quartz | Level 8

Hi, I receive a new dataset each month and hoping to append this to the base dataset and overwrite the latest three months. What is the best way to code this? Thanks

 

Base:

Month Values
Jan 100
Feb 100
Mar 100
Apr 100
May 100
Jun 100
Jul 100
Aug 100
Sep 100
Oct 100
Nov 100
Dec 100

 

Second dataset:

Month Values
Oct 200
Nov 200
Dec 200

 

This is what I want

Month Values
Jan 100
Feb 100
Mar 100
Apr 100
May 100
Jun 100
Jul 100
Aug 100
Sep 100
Oct 200
Nov 200
Dec 200
8 REPLIES 8
Astounding
PROC Star

Your question stretches the imagination in a few ways:

 

  • You always start with 12 observations and always want to replace the last 3 ??
  • Your data has no identifiable order to it?  No year variable?  No ordered value for month like a value from 1 to 12?

The problem you state is easy:

data want;
   set first (obs=9) second;
run;

But I sincerely believe the answer will change when I get more information about the actual form to the data.

ywon111
Quartz | Level 8
You always start with 12 observations and always want to replace the last 3 ?
No. There are more month/year data further up the table but will receive last three months so need to keep the historical data while append & overwrite the latest three months.

Your data has no identifiable order to it? No year variable? No ordered value for month like a value from 1 to 12?
The data is currently order by month.

I guess it's something more like this
Month Values
Aug-19 100
Sep-19 100
Oct-19 100
Nov-19 100
Dec-19 100
Jan-20 100
Feb-20 100
Mar-20 100
Apr-20 100
May-20 100
Jun-20 100
Jul-20 100
Aug-20 100
Sep-20 100
Oct-20 200
Nov-20 200
Dec-20 200
Astounding
PROC Star

I'm still not convinced I have the entire picture.  But here's a way to address what is known so far:

 

data want;
   set first obs=totalobs;
   if _n_ >= totalobs - 2 then set second;
run;

One "feature" of this program:  it still works if FIRST contains 100 variables, but SECOND only contains 2 variables.

ywon111
Quartz | Level 8
I get records of three quarters' worth of data (e.g. Jun20, Sep20 and Dec20) that needs to be appended and overwrite the base dataset which has historical data tracking from Jun96 plus other quarters except latest Dec20. The latest three quarters need to overwrite existing ones in base as the values there gets updated.

Month Value
Jun19 100
Jun19 100
Sep19 100
Sep19 100
Sep19 100
Dec19 100
Dec19 100
Dec19 100
Dec19 100
Mar20 100
Mar20 100
Jun20 100
Sep20 100
Sep20 100
Sep20 100
Dec20 100
Patrick
Opal | Level 21

You need to be very precise here!

1. Do these latest 3 quarters already exist in your base table? If YES then you want to UPDATE them with the new values, if NO then you want to INSERT (append) them into your base table.

2. What does REPLACE mean? Are you asking to DELETE the oldest 3 months from your base table? Or is it something else?

 

Does your data look like in your last sample with only one row per month? Or is your data more granular - and if so what's the key that uniquely identifies exactly one row of data?

 

And last but not least: What are the data volumes? How many rows do you have in your real base table and your real transaction table?

Kurt_Bremser
Super User

I guess you also have some kind of id value in the datasets.

I would do this:

proc sort data=have;
by id month;
run;

proc sort data=trans;
by id month;
run;

data want;
merge
  have
  trans (
    in=t
    rename=(value=_value)
  )
;
by id month;
if t then value = _value;
drop _value;
run;

New ID's and/or months will be added, existing ones updated.

ywon111
Quartz | Level 8
Thanks it ran fine but it's missing one part. I receive data quarterly which contains the latest three quarters (e.g. Dec 2019, Mar 2020, Jun 2020), these numbers gets updated so it needs to be appended to the base and overwrite as it contains data from June 1999 to March 2020 (the Dec 2019 and Mar 2020 in base will need to be removed and overwritten with the newly updated data). Hope this makes sense...
Kurt_Bremser
Super User

Applying my code to your original data (and omitting ID) like this:

data have;
infile datalines dlm="09"x;
input month :monyy7. values;
format month yymmd7.;
datalines;
Jan2020	100
Feb2020	100
Mar2020	100
Apr2020	100
May2020	100
Jun2020	100
Jul2020	100
Aug2020	100
Sep2020	100
Oct2020	100
Nov2020	100
Dec2020	100
;

data trans;
infile datalines dlm="09"x;
input month :monyy7. values;
format month yymmd7.;
datalines;
Oct2020	200
Nov2020	200
Dec2020	200
;

proc sort data=have;
by month;
run;

proc sort data=trans;
by month;
run;

data want;
merge
  have
  trans (
    in=t
    rename=(values=_values)
  )
;
by month;
if t then values = _values;
drop _values;
run;

creates your intended result from your original question.

If your requirement is different, post your data in usable form (data steps with datalines, see above) and show the expected result out of this.

Use the "little running man" right next to this

Bildschirmfoto 2020-04-07 um 08.32.59.jpg

for posting your codes.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2342 views
  • 1 like
  • 4 in conversation