Hi, I receive a new dataset each month and hoping to append this to the base dataset and overwrite the latest three months. What is the best way to code this? Thanks
Base:
Month | Values |
Jan | 100 |
Feb | 100 |
Mar | 100 |
Apr | 100 |
May | 100 |
Jun | 100 |
Jul | 100 |
Aug | 100 |
Sep | 100 |
Oct | 100 |
Nov | 100 |
Dec | 100 |
Second dataset:
Month | Values |
Oct | 200 |
Nov | 200 |
Dec | 200 |
This is what I want
Month | Values |
Jan | 100 |
Feb | 100 |
Mar | 100 |
Apr | 100 |
May | 100 |
Jun | 100 |
Jul | 100 |
Aug | 100 |
Sep | 100 |
Oct | 200 |
Nov | 200 |
Dec | 200 |
Your question stretches the imagination in a few ways:
The problem you state is easy:
data want;
set first (obs=9) second;
run;
But I sincerely believe the answer will change when I get more information about the actual form to the data.
I'm still not convinced I have the entire picture. But here's a way to address what is known so far:
data want;
set first obs=totalobs;
if _n_ >= totalobs - 2 then set second;
run;
One "feature" of this program: it still works if FIRST contains 100 variables, but SECOND only contains 2 variables.
You need to be very precise here!
1. Do these latest 3 quarters already exist in your base table? If YES then you want to UPDATE them with the new values, if NO then you want to INSERT (append) them into your base table.
2. What does REPLACE mean? Are you asking to DELETE the oldest 3 months from your base table? Or is it something else?
Does your data look like in your last sample with only one row per month? Or is your data more granular - and if so what's the key that uniquely identifies exactly one row of data?
And last but not least: What are the data volumes? How many rows do you have in your real base table and your real transaction table?
I guess you also have some kind of id value in the datasets.
I would do this:
proc sort data=have;
by id month;
run;
proc sort data=trans;
by id month;
run;
data want;
merge
have
trans (
in=t
rename=(value=_value)
)
;
by id month;
if t then value = _value;
drop _value;
run;
New ID's and/or months will be added, existing ones updated.
Applying my code to your original data (and omitting ID) like this:
data have;
infile datalines dlm="09"x;
input month :monyy7. values;
format month yymmd7.;
datalines;
Jan2020 100
Feb2020 100
Mar2020 100
Apr2020 100
May2020 100
Jun2020 100
Jul2020 100
Aug2020 100
Sep2020 100
Oct2020 100
Nov2020 100
Dec2020 100
;
data trans;
infile datalines dlm="09"x;
input month :monyy7. values;
format month yymmd7.;
datalines;
Oct2020 200
Nov2020 200
Dec2020 200
;
proc sort data=have;
by month;
run;
proc sort data=trans;
by month;
run;
data want;
merge
have
trans (
in=t
rename=(values=_values)
)
;
by month;
if t then values = _values;
drop _values;
run;
creates your intended result from your original question.
If your requirement is different, post your data in usable form (data steps with datalines, see above) and show the expected result out of this.
Use the "little running man" right next to this
for posting your codes.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.