Hello,
I have the following data:
Year | Date of Forcast | Forecast Value |
2018 | 20171102 | 0.65 |
2018 | 20180227 | 0.63 |
2018 | 20180509 | 0.64 |
2018 | 20180730 | 0.66 |
2018 | 20181110 | 0.69 |
2019 | 20190308 | 0.73 |
2019 | 20190509 | 0.52 |
2019 | 20190806 | 0.49 |
2019 | 20191108 | 0.49 |
The first row, for example, states that the earning forecast for the year 2018 is 0.65 and this forecast was given on the second of August 2017.
The second-row state that the forecast for the year 2018 was updated on the 27th of February 2018 to the value of 0.63.
I want to write a code that will take the latest forecast value for each year as long as the "date of forecast" is not later than the 31 of March of the same year.
For example for the first 5 rows (which all refer to the year 2018), the new variable (called "updated value") will be equal to 0.63.
For the next 4 rows (which all refer to the year 2019) the new variable will be equal to 0.73.
Could anyone assist in writing the code?
Thank you,
Lior
Double DO loop:
data want;
do until (last.year);
set have;
by year;
if date_of_forecast le mdy(3,31,year) then updated_value = forecast_value;
end;
do until (last.year);
set have;
by year;
output;
end;
run;
Code makes several assumptions about your data and is therefore untested. Please do always supply example data in a working data step with datalines.
Writing a program will not be difficult but depends on:
Hi,
Year - actual name: PRD_YR - numeric variable
Date of Forcast - actual name: MOD_DATE - numeric variable (group: date)
Forcast Value - actual name: VAL_1 - numeric variable
Thank you,
Lior
Date of Forcast |
OK, assuming we have SAS dates here, let's first create a variable holding the year:
data want1;
set have;
year = year(mod_date);
run;
Actually it looks like this step isn't needed since your data already contains PRD_YEAR. Then process this data separately for each year:
data want2;
do until (last.year);
set want1;
by year;
if month(mod_date) < 4 then updated_value = val_1;
end;
do until (last.year);
set want1;
by year;
output;
end;
run;
The top loop captures the last value prior to April, then the bottom loop reads the same observations and outputs them with the updated value from the top loop.
I had to adjust the code in order to include dates after 31 march of the previous year.
but other than that the code works well.
Thank you very much!
Lior
Double DO loop:
data want;
do until (last.year);
set have;
by year;
if date_of_forecast le mdy(3,31,year) then updated_value = forecast_value;
end;
do until (last.year);
set have;
by year;
output;
end;
run;
Code makes several assumptions about your data and is therefore untested. Please do always supply example data in a working data step with datalines.
Thank you very much.
Lior
Hello,
I want to change the code in the following way: instead of taking the latest forecast value for each year as long as the "date of forecast" is not later than the 31 of March of the same year, I want to take the latest forecast value for each year as long as the "date of forecast" is not later than 9 months before the date in the last rows:
firm | Year | Date of Forcast | Forecast Value | PRD_MON | PRD_YR | |
aa | 2018 | 20170802 | 0.65 | 6 | 2018 | 30.09.2017 |
aa | 2018 | 20170813 | 0.63 | 6 | 2018 | 30.09.2017 |
aa | 2018 | 20170920 | 0.52 | 6 | 2018 | 30.09.2017 |
aa | 2018 | 20180730 | 0.66 | 6 | 2018 | 30.09.2017 |
aa | 2018 | 20181110 | 0.69 | 6 | 2018 | 30.09.2017 |
aa | 2019 | 20180308 | 0.73 | 6 | 2019 | 30.09.2018 |
aa | 2019 | 20180509 | 0.52 | 6 | 2019 | 30.09.2018 |
aa | 2019 | 20180906 | 0.49 | 6 | 2019 | 30.09.2018 |
aa | 2019 | 20181006 | 0.43 | 6 | 2019 | 30.09.2018 |
For example for the first 5 rows (which all refer to the year 2018), the new variable (called "updated value") will be equal to 0.52.
For the next 4 rows (which all refer to the year 2019) the new variable will be equal to 0.49.
By the way, I didn't understand where is the "working data step" in which I should insert the example data.
Thank you,
Lior
For the year 2019, none of the dates is at least 9 months prior to the last, so you would not select any value there.
Quote from myself in an earlier post:
Please do always supply example data in a working data step with datalines.
Please follow this. I won't try to write any further code without proper data.
Hello,
For the year 2019, the first three rows are prior to 30.9.2018 which is 9 months prior to 30.6.2019, so the code should take the latest row among them, meaning the value of 0.49.
if all the rows had been after 30.09.2018, the value should be empty (missing value).
I want to supply example data in a working data step with datalines, but I don't know how to do so. Could you please guide how to do it?
I can browse the document file if that is what you meant.
Thanks,
Lior
Quick example of data step code:
data have;
input firm $ Year Date_of_Forcast : yymmdd8. Forecast_Value PRD_MON PRD_YR;
format date_of_forcast yymmdd10.;
datalines;
aa 2018 20170802 0.65 6 2018
;
Please expand on this with additional rows/columns.
Thank you,
Lior
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.