Help using Base SAS procedures

Need help figuring out how to loop and change values within dataset....

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Need help figuring out how to loop and change values within dataset....

Dear SAS Heads-

I am working on a program to calculate the replacement costs and ongoing annual maintenance costs of roads over a ten year period.  I have a very large dataset with many roads, so this example is an over-simplification in order to get the code correctly specified.  I am having difficulty with a complicated data step, so I am looking for the "best-of-the-best" BASE SAS programmers.  I know you are out there.  Smiley Happy

Background: My example raw dataset, "have", contains two roads (road_id=1 and road_id=2).  The dataset contains the age of the road in 2012 ("Age_YE2012"), the current age in a particular year ("age"), useful lifespan ("Lifespan"), the year ("Year_Ending"), whether or not the road was replaced ("Replace"), the one-time cost to replace the road ("CAPEX"), and the maintenance cost of the road when it is 1 years old ("OPEX_Age1").  Here is my example raw dataset:

******;

DATA have;

   INPUT Road_id Age_YE2012 Lifespan Year_Ending Age Replace $ CAPEX OPEX_Age1;

   DATALINES;

     1    5    2013    7    6    NO    1000000    1000

    1    5    2014    7    7    NO    1000000    1000

    1    5    2015    7    8    NO    1000000    1000

    1    5    2016    7    9    NO    1000000    1000

    1    5    2017    7    10    NO    1000000    1000

    1    5    2018    7    11    NO    1000000    1000

    1    5    2019    7    12    NO    1000000    1000

    1    5    2020    7    13    NO    1000000    1000

    1    5    2021    7    14    NO    1000000    1000

    1    5    2022    7    15    NO    1000000    1000

    1    5    2023    7    16    NO    1000000    1000

     2    6    2013    8    7    NO    2000000    3000

    2    6    2014    8    8    NO    2000000    3000

    2    6    2015    8    9    NO    2000000    3000

    2    6    2016    8    10    NO    2000000    3000

    2    6    2017    8    11    NO    2000000    3000

    2    6    2018    8    12    NO    2000000    3000

    2    6    2019    8    13    NO    2000000    3000

    2    6    2020    8    14    NO    2000000    3000

    2    6    2021    8    15    NO    2000000    3000

    2    6    2022    8    16    NO    2000000    3000

    2    6    2023    8    17    NO    2000000    3000

;

run;

********;

Question: I need a datastep or a couple of datasteps to perform the following tasks:

(1) Use a combination of the current "Age" and "Lifespan" of the road to determine the year(s) when the road will need to be replaced over that 10 year period.  Basically, if the age exceeds the lifespan, then in the first year when that happens I need the following values to change:

REPLACE="YES" (this indicates the road was replaced that particular year)

CAPEX_ReplaceYr=CAPEX (this indicates that the cost of replacement occurs during the year of that replacement)

OPEX=OPEX_Age1 (this indicates that this specific maintenance cost occurs during the first year of replacement)

Note that in some cases, the road may need to be replaced twice in that ten year period.  This happens in my example for Road_ID=1.

(2) I will need SAS to also adjust the "Age" variable if the road was replaced in Step #1.  For example, Road_ID=1 will need to be replaced in 2015 and 2022.  So, the "Age" in 2015 is 1, 2016 is 2 years old, 2017 is 3 years old and so on until the road is replaced again (2022) and the age gets reset back to 1 and the counting begins again.

(3) Finally, I need SAS to escalate/increase the "OPEX" value by 10% above the starting value for Age 1 based on the actual "Age" of the road.  For example, in 2015, Road_id=1 is only 1 years old, so its OPEX=1000 (i.e., OPEX_Age1).  In 2016, this same road is now 2 years old, so the OPEX increases by 10% to 1100.  If this road is six years old (in years 2013 and 2020) then the OPEX value is 1500.  You can see how the final dataset should look below in my example dataset called "want".      

Anyway, I'm hopeful that someone out there can see this solution very quickly.  I do not think this is going to be a large data step--probably some proc SQL and/or do loops.  I should mention that I DO NOT have IML installed on my machine.

Anyway, I look forward to hearing from you.  Thank you so much.  Please let me know if you have any additional questions.  -Pete

*******;

DATA want;

   INPUT Road_id Age_YE2012 Lifespan Year_Ending Age Replace $ CAPEX OPEX_Age1 CAPEX_ReplaceYr OPEX;

   DATALINES;

     1    5    2013    7    6    NO    1000000    1000    .    1500

    1    5    2014    7    7    NO    1000000    1000    .    1600

    1    5    2015    7    1    YES    1000000    1000    1000000    1000

    1    5    2016    7    2    NO    1000000    1000    .    1100

    1    5    2017    7    3    NO    1000000    1000    .    1200

    1    5    2018    7    4    NO    1000000    1000    .    1300

    1    5    2019    7    5    NO    1000000    1000    .    1400

    1    5    2020    7    6    NO    1000000    1000    .    1500

    1    5    2021    7    7    NO    1000000    1000    .    1600

    1    5    2022    7    1    YES    1000000    1000    1000000    1000

    1    5    2023    7    2    NO    1000000    1000    .    1100

     2    6    2013    8    7    NO    2000000    3000    .    4800

    2    6    2014    8    8    NO    2000000    3000    .    5100

    2    6    2015    8    1    YES    2000000    3000    2000000 3000

    2    6    2016    8    2    NO    2000000    3000    .    3300

    2    6    2017    8    3    NO    2000000    3000    .    3600

    2    6    2018    8    4    NO    2000000    3000    .    3900

    2    6    2019    8    5    NO    2000000    3000    .    4200

    2    6    2020    8    6    NO    2000000    3000    .    4500

    2    6    2021    8    7    NO    2000000    3000    .    4800

    2    6    2022    8    8    NO    2000000    3000    .    5100

    2    6    2023    8    1    YES    2000000    3000    2000000    3000

;

run;


Accepted Solutions
Solution
‎07-24-2015 10:34 PM
Respected Advisor
Posts: 4,659

Re: Need help figuring out how to loop and change values within dataset....

After adjusting the order in your variable list (Year_Ending Lifespan), a simple DO UNTIL() loop does the job:

data want;

do until(last.Road_Id);

    set have; by Road_Id;

    if first.Road_Id then _Age = age;

    else _Age = _Age + 1;

    if _Age > Lifespan then do;

        Replace = "YES";

        _Age = 1;

        end;

    else do;

        Replace = "NO";

        call missing(CAPEX);

        end;

    OPEX = OPEX_Age1 * (1 + (_Age-1) * 0.1);

    output;

    end;

rename _age=age;

drop age OPEX_Age1;

run;

PG

PG

View solution in original post


All Replies
Solution
‎07-24-2015 10:34 PM
Respected Advisor
Posts: 4,659

Re: Need help figuring out how to loop and change values within dataset....

After adjusting the order in your variable list (Year_Ending Lifespan), a simple DO UNTIL() loop does the job:

data want;

do until(last.Road_Id);

    set have; by Road_Id;

    if first.Road_Id then _Age = age;

    else _Age = _Age + 1;

    if _Age > Lifespan then do;

        Replace = "YES";

        _Age = 1;

        end;

    else do;

        Replace = "NO";

        call missing(CAPEX);

        end;

    OPEX = OPEX_Age1 * (1 + (_Age-1) * 0.1);

    output;

    end;

rename _age=age;

drop age OPEX_Age1;

run;

PG

PG
Occasional Contributor
Posts: 5

Re: Need help figuring out how to loop and change values within dataset....

Hi PGStats-

Thank you so much for the help.  This is exactly what I needed--and thanks for catching the typo in my datafiles statement above.

If you are okay with it, I might have one followup question that relates to a variation of the code you sent earlier.  I'll know more by Wednesday when I have time to incorporate this loop into my broader program.

Thanks, PG.

-Pete

Occasional Contributor
Posts: 5

Re: Need help figuring out how to loop and change values within dataset....

Hi PG-

Okay, I have a slight variation in my data.  I think this will be a very quick and easy modification to your existing code, but I couldn't figure it out.

In this case, I have two different road types that have different lifespans, CAPEX (capital expenses), and OPEX (operational expenses).  I would like to see how the costs change, if I switch the road type from Road A to Road B, but only after the first replacement occurs.  If the original road (Road A) has not retired for the first time, then I would like to keep using the increasing OPEX for that road type (see earlier code you sent).

More specifically, here is what I am looking to do:

(1) Use "Age" and "Lifespan_A" to determine the year when the original road (Road A) would need to be replaced for the first time.

(2) In the year this first replacement occurs, insert CAPEX_B (the capital cost for Road B) for this year only and begin escalating the OPEX_Age1_B variable for each year afterward using the same method as in the earlier code.

(3) In other words, when the original road (Road A) is replaced, the new road (Road B) has a new lifespan (Lifespan_B), CAPEX_B for that year only, and the OPEX variable increases each year afterward according to this new age vs. lifespan combination.

(4) If the original road (Road A) has not yet retired for the first time, code inserts missing values for CAPEX_B, but code does insert the escalating the OPEX_Age1_A (operational costs for Road A) variable using the same escalating method as in the earlier code.

Basically, I just want the code to produce results showing that the existing road has increasing operational expenses (OPEX_Age1_A) until it retires for the first time, then at that point, the new road type is built (CAPEX_B) and there are increasing operational expenses for that road type (OPEX_Age1_B) until the age of the new road exceeds the new lifespan--and road B is built again--and so on. 

Any modifications you can make to your existing code would be greatly appreciated.  I owe you big time for this help.

Thanks,

Pete

Here are the datalines statement this time around:

******;

DATA have;

   INPUT Road_id Age_YE2012 Year_Ending Lifespan_A Lifespan_B Age Replace $ CAPEX_A OPEX_Age1_A CAPEX_B OPEX_Age1_B;

   DATALINES;

     1    5    2013    7    5    6    NO    1000000    1000    2000000    2000

    1    5    2014    7    5    7    NO    1000000    1000    2000000    2000

    1    5    2015    7    5    8    NO    1000000    1000    2000000    2000

    1    5    2016    7    5    9    NO    1000000    1000    2000000    2000

    1    5    2017    7    5    10    NO    1000000    1000    2000000    2000

    1    5    2018    7    5    11    NO    1000000    1000    2000000    2000

    1    5    2019    7    5    12    NO    1000000    1000    2000000    2000

    1    5    2020    7    5    13    NO    1000000    1000    2000000    2000

    1    5    2021    7    5    14    NO    1000000    1000    2000000    2000

    1    5    2022    7    5    15    NO    1000000    1000    2000000    2000

    1    5    2023    7    5    16    NO    1000000    1000    2000000    2000

     2    6    2013   8    6    7    NO    2000000    3000    4000000    6000

    2    6    2014    8    6    8    NO    2000000    3000    4000000    6000   

    2    6    2015    8    6    9    NO    2000000    3000    4000000    6000

    2    6    2016    8    6    10    NO    2000000    3000    4000000    6000

    2    6    2017    8    6    11    NO    2000000    3000    4000000    6000

    2    6    2018    8    6    12    NO    2000000    3000    4000000    6000

    2    6    2019    8    6    13    NO    2000000    3000    4000000    6000

    2    6    2020    8    6    14    NO    2000000    3000    4000000    6000

    2    6    2021    8    6    15    NO    2000000    3000    4000000    6000

    2    6    2022    8    6    16    NO    2000000    3000    4000000    6000

    2    6    2023    8    6    17    NO    2000000    3000    4000000    6000

;

run;

*******;

DATA want;

   INPUT Road_id Age_YE2012 Year_Ending Lifespan_A Lifespan_B Age Replace $ CAPEX_B OPEX_Age1_A OPEX_Age1_B CAPEX_ReplaceYr OPEX;

   DATALINES;

    1   5    2013    7    5    6    NO   2000000    1000    2000    .    1500

    1    5    2014    7    5    7    NO    2000000    1000    2000  .    1600

    1    5    2015    7    5    1    YES   2000000    1000    2000    2000000    2000

    1    5    2016    7    5    2    NO    2000000    1000    2000    .    2200

    1    5    2017    7    5    3    NO    2000000    1000    2000    .    2400

    1    5    2018    7    5    4    NO    2000000    1000    2000    .    2600

    1    5    2019    7    5    5    NO    2000000    1000 2000   .    2800

    1    5    2020    7    5    6    YES    2000000    1000 2000    2000000    2000

    1    5    2021    7    5    7    NO    2000000    1000    2000    .    2200

    1    5    2022    7    5    1    NO   2000000    1000    2000   .   2400

    1    5    2023    7    5    2    NO    2000000    1000    2000    .    2600

     2    6    2013   8    6 7    NO   4000000    3000  6000  .    4800

    2    6    2014    8    6 8    NO    4000000    3000    6000   .    5100

    2    6    2015    8    6 1    YES    4000000    3000    6000    4000000 6000

    2    6    2016    8    6  2    NO    4000000    3000    6000    .    6600

    2    6    2017    8    6 3    NO    4000000    3000    6000    .    7200

    2    6    2018    8    6 4    NO    4000000    3000    6000    .    7800

    2    6    2019    8    6 5    NO    4000000    3000    6000    .    8400

    2    6    2020    8    6 6    NO    4000000    3000    6000    .    9000

    2    6    2021    8    6 7   YES   4000000    3000    6000    4000000    6000

    2    6    2022    8    6 8    NO    4000000    3000    6000    .    6600

    2    6    2023    8    6 1    NO   4000000    3000    6000    .    7200

;

run;

********;

N/A
Posts: 1

Re: Need help figuring out how to loop and change values within dataset....

Hi Pete,

Try the following. The trick was to set up a "flag" to check whether the first replacement had been done. The code flow is adjusted based on the flag content. Some streamlining may be useful (e.g. the flag could be a macro variable) but the code does the job.

data want;

    flag = 0;

do until(last.Road_Id);

    if Road_Id ~= Lag(Road_Id) then flag = 0;

    set have; by Road_Id;

    if first.Road_Id then _Age = age;

    else _Age = _Age + 1;

    if flag = 0 then Lifespan = Lifespan_A; else Lifespan = Lifespan_B;

    if _Age > Lifespan then do;

        Replace = "YES";

        _Age = 1;

        CAPEX_ReplaceYr = CAPEX_B;

        flag = 1;

        end;

    else do;

        Replace = "NO";

        call missing(CAPEX_ReplaceYr);

        end;

    if flag = 0 then do;

           OPEX = OPEX_Age1_A * (1 + (_Age-1) * 0.1);

    end;

    else do;

        OPEX = OPEX_Age1_B * (1 + (_Age-1) * 0.1);

    end;

    output;

    end;

rename _age=age;

drop age flag Lifespan;

run;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 352 views
  • 0 likes
  • 3 in conversation