BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Barite | Level 11

## How to estimate the growth in a dataset like this one

How do we estimate the growth for written premium (wp) , Earned Unit (EU) and Written Unit (WU) from 2022 to 2023 with a dataset structured like below

``````DATA new_dataset;
INPUT AN WP EU WU;
DATALINES;
2022 1000 1500 1700
2023 1500 1550 1800
;
RUN;``````

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: How to estimate the growth in a dataset like this one

```data result;
set new_dataset;
by ctradm type3;
growth_wp= dif(wp)/lag(wp);
growth_eu= dif(eu)/lag(eu);
growth_wu= dif(wu)/lag(wu);
if last.type3;
run;```

For your given data.

What this does:

The DIF function calculates the difference for a variable from the previous observation, lag gets the value of the variable for the previous observation. You can actual get the value from more than the previous: DIF4 gets the difference between the current and the 4th previous observation.

Caveat: Previous is within a queued value group. So when used with "if" or similar conditions the "previous" is the previous time the condition is true.

This will result in notes in the log about missing values because on the first observation is there no "previous" value to use.

The BY statement creates automatic variables that indicate to SAS whether the current observation is the first or last of a by group. The values are accessed with the syntax FIRST.VARIABLE or LAST.VARIABLE and are valued as 1 for true and 0 for false. So IF statements like 'if first.variable then ...' execute ;

The IF used above is a subsetting IF so that only the observation with the last of the TYPE3 is written to the output.

Caution: Your example data implies a sort order by Type3. If this is not the actual case but the data is grouped as need then add NOTSORTED to the BY statement to prevent errors.

Or sort the data by CTRADM TYPE3 AN prior to this step.

4 REPLIES 4
Super User

## Re: How to estimate the growth in a dataset like this one

What would the result look like? And what if the value does not increase?

If you have multiple years do you want a year-to-year change or over the time period of record?

Does your actual data have something else, perhaps an identifier as to what product or client name that groups records together (goes to a better fit of a solution to actual problem)?

Super User

## Re: How to estimate the growth in a dataset like this one

With just year over year change you can't quite make predictions in my opinion, in this case you get more into the art method of predictions than science.
Barite | Level 11

## Re: How to estimate the growth in a dataset like this one

Sorry, may be I was not clear enough.  I just want to estimate the variation of three variable from the previous year to the current year, like below.  You may have many administrative center and type3 class but the dataset structure will be like below.  The result dataset is the one I would like to have.  What's the best way to do that ?

``````DATA new_dataset;
INPUT AN CTRADM \$ TYPE3 WP EU WU;
DATALINES;
2022 ABL 23 1000 1500 1700
2023 ABL 23 1500 1550 1800
2022 ABL 31.1 800 800 800
2023 ABL 31.1 1000 950 900

;
RUN;

DATA Result;
INPUT AN CTRADM \$ TYPE3 WP growth_WP EU growth_EU WU growth_WU;
DATALINES;

2023 ABL 23 1500 0.5 1550 0.033 1800 0.058
2023 ABL 31.1 1000 0.25 950 0.1875 900 0.125
;
RUN;``````
Super User

## Re: How to estimate the growth in a dataset like this one

```data result;
set new_dataset;
by ctradm type3;
growth_wp= dif(wp)/lag(wp);
growth_eu= dif(eu)/lag(eu);
growth_wu= dif(wu)/lag(wu);
if last.type3;
run;```

For your given data.

What this does:

The DIF function calculates the difference for a variable from the previous observation, lag gets the value of the variable for the previous observation. You can actual get the value from more than the previous: DIF4 gets the difference between the current and the 4th previous observation.

Caveat: Previous is within a queued value group. So when used with "if" or similar conditions the "previous" is the previous time the condition is true.

This will result in notes in the log about missing values because on the first observation is there no "previous" value to use.

The BY statement creates automatic variables that indicate to SAS whether the current observation is the first or last of a by group. The values are accessed with the syntax FIRST.VARIABLE or LAST.VARIABLE and are valued as 1 for true and 0 for false. So IF statements like 'if first.variable then ...' execute ;

The IF used above is a subsetting IF so that only the observation with the last of the TYPE3 is written to the output.

Caution: Your example data implies a sort order by Type3. If this is not the actual case but the data is grouped as need then add NOTSORTED to the BY statement to prevent errors.

Or sort the data by CTRADM TYPE3 AN prior to this step.

Discussion stats
• 4 replies
• 509 views
• 1 like
• 3 in conversation