turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Trends analysis Calculation Programming

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-20-2018 11:54 AM

Hi Folks,

I'm struggling to write a program for trend analysis, below is screenshot of what I am trying to accomplish

I have Adjusted_PMPM value for three years (2015,2016, 2017). I have to find the Expected 1 and Expected 2 column values using the formula: Adjusted_PMPM*(1+R), Where, R is 7%.

Value of R is constant for the entire calculation, it's the Adjusted_PMPM which changes.

For Expected_1, start from 2015:

For Expected_1 value for 2016, it will be : 3304.55*(1+0.07) = 3545.55

but then Expected_1 value for 2017 will be 3545.55*(1+0.07) = 3804.12

Then again for Expected_2 value, we will start from 2016

So Expected_2 value for 2017 will be 3145.41*(1+0.07) = 3374.80

And finally, Average the two expected values for 2017 to get the final value in the last column.

I'm attaching the Excel sheet, if in case that helps in the process.

I tried adding the SAS dataset, it won't let me add the dataset here.

Any help will be much appreciated.

Accepted Solutions

Solution

03-20-2018
02:52 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Abhi_Garg

03-20-2018 01:55 PM - edited 03-20-2018 01:57 PM

Yes, indeed your program is more complicated than needed. In particular, you should take advantage of two features of the DATA step:

- The LAG function, which allows you to access data from preceding records (actually preceding invocation of a lag queue - which in this case is the same thing. For instance you can

EXP1=lag(adjusted_pmpm)*(1+trend)

and

EXP2=lag(exp1)*(1+trend)

You should probably nest those expressions in a ROUND function to get exact pennies. - The BY statement (as in
) which allows you to determine whether the record-in-hand is the first record for a product_type (via the automatic variable FIRST.PRODUCT_TYPE). You can use this information to eliminate the unwanted consequences of lagged values carrying over from one product_type to the next:**SET HAVE; BY PRODUCT_TYPE;**

As a first effort, don't worry about carryover from prior product_types. Just see if you can use the LAG function to generate the 2 new variables EXP1 and EXP2. Once you've got that, then return to the task of setting EXP1 to missing for the first record of each type, and first 2 recs for EXP2.

Here's the starting data set, all sasified:

```
data have;
input PROD_TYPE $4.Calendar_year ADJUSTED_PMPM :dollar9.2 Trend percent3.;
datalines;
Advo 2015 $3,304.55 7%
Advo 2016 $3,145.41 7%
Advo 2017 $3,229.17 7%
GRP 2015 $2,956.19 1%
GRP 2016 $3,238.21 1%
GRP 2017 $4,184.82 1%
IFP 2015 $4,706.53 9%
IFP 2016 $4,634.87 9%
IFP 2017 $5,308.22 9%
run;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Abhi_Garg

03-20-2018 12:48 PM

I have done it this far.

If someone can help me improvise this, I'll be grateful.

```
Data test (keep=Prod_type Calendar_year Adjusted_PMPM Trend Expected_1 Expected_2);
obs1 = 1;
do while (obs1 < = nobs);
set Non_engaged NOBS=nobs;
by prod_type;
obs2= obs1-1;
set Non_engaged (rename=(Prod_type = Prod_type_1
Calendar_year = Calendar_year_1
adjusted_PMPM = adjusted_PMPM_1
trend = trend_1)) point=obs2;
Expected_1 = adjusted_PMPM_1*(1+trend);
obs3= obs2-1;
set Non_engaged (rename=(Prod_type = Prod_type_2
Calendar_year = Calendar_year_2
adjusted_PMPM = adjusted_PMPM_2
trend = trend_2)) point=obs3;
Expected_1x = adjusted_PMPM_2*(1+trend);
Expected_2 = Expected_1x*(1+trend);
obs1 + 1;
output;
end;
Run;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Abhi_Garg

03-20-2018 12:50 PM

Post your data as text, specifically what your input data will look like.

Exactly what you're starting with is helpful. Your solution appears overcomplicated at this point.

And will you have only two years or does this need to be generalized across multiple records.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

03-20-2018 12:59 PM

Hi Reeza,

Attached is the text format dataset.

Yes, data will only have three years span. At least for remaining of this year.

The code that I provided above works great all the numbers coming as result are correct, the only issue is when the Prod_type changes from one to another, it breaks.

Basically, If I had only one Prod_type, the above code works.

Thanks.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Abhi_Garg

03-20-2018 01:24 PM

Do you need just the final value or all the intermediary values as well?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

03-20-2018 01:25 PM

Expected_1 and Expected_2 are my final values. So yes, I want these two.

Solution

03-20-2018
02:52 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Abhi_Garg

03-20-2018 01:55 PM - edited 03-20-2018 01:57 PM

Yes, indeed your program is more complicated than needed. In particular, you should take advantage of two features of the DATA step:

- The LAG function, which allows you to access data from preceding records (actually preceding invocation of a lag queue - which in this case is the same thing. For instance you can

EXP1=lag(adjusted_pmpm)*(1+trend)

and

EXP2=lag(exp1)*(1+trend)

You should probably nest those expressions in a ROUND function to get exact pennies. - The BY statement (as in
) which allows you to determine whether the record-in-hand is the first record for a product_type (via the automatic variable FIRST.PRODUCT_TYPE). You can use this information to eliminate the unwanted consequences of lagged values carrying over from one product_type to the next:**SET HAVE; BY PRODUCT_TYPE;**

As a first effort, don't worry about carryover from prior product_types. Just see if you can use the LAG function to generate the 2 new variables EXP1 and EXP2. Once you've got that, then return to the task of setting EXP1 to missing for the first record of each type, and first 2 recs for EXP2.

Here's the starting data set, all sasified:

```
data have;
input PROD_TYPE $4.Calendar_year ADJUSTED_PMPM :dollar9.2 Trend percent3.;
datalines;
Advo 2015 $3,304.55 7%
Advo 2016 $3,145.41 7%
Advo 2017 $3,229.17 7%
GRP 2015 $2,956.19 1%
GRP 2016 $3,238.21 1%
GRP 2017 $4,184.82 1%
IFP 2015 $4,706.53 9%
IFP 2016 $4,634.87 9%
IFP 2017 $5,308.22 9%
run;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to mkeintz

03-20-2018 02:31 PM

Wow!! This was great. Never used the lag function before, so didn't know.

Here is the code that I tried.

```
Data test_2;
set Non_engaged;
Format EXP1 EXP2 dollar10.2;
by prod_type;
EXP1=lag(adjusted_pmpm)*(1+trend);
EXP2=lag(exp1)*(1+trend);
if first.prod_type then do;
EXP1 = 0;
EXP2 = 0;
end;
Run;
```

Setting EXP1 to missing for the first record for each product type is easy using First.Product_Type. __But how do you set first 2 recs for EXP2 to missing??__

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Abhi_Garg

03-20-2018 02:43 PM

YOu can use lag function on the automatic dummies:

** if lag(first.product)=1 then exp2=0;**

But instead of setting to zero, why not set values to missing?

** if first.product then call missing(exp1,exp2);**

* if lag(first.product)=1 then call missing(exp2*);

The advantage of the call missing is that it can (1) accept a list of variable, and (2) that list can be a mixture of character and numeric variables.

Then you can also use the NMISS function on the 3 variables to determine when to calculate the average (by using the MEAN function).

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to mkeintz

03-20-2018 02:54 PM

Ohh I see. This is how you set the variable to missing.

I didn't know how to set the variable to missing that's why was setting them to 0.

Thanks a ton. I really appreciate all your time.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Abhi_Garg

03-20-2018 03:28 PM

Before * call missing* was added to SAS, you would use an explicit assignment statement:

- To set a numeric variable to missing: X=.;
- To set a character variable to missing CVAR='';