turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Estimate standard deviation quarter by quarter

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

12-11-2017 06:19 AM - edited 12-11-2017 08:24 AM

Hi.

I have a (company ID-time) panel data. Now I have to calculate standard deviation of variable COGSQ from quarter I/1983 to quarter II/1983, and then quarter I/1983 to quarter III/1983, and so on.....

So basically, for company with GVKEY 001001;

t=0 is quarter I/1983

at t=1 (QII/1983), I need to find standard deviation of X from t=0 to t=1

at t=2 (QIII/1983), std of X from t=0 to t=2

and so on,

Do you have any idea how to do this?

So the dataset is like this

gvkey | datadate | Company | COGSQ | STD |

1001 | 19830331 | A | 1.258 | NA |

1001 | 19830630 | A | 1.4 | std from QI/1983 to QII/1983 |

1001 | 19830930 | A | 1.5 | std from QI/1983 to QIII/1983 |

1001 | 19831231 | A | 1.6 | |

…….. | …… | ……. | ……. | |

1001 | 19860331 | A | 2.3 | |

1002 | 19840430 | B | 3.5 | |

1002 | 19840731 | B | 6.5 | |

….. | …… | ….. | ….. |

Accepted Solutions

Solution

12-12-2017
07:24 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Peter_C

12-11-2017 10:58 AM - edited 12-11-2017 11:03 AM

my code with line breaks to make it easier to read

```
data progressive_stddevs ;
do item= 1 by 1 until( last.company ) ;
set your_data ;
by company ;
array cogX(1200) ; * 1200 months, probably enough per company? ;
cogX(item) = cogsq ;
n_cogs = n( of cogX(*) ) ;
if n_cogs > 1 then COGSTD = std( of cogX(*) ) ;
output ;
end ;
drop cogX: ;
```

run ;

the array is filled progressively until end of the company, when it will be emptied by restarting the data step iteration

The STD() function ignores empty cells in the array

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-11-2017 06:26 AM

Do you have SAS/ETS? If so check out PROC EXPAND

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to draycut

12-11-2017 07:51 AM - edited 12-11-2017 08:01 AM

Hi, I just check the PROC EXPAND but it is for transform/interpolate variables, so I dont see how this related to my case. Could you explain more for me? Please, I'm a newbie

Please note that this is different from rolling window standard deviation. Rolling window std has a fixed number of observations (fixed window range) but in this situation, the window range is larger when time progress

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-11-2017 08:02 AM

PROC EXPAND is indeed for transforming/interpolating variables and you want to transform a variable

Take a look at the Transformation Operators site at the documentation. You want to create a moving stadard deviation, so the MOVSTD operator might be worth taking a look at.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-11-2017 08:02 AM

Post some sample of your data (not in the form of a picture) if you want some code answer

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to draycut

12-11-2017 08:24 AM

Hi. I just post a sample of my data. Could you help me please?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-11-2017 09:07 AM

Ah, I understand. You can still use PROC EXPAND, but this may be simpler to do in a data step like this:

```
data have;
input gvkey$ datadate:yymmdd10. Company$ COGSQ;
quarter=qtr(datadate);
year=year(datadate);
format datadate yymmdd10.;
datalines;
1001 19830331 A 1.258
1001 19830630 A 1.4
1001 19830930 A 1.5
1001 19831231 A 1.6
1001 19860331 A 2.3
1002 19840430 B 3.5
1002 19840731 B 6.5
;
proc sort data=have;
by gvkey year quarter;
run;
data want;
set have;
by gvkey;
if first.gvkey then COGSQ_first=COGSQ;
if not first.gvkey then std_dev = std(COGSQ_first, COGSQ);
retain COGSQ_first;
drop COGSQ_first;
run;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-11-2017 09:37 AM

Thank you, but I am confused a little bit about the line:

if not first.gvkey then std_dev = std(COGSQ_first, COGSQ);

So assume that it is on quarter III/1983, so this line will give me std_dev of COGSQ for 3 observations: Q1, Q2, Q3/1983 or just std_dev of 2 observations:Q1 and Q3/1983?

if not first.gvkey then std_dev = std(COGSQ_first, COGSQ);

So assume that it is on quarter III/1983, so this line will give me std_dev of COGSQ for 3 observations: Q1, Q2, Q3/1983 or just std_dev of 2 observations:Q1 and Q3/1983?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-11-2017 10:52 AM

if this is only qtr/monthly intervals, it might be a good case to demonstrate array handling. SAS/IML would also simplify the resultant solution. What follows uses only a data step and assumes the monthly data arrive in month order within each company

`data progressive_stddevs ;`

do item= 1 by 1 until( last.company ) ;

set your_data ;

by company ;

array cogX(1200) ; * 1200 months, probably enough per company? ;

cogX(item) = cogsq ;

n_cogs = n( of cogX(*) ) ;

if n_cogs > 1 then COGSTD = std( of cogX(*) ) ;

output ;

end ;

drop cogX: ;

run ;

the array is filled progressively until end of the company, when it will be emptied by restarting the data step iteration

The STD() function ignores empty cells in the array

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Peter_C

12-11-2017 10:53 AM

sorry I posted code from SAS Studio (so line-breaks are missing)

Shortly I'll post the code with line breaks

Shortly I'll post the code with line breaks

Solution

12-12-2017
07:24 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Peter_C

12-11-2017 10:58 AM - edited 12-11-2017 11:03 AM

my code with line breaks to make it easier to read

```
data progressive_stddevs ;
do item= 1 by 1 until( last.company ) ;
set your_data ;
by company ;
array cogX(1200) ; * 1200 months, probably enough per company? ;
cogX(item) = cogsq ;
n_cogs = n( of cogX(*) ) ;
if n_cogs > 1 then COGSTD = std( of cogX(*) ) ;
output ;
end ;
drop cogX: ;
```

run ;

the array is filled progressively until end of the company, when it will be emptied by restarting the data step iteration

The STD() function ignores empty cells in the array

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Peter_C

12-11-2017 09:25 PM

Thanks. I will try that. But each company has different number of observations. For example, company 1 has data from Q1/1983 to Q1/1986 (13 observations), company 2 has data Q1/1984 to QI/2000 (64 observations). So can I still use array cogX(1200)?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-11-2017 10:53 PM

You want the std of within-company quarterly changes in cost-of-goods-sold (COGSQ), in which each quarter may have a slightly different population of companies (with id variable GVKEY).

Your data are sorted by GVKEY/DATADATE. So if you calculate every quarterly change (use the DIF function, where DIF(x)=X-lag(X)), then just keep every record but the first record for each GVKEY. Also make sure datadate and lag(datadate) are exactly 3 months apart (to avoid missed quarters or misalignment due to companies changing fiscal year definitions).

Once you've made this data set (call it NEED), then run proc means on it, with class datadate. Using the ODS output statement, and the STACKODSOUTPUT option puts data set file WANT in a nice format:

```
data need /view=need;
set have;
by gvkey datadate;
cogsq_change=dif(cogsq);
months_elapsed = intck('month',lag(datadate),datadate);
if first.gvkey=0 and months_elapsed=3;
run;
ods output summary=want;
ods listing close;
proc means data=need stackodsoutput;
class datadate;
var cogsq_change;
run;
ods output close;
ods listing;
```

This will produce 5 variables for each datadate, one each for N, MEAN, STD, MIN, and MAX value of cogsq_change. The datadate identified the last date of the quarterly change. If you want more statistics, then make a list of wanted stats on the proc means statement.

The ODS LISTING CLOSE, and ODS LISTING avoids the (presumably unwanted) printing of all those stats. Instead the **ods*** output summary=want;* statement tells sas to put those results into a data set file named want.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-12-2017 05:06 AM

Reply to @trungcva112

The array can be a fixed size. That needs to be large enough for the company with the most history. I thought 1200, to allow for 100 years of monthly data. If there is less history it is not a problem.

Like all SAS statistical functions only non-missing values will be used.

At the start of each data step iteration the whole array is set missing. Each value is inserted into the array just after each set statement and a STD() calculated. Each data step iteration ends when all rows of a company have been read.

The array can be a fixed size. That needs to be large enough for the company with the most history. I thought 1200, to allow for 100 years of monthly data. If there is less history it is not a problem.

Like all SAS statistical functions only non-missing values will be used.

At the start of each data step iteration the whole array is set missing. Each value is inserted into the array just after each set statement and a STD() calculated. Each data step iteration ends when all rows of a company have been read.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-12-2017 05:09 AM

There might be a problem if a company has tlmore than one period of history. That data step would treat all data for a company as one history