## Help with a model

Frequent Contributor
Posts: 75

# Help with a model

Hello everyone,

I am working on a large database (5GB) of investors' portfolio holdings trying to implement a model. The first step is to calculate a fraction of the investors that are buyers, or else the Raw fraction of funds buying security k during month t (RawΔ"k,t " ):

RawΔ"k,t "=  (Number of funds buying security k during month t) / (Number of funds buying security k during month t + Number of funds selling security k during month t)

I can calculate this fraction  for each stock seperately but there are thousands of them. Is there anyway to do it at once, having one table with the fraction of each stock every month?

Here is the code I am using to do it for each stock:

proc sql; create table work.aa as select* from tmp2.example

where ticker= "AA";

quit;

proc sort data=aa out=new;

by ticker date sign mgrno;run;

PROC MEANS DATA=new NWAY noprint;

CLASS ticker;

VAR mgrno;

by date sign;

OUTPUT OUT=new2 N=cnt;

RUN;

data new3; set new2; where sign ne "0"; drop type freq;  run;

data sell (rename=(cnt=sell)drop=sign);set new3; where sign eq "-"; run;

proc sql; create table fraction as select* from buy as a left join sell as b on a.date=b.date and a.ticker=b.ticker;quit;

I am attaching a small sample of the data and an excel file showing how I used to do that that in excel (I also have to standardize this fraction)

Posts: 5,519

## Re: Help with a model

How about something simple like this:

proc sql;

create table fraction as

select

ticker,

intnx("MONTH",date,0) as month format=yyqd6.,

sum(sign="+")/(sum(sign="+")+sum(sign="-")) as fraction

from sasforum.CostaSg_05_2013

group by ticker, calculated month;

reset outobs=20;

select * from fraction;

quit;

PG

PG
Frequent Contributor
Posts: 75

## Re: Help with a model

That was awesome PGStats! Thanks!

The next step was to standardize this fraction which I did as following:

proc means data=fraction mean std nway noprint;

var fraction;

class month;

output out=want (drop=_

mean= std= /autoname;

run;

proc sql; create table have as

select* from fraction as a left join want as b on a.month=b.month;quit;

proc sql; create table need as select*,(fraction-fraction_Mean)/fraction_StdDEv as Standard_Fraction from have;quit;

I am attaching the file. Now the next step is the following:

For each quarter I need to run a cross-sectional (across securities) regression of the standardized fraction of investors buying security κ (Δκ,τ) in the current quarter on the standardized fraction of investors buying security κ the previous quarter (Δκ,τ-1).

So, we have this model:  Δκ,τ = βΔκ,τ-1 + ε

So for each month, I would like to have one column with the β coefficient, one column with the R2 and one column with Adj. R2.

Any ideas on that?

Posts: 5,519

## Re: Help with a model

Do you mean something like this:

proc sql;
create table fraction as
select
ticker,
intnx("QTR",date,0) as qtr format=yyqd7.,
sum(sign="+")/(sum(sign="+")+sum(sign="-")) as fraction
from sasforum.CostaSg_05_2013
group by calculated qtr, ticker
order by calculated qtr, ticker;
quit;

proc stdize data=fraction out=stdFraction sprefix=std;
by qtr;
var fraction;
run;

proc sql;
create table stdFrac2t as
select

A.ticker,

A.qtr,

A.stdFraction,

B.stdFraction as prevFraction
from

stdFraction as A inner join

stdFraction as B on A.ticker=B.ticker and intnx("QTR", A.qtr, -1)=B.qtr
order by qtr, ticker;
quit;

proc reg data=stdFrac2t outest=fracReg edf plots=none;
by qtr;
model stdFraction = prevFraction;
run;

The regression estimates and rsquares are in dataset fracReg. You don't need adjusted rsquare when estimating a single parameter (and an intercept).

PG

PG
Super User
Posts: 10,761

## Re: Help with a model

PG. I think it is a time serial data. Your REG model is not appropriate .

REG model assumed Δκ,τ and βΔκ,τ-1 are independent . But they are time serial data which means they are relational .

OP should post it at time-serial analysis forum .there are an expert named Udo will give you some advice .

Ksharp

Frequent Contributor
Posts: 75

## Re: Help with a model

Many thanks PGStats and Ksarp for your comment!

I am not familiar with the regressions in SAS and whether this is the correct specification of the model. However, the model should not have an intercept. It is an autoregressive model of order 1.

So I guess I should include / noint in the proc reg.

Frequent Contributor
Posts: 75

## Re: Help with a model

Now the final step of the model is this one:

where

K is the number of securities

σ(RawΔκ,t) is the cross-sectional standard deviation of the raw fraction of investors buying the security k in quarter t

σ(RawΔκ,t-1) is the cross-sectional standard deviation of the raw fraction of investors buying the security k in quarter t-1

RawΔκ,t bar  is the cross-sectional average raw fraction of institutions buying in quarter t

RawΔκ,t-1 bar  is the cross-sectional average raw fraction of institutions buying in quarter t-1

Dn,k,t is a dummy variable taking the value of one when investor n is a buyer of security k in quarter t and zero when fund n is a seller of security k in quarter t (in the example database a buyer is indicated as + and a seller as -)

Nk,t is the number of investors actively trading security k in quarter t

Any thoughts on this?

Super User
Posts: 10,761

## Re: Help with a model

Yes. I know. It is time-serial data analysis.

Why not post it at SAS Forecasting and Econometrics

Frequent Contributor
Posts: 75

## Re: Help with a model

I have post it there as well Ksarp, thanks.

But yes, the model must not be correct as I run some regressions manually and I got different results.

This is what PGStats initially wrote:

proc reg data=have outest=need edf plots=none;
by qtr;
model Fraction = prevFraction / noint;
run;

Based on the file (Have), the results are in the attached file (need)

I tried to check whether the results are correct. So, I did the process step-by-step. I only used the first 2 quarters 1980-N4 and 1981-N1 (deleting missing values) to see if I will get the same results (the dependent variable is the 1981_N1 and the independent is the 1980-N4); I attach the file (Test). If we run a regression in this file the results are actually different (attached file correct).

proc reg data=have outest=need edf plots=none;

model _1981N1 = _1980N4 / noint;

run;

Super User
Posts: 10,761

## Re: Help with a model

Hi. Here are almost nobody has time-serial data analysis experience at this forum.

Although I major in economic and finance, I have little time to learn these statistical knowledge.

at forum SAS Forecasting and Econometrics

there is an expert at this field named udo@sas

Frequent Contributor
Posts: 75

## Re: Help with a model

Actually,

What PGstats wrote about the model is correct! The wrong part is this one:

proc sql;
create table stdFrac2t as
select

A.ticker,

A.qtr,

A.stdFraction,

B.stdFraction as prevFraction
from

stdFraction as A inner join

stdFraction as B on A.ticker=B.ticker and intnx("QTR", A.qtr, -1)=B.qtr
order by qtr, ticker;
quit;

The problem is that it does not calculate correctly the previous fraction. The first couple of stocks are correct but then it is mixed up. Maybe because some stocks are not traded during some quarters.

So, any ideas how this can be solved?

Posts: 5,519

## Re: Help with a model

HI, I looked at the results sommarily and didn't find anything wrong. Can you give an example (ticker, quarter) that is mixed up in the results of that query?

PG
Frequent Contributor
Posts: 75

## Re: Help with a model

Hi PGStats,

For example AAPL, in 1981-1 the fraction is 0.7142857143 (the previous fraction is correct)

whereas in 1981-2 the fraction is 0.8222222 and the previous fraction is 0.295080909 (whereas it should be 0.7142857143)

Posts: 5,519

## Re: Help with a model

You are confusing the fractions and the standardized fractions. PrevFraction is the previous standardized fraction.  Here is what I get for AAPL :

TICKER      qtr  fraction  stdfraction  prevFraction

----------------------------------------------------

AAPL     1980-4         1     1.292816             .

AAPL     1981-1  0.714286     0.295081      1.292816

AAPL     1981-2  0.822222     0.678854      0.295081

AAPL     1981-3  0.636364     0.083371      0.678854

AAPL     1981-4  0.723404     0.313684      0.083371

AAPL     1982-1  0.622642     0.020717      0.313684

AAPL     1982-2  0.560976     -0.31263      0.020717

PG
Frequent Contributor
Posts: 75

## Re: Help with a model

PGStats you are right! Many thanks!

The problem was that when i standardized the fraction using:

proc stdize data=fraction out=stdFraction sprefix=std;
by qtr;
var fraction;
run;

I got this error (i don't know why this didn't work):

```proc stdize data=fraction out=stdFraction sprefix=std;
-------
22
76
ERROR 22-322: Syntax error, expecting one of the following: ;, (, ADD, DATA, EXTREME, FUZZ,
INITIAL, KEEPLEN, MAXITER, METHOD, MISSING, MULT, NMARKERS, NOMISS, NORM, OUT,
OUTSTAT, PCTLDEF, PCTLMTD, PCTLPTS, PMARKERS, PRINT, PSTAT, QMARKERS, REPLACE,
REPONLY, SNORM, UNSTD, UNSTDIZE, VARDEF.
ERROR 76-322: Syntax error, statement will be ignored.
147  by qtr;
148  var fraction;
149  run;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE STDIZE used (Total process time):
real time           0.00 seconds
cpu time            0.01 seconds

```

So I skipped that and then made a mistake when changed the names in the next query.

The next step is to calculate expression M (as shown above)

For the first term I have this code:

proc freq data= tmp1.example;

tables date / out=K noprint;

run;

proc sql;

create table K1 as

select date, count,

intnx("QTR",date,0) as qtr format=yyqd7.,count-1 as K_1

from K ;

quit;

proc means data=fraction mean std nway noprint;

var fraction;

class qtr;

output out=want (drop=_ mean= std= /autoname;

run;

data a3;

set want;

set K1 (keep=K_1);

frmean_1= lag(fraction_mean);

frstddev_1=lag(fraction_stddev);

run;

data a4;

set a3;

coefficient= 1/(k_1*fraction_stddev*frstddev_1);

run;

Do you have any suggestions on making this shorter?