SAS Programming

bera00 · Posted 02-27-2018 11:57 AM

Hello all,

I would appreciate your help because I am a bit stuck in the sas programming. I have a data that involves the holdings composition of different funds (fund_id) at each quarter (stock_id , number and prices)

For each fund_id :

I would like to know for each stock held by a fund_id at each quarter , if the stock existed on a previous quarter or not (i.e. if the fund held the same stock in a previous quarter). If it is the case ( i.e. if the stock existed on a previous quarter for the same fund), I should recuperate the number and price of that stock in the previous quarter and creates for them new variables. If it's not the case I do nothing or I put zeros for the newly created variables.

my variables are as follow ( what I have):

date	fund_id	stock_id	number	price
3/31/2005	A	8916108	500	45
3/31/2005	A	226406106	20000	13.73
3/31/2005	A	25243Q205	9000	56.9
3/31/2005	A	292505104	15000	70.42
3/31/2005	A	633067103	4000	43.305
3/31/2005	A	767204100	1500	129.75
3/31/2005	A	874039100	700	32
3/31/2005	A	5458954	900	85
3/31/2005	A	874039100	1000	100
6/30/2005	A	25243Q205	2000	120
6/30/2005	A	292505104	500	150
6/30/2005	A	87403645	400	12
6/30/2005	A	645646	3000	11
6/30/2005	A	5454545	1200	13
3/31/2005	B	633067103	5000	43.305
3/31/2005	B	767204100	1500	129.75
3/31/2005	B	874039100	700	15
3/31/2005	B	77778	900	20
3/31/2005	B	121212	1000	40
3/31/2005	B	131313	2000	88
6/30/2005	B	44456	3200	45
6/30/2005	B	767204100	10000	36
6/30/2005	B	874039100	3333	100
6/30/2005	B	466666	20	14

I posted an excel sample data

What I want :

date	fund_id	stock_id	number	price	number1	price1
3/31/2005	A	8916108	500	45	0	0
3/31/2005	A	226406106	20000	13.73	0	0
3/31/2005	A	25243Q205	9000	56.9	0	0
3/31/2005	A	292505104	15000	70.42	0	0
3/31/2005	A	633067103	4000	43.305	0	0
3/31/2005	A	767204100	1500	129.75	0	0
3/31/2005	A	874039100	700	32	0	0
3/31/2005	A	5458954	900	85	0	0
3/31/2005	A	874039100	1000	100	0	0
6/30/2005	A	25243Q205	2000	120	9000	56.9
6/30/2005	A	292505104	500	150	15000	70.42
6/30/2005	A	87403645	400	12	0	0
6/30/2005	A	645646	3000	11	0	0
6/30/2005	A	5454545	1200	13	0	0
3/31/2005	B	633067103	5000	43.305	0	0
3/31/2005	B	767204100	1500	129.75	0	0
3/31/2005	B	874039100	700	15	0	0
3/31/2005	B	77778	900	20	0	0
3/31/2005	B	121212	1000	40	0	0
3/31/2005	B	131313	2000	88	0	0

Thank you in advance for your precious help.

mkeintz · Posted 02-27-2018 02:37 PM

Your data is sorted by date/fundid~~/stockid~~ (edited change: it's sorted by date/fundid, but not stockid within date/fundid. However that is not required for my suggested program to work. So don't think you need to sort the data any further than its current status).

If you have a large dataset of holdings, I would NOT sort by fundid/stock/date (only to re-sort back to original order). Instead you can use a hash object to dynamically hold the most recent date, price, and number of shares for each fund/stock combination. This code is untested:

data want (drop=_:);
  if _n_=1 then do;
    if 0 then set have 
                  have (rename=(date=_date1 number=number1 price=price1));
    declare hash h ();
      h.definekey('fund_id','stock_id');
      h.definedata('_date1','number1','price1');
      h.definedone();
  end;

  retain _last_qtr_date .;

  set have;
  by date ;
  if first.date then _last_qtr_date=intnx('qtr',date,-1,'end');

  _rc=h.find();
  if NOT(_rc=0 and _date1=_last_qtr_date) then do;
    number1=0;
    price1=.;
  end;  _rc=h.replace(key:fund_id,key:stock_id,data:date,data:number,data:price);
run;

Edited at 2:56PM, just now put in the essential _rc=h.replace .... statement.

Edited at 4:45PM. I think you will find this much faster than the PROC SQL, because the latter requires a Cartesian comparison of all dates within each fund/stock group. The advantage of PROC SQL is that it doesn't require pre-sorted data sets. But that also means it often will not take advantage of data that are pre-sorted.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

Reeza · Posted 02-27-2018 12:31 PM

1. Sort by STOCK, date

2. Use LAG to retreive the previous value and/or date.

3. Check if the date is previous quarter, if so, flag as in previous quarter

4. Otherwise, set flag to 0.

Spoiler

@bera00 wrote:

Hello all,

I would appreciate your help because I am a bit stuck in the sas programming. I have a data that involves the holdings composition of different funds (fund_id) at each quarter (stock_id , number and prices)

For each fund_id :

I would like to know for each stock held by a fund_id at each quarter , if the stock existed on a previous quarter or not (i.e. if the fund held the same stock in a previous quarter). If it is the case ( i.e. if the stock existed on a previous quarter for the same fund), I should recuperate the number and price of that stock in the previous quarter and creates for them new variables. If it's not the case I do nothing or I put zeros for the newly created variables.

my variables are as follow ( what I have):

date fund_id stock_id number price

3/31/2005 A 8916108 500 45

3/31/2005 A 226406106 20000 13.73

3/31/2005 A 25243Q205 9000 56.9

3/31/2005 A 292505104 15000 70.42

3/31/2005 A 633067103 4000 43.305

3/31/2005 A 767204100 1500 129.75

3/31/2005 A 874039100 700 32

3/31/2005 A 5458954 900 85

3/31/2005 A 874039100 1000 100

6/30/2005 A 25243Q205 2000 120

6/30/2005 A 292505104 500 150

6/30/2005 A 87403645 400 12

6/30/2005 A 645646 3000 11

6/30/2005 A 5454545 1200 13

3/31/2005 B 633067103 5000 43.305

3/31/2005 B 767204100 1500 129.75

3/31/2005 B 874039100 700 15

3/31/2005 B 77778 900 20

3/31/2005 B 121212 1000 40

3/31/2005 B 131313 2000 88

6/30/2005 B 44456 3200 45

6/30/2005 B 767204100 10000 36

6/30/2005 B 874039100 3333 100

6/30/2005 B 466666 20 14

I posted an excel sample data

What I want :

date fund_id stock_id number price number1 price1

3/31/2005 A 8916108 500 45 0 0

3/31/2005 A 226406106 20000 13.73 0 0

3/31/2005 A 25243Q205 9000 56.9 0 0

3/31/2005 A 292505104 15000 70.42 0 0

3/31/2005 A 633067103 4000 43.305 0 0

3/31/2005 A 767204100 1500 129.75 0 0

3/31/2005 A 874039100 700 32 0 0

3/31/2005 A 5458954 900 85 0 0

3/31/2005 A 874039100 1000 100 0 0

6/30/2005 A 25243Q205 2000 120 9000 56.9

6/30/2005 A 292505104 500 150 15000 70.42

6/30/2005 A 87403645 400 12 0 0

6/30/2005 A 645646 3000 11 0 0

6/30/2005 A 5454545 1200 13 0 0

3/31/2005 B 633067103 5000 43.305 0 0

3/31/2005 B 767204100 1500 129.75 0 0

3/31/2005 B 874039100 700 15 0 0

3/31/2005 B 77778 900 20 0 0

3/31/2005 B 121212 1000 40 0 0

3/31/2005 B 131313 2000 88 0
0

Thank you in advance for your precious help.

wrote: Hello all, I would appreciate your help because I am a bit stuck in the sas programming. I have a data that involves the holdings composition of different funds (fund_id) at each quarter (stock_id , number and prices) For each fund_id : I would like to know for each stock held by a fund_id at each quarter , if the stock existed on a previous quarter or not (i.e. if the fund held the same stock in a previous quarter). If it is the case ( i.e. if the stock existed on a previous quarter for the same fund), I should recuperate the number and price of that stock in the previous quarter and creates for them new variables. If it's not the case I do nothing or I put zeros for the newly created variables. my variables are as follow ( what I have): date fund_id stock_id number price 3/31/2005 A 8916108 500 45 3/31/2005 A 226406106 20000 13.73 3/31/2005 A 25243Q205 9000 56.9 3/31/2005 A 292505104 15000 70.42 3/31/2005 A 633067103 4000 43.305 3/31/2005 A 767204100 1500 129.75 3/31/2005 A 874039100 700 32 3/31/2005 A 5458954 900 85 3/31/2005 A 874039100 1000 100 6/30/2005 A 25243Q205 2000 120 6/30/2005 A 292505104 500 150 6/30/2005 A 87403645 400 12 6/30/2005 A 645646 3000 11 6/30/2005 A 5454545 1200 13 3/31/2005 B 633067103 5000 43.305 3/31/2005 B 767204100 1500 129.75 3/31/2005 B 874039100 700 15 3/31/2005 B 77778 900 20 3/31/2005 B 121212 1000 40 3/31/2005 B 131313 2000 88 6/30/2005 B 44456 3200 45 6/30/2005 B 767204100 10000 36 6/30/2005 B 874039100 3333 100 6/30/2005 B 466666 20 14 I posted an excel sample data What I want : date fund_id stock_id number price number1 price1 3/31/2005 A 8916108 500 45 0 0 3/31/2005 A 226406106 20000 13.73 0 0 3/31/2005 A 25243Q205 9000 56.9 0 0 3/31/2005 A 292505104 15000 70.42 0 0 3/31/2005 A 633067103 4000 43.305 0 0 3/31/2005 A 767204100 1500 129.75 0 0 3/31/2005 A 874039100 700 32 0 0 3/31/2005 A 5458954 900 85 0 0 3/31/2005 A 874039100 1000 100 0 0 6/30/2005 A 25243Q205 2000 120 9000 56.9 6/30/2005 A 292505104 500 150 15000 70.42 6/30/2005 A 87403645 400 12 0 0 6/30/2005 A 645646 3000 11 0 0 6/30/2005 A 5454545 1200 13 0 0 3/31/2005 B 633067103 5000 43.305 0 0 3/31/2005 B 767204100 1500 129.75 0 0 3/31/2005 B 874039100 700 15 0 0 3/31/2005 B 77778 900 20 0 0 3/31/2005 B 121212 1000 40 0 0 3/31/2005 B 131313 2000 88 0 0 Thank you in advance for your precious help.

BrunoMueller · Posted 02-27-2018 01:01 PM

You can use Proc SQL and a self referencing join, where you compute the date of the previous quarter using the INTNX function.

Based on the information returned, you can fill the new variables with values from the previous quarter.

See here an example:

data have;
  infile cards dlm=",";
  seqNr + 1;
  input
    date : mmddyy10.
    fund_id	: $8.
    stock_id : $16.
    number : 8.
    price :8.
  ;
  format date date9.;
cards;
3/31/2005,A,8916108,500,45
3/31/2005,A,226406106,20000,13.73
3/31/2005,A,25243Q205,9000,56.9
3/31/2005,A,292505104,15000,70.42
3/31/2005,A,633067103,4000,43.305
3/31/2005,A,767204100,1500,129.75
3/31/2005,A,874039100,700,32
3/31/2005,A,5458954,900,85
3/31/2005,A,874039100,1000,100
6/30/2005,A,25243Q205,2000,120
6/30/2005,A,292505104,500,150
6/30/2005,A,87403645,400,12
6/30/2005,A,645646,3000,11
6/30/2005,A,5454545,1200,13
3/31/2005,B,633067103,5000,43.305
3/31/2005,B,767204100,1500,129.75
3/31/2005,B,874039100,700,15
3/31/2005,B,77778,900,20
3/31/2005,B,121212,1000,40
3/31/2005,B,131313,2000,88
6/30/2005,B,44456,3200,45
6/30/2005,B,767204100,10000,36
6/30/2005,B,874039100,3333,100
6/30/2005,B,466666,20,14
;

proc sql feedback;
  select
    A.seqNr
    , A.date
    , A.fund_id
    , A.stock_id
    , A.number
    , A.price
    , B.seqNr
    , B.date
    , B.fund_id
    , B.stock_id
    , B.number
    , B.price
    , case
        when b.stock_id not is missing then b.number else 0
      end as previousNumner
    , case
        when b.stock_id not is missing then b.price else 0
      end as previousprice

  from
    have as a
    left outer join
    have as b
    on
      a.fund_id = b.fund_ID
      and a.stock_id = b.stock_id
      and b.date = intnx("qtr", a.date, -1, "E")
  order by
    a.date
    , a.fund_id
    , a.stock_id
  ;
quit;

bera00 · Posted 02-28-2018 06:15 AM

Thank you for your resonse. this method works

mkeintz · Posted 02-27-2018 02:37 PM

Your data is sorted by date/fundid~~/stockid~~ (edited change: it's sorted by date/fundid, but not stockid within date/fundid. However that is not required for my suggested program to work. So don't think you need to sort the data any further than its current status).

If you have a large dataset of holdings, I would NOT sort by fundid/stock/date (only to re-sort back to original order). Instead you can use a hash object to dynamically hold the most recent date, price, and number of shares for each fund/stock combination. This code is untested:

data want (drop=_:);
  if _n_=1 then do;
    if 0 then set have 
                  have (rename=(date=_date1 number=number1 price=price1));
    declare hash h ();
      h.definekey('fund_id','stock_id');
      h.definedata('_date1','number1','price1');
      h.definedone();
  end;

  retain _last_qtr_date .;

  set have;
  by date ;
  if first.date then _last_qtr_date=intnx('qtr',date,-1,'end');

  _rc=h.find();
  if NOT(_rc=0 and _date1=_last_qtr_date) then do;
    number1=0;
    price1=.;
  end;  _rc=h.replace(key:fund_id,key:stock_id,data:date,data:number,data:price);
run;

Edited at 2:56PM, just now put in the essential _rc=h.replace .... statement.

Edited at 4:45PM. I think you will find this much faster than the PROC SQL, because the latter requires a Cartesian comparison of all dates within each fund/stock group. The advantage of PROC SQL is that it doesn't require pre-sorted data sets. But that also means it often will not take advantage of data that are pre-sorted.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

bera00 · Posted 02-28-2018 06:16 AM

Thank you so much

this method works for my data

novinosrin · Posted 02-27-2018 06:03 PM

data have;
infile cards dlm=",";
input date : mmddyy10. fund_id	: $8. stock_id : $16.  number : 8.  price :8.;
format date date9.;
cards;
3/31/2005,A,8916108,500,45
3/31/2005,A,226406106,20000,13.73
3/31/2005,A,25243Q205,9000,56.9
3/31/2005,A,292505104,15000,70.42
3/31/2005,A,633067103,4000,43.305
3/31/2005,A,767204100,1500,129.75
3/31/2005,A,874039100,700,32
3/31/2005,A,5458954,900,85
3/31/2005,A,874039100,1000,100
6/30/2005,A,25243Q205,2000,120
6/30/2005,A,292505104,500,150
6/30/2005,A,87403645,400,12
6/30/2005,A,645646,3000,11
6/30/2005,A,5454545,1200,13
3/31/2005,B,633067103,5000,43.305
3/31/2005,B,767204100,1500,129.75
3/31/2005,B,874039100,700,15
3/31/2005,B,77778,900,20
3/31/2005,B,121212,1000,40
3/31/2005,B,131313,2000,88
6/30/2005,B,44456,3200,45
6/30/2005,B,767204100,10000,36
6/30/2005,B,874039100,3333,100
6/30/2005,B,466666,20,14
;

proc sort data=have out=_have;
by fund_id stock_id date;
run;

data temp;
drop _:;
do until(last.stock_id);
set _have;
by fund_id stock_id date;
if first.stock_id then do;
number1=0;
price1=0;
end;
else if intnx('qtr',date,-1,'end')=_date1 then  do;
number1=_number1;
price1=_price1;
end;
_number1=number;
_price1=price;
_date1=date;
end;
run;

data want;
merge _have temp;
by fund_id stock_id date;
if missing(number1) and missing(price1) then do;
number1=0;
price1=0;
end;
run;

proc sort data=want out=final_want;
by fund_id date;
run;

mkeintz · Posted 02-27-2018 09:01 PM

Most mutual fund financial research projects I have seen cover 1,000's of funds, with 1,000's of stocks over 10-year plus time periods (40 quarters). A solution like this with multiple sorts is expensive in time, disk space, and disk input/output activity. It has four steps which read the full data set 4 times, writes a full data set 3 times, and invokes a proc sort 2 times.

Given that the original sort order of the data set can be utilized as is, I find it hard to imagine a situation in which this solution should be suggested for a production environment.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

novinosrin · Posted 02-28-2018 02:15 PM

data have;
infile cards dlm=",";
input date : mmddyy10. fund_id	: $8. stock_id : $16.  number : 8.  price :8.;
format date date9.;
cards;
3/31/2005,A,8916108,500,45
3/31/2005,A,226406106,20000,13.73
3/31/2005,A,25243Q205,9000,56.9
3/31/2005,A,292505104,15000,70.42
3/31/2005,A,633067103,4000,43.305
3/31/2005,A,767204100,1500,129.75
3/31/2005,A,874039100,700,32
3/31/2005,A,5458954,900,85
3/31/2005,A,874039100,1000,100
6/30/2005,A,25243Q205,2000,120
6/30/2005,A,292505104,500,150
6/30/2005,A,87403645,400,12
6/30/2005,A,645646,3000,11
6/30/2005,A,5454545,1200,13
3/31/2005,B,633067103,5000,43.305
3/31/2005,B,767204100,1500,129.75
3/31/2005,B,874039100,700,15
3/31/2005,B,77778,900,20
3/31/2005,B,121212,1000,40
3/31/2005,B,131313,2000,88
6/30/2005,B,44456,3200,45
6/30/2005,B,767204100,10000,36
6/30/2005,B,874039100,3333,100
6/30/2005,B,466666,20,14
;


data want;
_n=0;
do n=1 by 1 until(last.fund_id);
set have;
by fund_id date;
number1=0;price1=0;
if first.date then _n+1;
array _t(25) $20;/*arbitrary subscript value*/
array _k(25,3);
if n=1 then call missing(of _t(*), of _k(*));
if _n>1 then do;
_f=whichc(stock_id,of _t(*));
	if _f>0 then do;
	if intnx('qtr',date,-1,'end')=_k(_f,3) then do;
	number1=_k(_f,1);
	price1=_k(_f,2);
	end;
end;
end;
_t(n)=stock_id;
_k(n,1)=number;
_k(n,2)=price;
_k(n,3)=date;
output;
end;
drop _: n; 
run;

SAS Programming

Finding a stock within fund holdings in previous quarter

Re: Finding a stock within fund holdings in previous quarter

Re: Finding a stock within fund holdings in previous quarter

Re: Finding a stock within fund holdings in previous quarter

Re: Finding a stock within fund holdings in previous quarter

Re: Finding a stock within fund holdings in previous quarter

Re: Finding a stock within fund holdings in previous quarter

Re: Finding a stock within fund holdings in previous quarter

Re: Finding a stock within fund holdings in previous quarter

Re: Finding a stock within fund holdings in previous quarter

Compare this quarter vs previous quarter

Time-Series model using Stock Market Data

Codes for entry and exit of stocks in each quarter

SAS Macro Format YYYY/Q (-1 Previous Quarter)

Choosing the Previous Quarter End using INTNX

Follow Us

What is...

SAS Programming

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...