- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello all,
I would appreciate your help because I am a bit stuck in the sas programming. I have a data that involves the holdings composition of different funds (fund_id) at each quarter (stock_id , number and prices)
For each fund_id :
I would like to know for each stock held by a fund_id at each quarter , if the stock existed on a previous quarter or not (i.e. if the fund held the same stock in a previous quarter). If it is the case ( i.e. if the stock existed on a previous quarter for the same fund), I should recuperate the number and price of that stock in the previous quarter and creates for them new variables. If it's not the case I do nothing or I put zeros for the newly created variables.
my variables are as follow ( what I have):
date | fund_id | stock_id | number | price |
3/31/2005 | A | 8916108 | 500 | 45 |
3/31/2005 | A | 226406106 | 20000 | 13.73 |
3/31/2005 | A | 25243Q205 | 9000 | 56.9 |
3/31/2005 | A | 292505104 | 15000 | 70.42 |
3/31/2005 | A | 633067103 | 4000 | 43.305 |
3/31/2005 | A | 767204100 | 1500 | 129.75 |
3/31/2005 | A | 874039100 | 700 | 32 |
3/31/2005 | A | 5458954 | 900 | 85 |
3/31/2005 | A | 874039100 | 1000 | 100 |
6/30/2005 | A | 25243Q205 | 2000 | 120 |
6/30/2005 | A | 292505104 | 500 | 150 |
6/30/2005 | A | 87403645 | 400 | 12 |
6/30/2005 | A | 645646 | 3000 | 11 |
6/30/2005 | A | 5454545 | 1200 | 13 |
3/31/2005 | B | 633067103 | 5000 | 43.305 |
3/31/2005 | B | 767204100 | 1500 | 129.75 |
3/31/2005 | B | 874039100 | 700 | 15 |
3/31/2005 | B | 77778 | 900 | 20 |
3/31/2005 | B | 121212 | 1000 | 40 |
3/31/2005 | B | 131313 | 2000 | 88 |
6/30/2005 | B | 44456 | 3200 | 45 |
6/30/2005 | B | 767204100 | 10000 | 36 |
6/30/2005 | B | 874039100 | 3333 | 100 |
6/30/2005 | B | 466666 | 20 | 14 |
I posted an excel sample data
What I want :
date | fund_id | stock_id | number | price | number1 | price1 |
3/31/2005 | A | 8916108 | 500 | 45 | 0 | 0 |
3/31/2005 | A | 226406106 | 20000 | 13.73 | 0 | 0 |
3/31/2005 | A | 25243Q205 | 9000 | 56.9 | 0 | 0 |
3/31/2005 | A | 292505104 | 15000 | 70.42 | 0 | 0 |
3/31/2005 | A | 633067103 | 4000 | 43.305 | 0 | 0 |
3/31/2005 | A | 767204100 | 1500 | 129.75 | 0 | 0 |
3/31/2005 | A | 874039100 | 700 | 32 | 0 | 0 |
3/31/2005 | A | 5458954 | 900 | 85 | 0 | 0 |
3/31/2005 | A | 874039100 | 1000 | 100 | 0 | 0 |
6/30/2005 | A | 25243Q205 | 2000 | 120 | 9000 | 56.9 |
6/30/2005 | A | 292505104 | 500 | 150 | 15000 | 70.42 |
6/30/2005 | A | 87403645 | 400 | 12 | 0 | 0 |
6/30/2005 | A | 645646 | 3000 | 11 | 0 | 0 |
6/30/2005 | A | 5454545 | 1200 | 13 | 0 | 0 |
3/31/2005 | B | 633067103 | 5000 | 43.305 | 0 | 0 |
3/31/2005 | B | 767204100 | 1500 | 129.75 | 0 | 0 |
3/31/2005 | B | 874039100 | 700 | 15 | 0 | 0 |
3/31/2005 | B | 77778 | 900 | 20 | 0 | 0 |
3/31/2005 | B | 121212 | 1000 | 40 | 0 | 0 |
3/31/2005 | B | 131313 | 2000 | 88 | 0 | 0
|
Thank you in advance for your precious help.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your data is sorted by date/fundid/stockid (edited change: it's sorted by date/fundid, but not stockid within date/fundid. However that is not required for my suggested program to work. So don't think you need to sort the data any further than its current status).
If you have a large dataset of holdings, I would NOT sort by fundid/stock/date (only to re-sort back to original order). Instead you can use a hash object to dynamically hold the most recent date, price, and number of shares for each fund/stock combination. This code is untested:
data want (drop=_:);
if _n_=1 then do;
if 0 then set have
have (rename=(date=_date1 number=number1 price=price1));
declare hash h ();
h.definekey('fund_id','stock_id');
h.definedata('_date1','number1','price1');
h.definedone();
end;
retain _last_qtr_date .;
set have;
by date ;
if first.date then _last_qtr_date=intnx('qtr',date,-1,'end');
_rc=h.find();
if NOT(_rc=0 and _date1=_last_qtr_date) then do;
number1=0;
price1=.;
end; _rc=h.replace(key:fund_id,key:stock_id,data:date,data:number,data:price);
run;
Edited at 2:56PM, just now put in the essential _rc=h.replace .... statement.
Edited at 4:45PM. I think you will find this much faster than the PROC SQL, because the latter requires a Cartesian comparison of all dates within each fund/stock group. The advantage of PROC SQL is that it doesn't require pre-sorted data sets. But that also means it often will not take advantage of data that are pre-sorted.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
1. Sort by STOCK, date
2. Use LAG to retreive the previous value and/or date.
3. Check if the date is previous quarter, if so, flag as in previous quarter
4. Otherwise, set flag to 0.
@bera00 wrote:
Hello all,
I would appreciate your help because I am a bit stuck in the sas programming. I have a data that involves the holdings composition of different funds (fund_id) at each quarter (stock_id , number and prices)
For each fund_id :
I would like to know for each stock held by a fund_id at each quarter , if the stock existed on a previous quarter or not (i.e. if the fund held the same stock in a previous quarter). If it is the case ( i.e. if the stock existed on a previous quarter for the same fund), I should recuperate the number and price of that stock in the previous quarter and creates for them new variables. If it's not the case I do nothing or I put zeros for the newly created variables.
my variables are as follow ( what I have):
date fund_id stock_id number price 3/31/2005 A 8916108 500 45 3/31/2005 A 226406106 20000 13.73 3/31/2005 A 25243Q205 9000 56.9 3/31/2005 A 292505104 15000 70.42 3/31/2005 A 633067103 4000 43.305 3/31/2005 A 767204100 1500 129.75 3/31/2005 A 874039100 700 32 3/31/2005 A 5458954 900 85 3/31/2005 A 874039100 1000 100 6/30/2005 A 25243Q205 2000 120 6/30/2005 A 292505104 500 150 6/30/2005 A 87403645 400 12 6/30/2005 A 645646 3000 11 6/30/2005 A 5454545 1200 13 3/31/2005 B 633067103 5000 43.305 3/31/2005 B 767204100 1500 129.75 3/31/2005 B 874039100 700 15 3/31/2005 B 77778 900 20 3/31/2005 B 121212 1000 40 3/31/2005 B 131313 2000 88 6/30/2005 B 44456 3200 45 6/30/2005 B 767204100 10000 36 6/30/2005 B 874039100 3333 100 6/30/2005 B 466666 20 14
I posted an excel sample data
What I want :
date fund_id stock_id number price number1 price1 3/31/2005 A 8916108 500 45 0 0 3/31/2005 A 226406106 20000 13.73 0 0 3/31/2005 A 25243Q205 9000 56.9 0 0 3/31/2005 A 292505104 15000 70.42 0 0 3/31/2005 A 633067103 4000 43.305 0 0 3/31/2005 A 767204100 1500 129.75 0 0 3/31/2005 A 874039100 700 32 0 0 3/31/2005 A 5458954 900 85 0 0 3/31/2005 A 874039100 1000 100 0 0 6/30/2005 A 25243Q205 2000 120 9000 56.9 6/30/2005 A 292505104 500 150 15000 70.42 6/30/2005 A 87403645 400 12 0 0 6/30/2005 A 645646 3000 11 0 0 6/30/2005 A 5454545 1200 13 0 0 3/31/2005 B 633067103 5000 43.305 0 0 3/31/2005 B 767204100 1500 129.75 0 0 3/31/2005 B 874039100 700 15 0 0 3/31/2005 B 77778 900 20 0 0 3/31/2005 B 121212 1000 40 0 0 3/31/2005 B 131313 2000 88 0 0
Thank you in advance for your precious help.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can use Proc SQL and a self referencing join, where you compute the date of the previous quarter using the INTNX function.
Based on the information returned, you can fill the new variables with values from the previous quarter.
See here an example:
data have;
infile cards dlm=",";
seqNr + 1;
input
date : mmddyy10.
fund_id : $8.
stock_id : $16.
number : 8.
price :8.
;
format date date9.;
cards;
3/31/2005,A,8916108,500,45
3/31/2005,A,226406106,20000,13.73
3/31/2005,A,25243Q205,9000,56.9
3/31/2005,A,292505104,15000,70.42
3/31/2005,A,633067103,4000,43.305
3/31/2005,A,767204100,1500,129.75
3/31/2005,A,874039100,700,32
3/31/2005,A,5458954,900,85
3/31/2005,A,874039100,1000,100
6/30/2005,A,25243Q205,2000,120
6/30/2005,A,292505104,500,150
6/30/2005,A,87403645,400,12
6/30/2005,A,645646,3000,11
6/30/2005,A,5454545,1200,13
3/31/2005,B,633067103,5000,43.305
3/31/2005,B,767204100,1500,129.75
3/31/2005,B,874039100,700,15
3/31/2005,B,77778,900,20
3/31/2005,B,121212,1000,40
3/31/2005,B,131313,2000,88
6/30/2005,B,44456,3200,45
6/30/2005,B,767204100,10000,36
6/30/2005,B,874039100,3333,100
6/30/2005,B,466666,20,14
;
proc sql feedback;
select
A.seqNr
, A.date
, A.fund_id
, A.stock_id
, A.number
, A.price
, B.seqNr
, B.date
, B.fund_id
, B.stock_id
, B.number
, B.price
, case
when b.stock_id not is missing then b.number else 0
end as previousNumner
, case
when b.stock_id not is missing then b.price else 0
end as previousprice
from
have as a
left outer join
have as b
on
a.fund_id = b.fund_ID
and a.stock_id = b.stock_id
and b.date = intnx("qtr", a.date, -1, "E")
order by
a.date
, a.fund_id
, a.stock_id
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for your resonse. this method works
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your data is sorted by date/fundid/stockid (edited change: it's sorted by date/fundid, but not stockid within date/fundid. However that is not required for my suggested program to work. So don't think you need to sort the data any further than its current status).
If you have a large dataset of holdings, I would NOT sort by fundid/stock/date (only to re-sort back to original order). Instead you can use a hash object to dynamically hold the most recent date, price, and number of shares for each fund/stock combination. This code is untested:
data want (drop=_:);
if _n_=1 then do;
if 0 then set have
have (rename=(date=_date1 number=number1 price=price1));
declare hash h ();
h.definekey('fund_id','stock_id');
h.definedata('_date1','number1','price1');
h.definedone();
end;
retain _last_qtr_date .;
set have;
by date ;
if first.date then _last_qtr_date=intnx('qtr',date,-1,'end');
_rc=h.find();
if NOT(_rc=0 and _date1=_last_qtr_date) then do;
number1=0;
price1=.;
end; _rc=h.replace(key:fund_id,key:stock_id,data:date,data:number,data:price);
run;
Edited at 2:56PM, just now put in the essential _rc=h.replace .... statement.
Edited at 4:45PM. I think you will find this much faster than the PROC SQL, because the latter requires a Cartesian comparison of all dates within each fund/stock group. The advantage of PROC SQL is that it doesn't require pre-sorted data sets. But that also means it often will not take advantage of data that are pre-sorted.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much
this method works for my data
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
infile cards dlm=",";
input date : mmddyy10. fund_id : $8. stock_id : $16. number : 8. price :8.;
format date date9.;
cards;
3/31/2005,A,8916108,500,45
3/31/2005,A,226406106,20000,13.73
3/31/2005,A,25243Q205,9000,56.9
3/31/2005,A,292505104,15000,70.42
3/31/2005,A,633067103,4000,43.305
3/31/2005,A,767204100,1500,129.75
3/31/2005,A,874039100,700,32
3/31/2005,A,5458954,900,85
3/31/2005,A,874039100,1000,100
6/30/2005,A,25243Q205,2000,120
6/30/2005,A,292505104,500,150
6/30/2005,A,87403645,400,12
6/30/2005,A,645646,3000,11
6/30/2005,A,5454545,1200,13
3/31/2005,B,633067103,5000,43.305
3/31/2005,B,767204100,1500,129.75
3/31/2005,B,874039100,700,15
3/31/2005,B,77778,900,20
3/31/2005,B,121212,1000,40
3/31/2005,B,131313,2000,88
6/30/2005,B,44456,3200,45
6/30/2005,B,767204100,10000,36
6/30/2005,B,874039100,3333,100
6/30/2005,B,466666,20,14
;
proc sort data=have out=_have;
by fund_id stock_id date;
run;
data temp;
drop _:;
do until(last.stock_id);
set _have;
by fund_id stock_id date;
if first.stock_id then do;
number1=0;
price1=0;
end;
else if intnx('qtr',date,-1,'end')=_date1 then do;
number1=_number1;
price1=_price1;
end;
_number1=number;
_price1=price;
_date1=date;
end;
run;
data want;
merge _have temp;
by fund_id stock_id date;
if missing(number1) and missing(price1) then do;
number1=0;
price1=0;
end;
run;
proc sort data=want out=final_want;
by fund_id date;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Most mutual fund financial research projects I have seen cover 1,000's of funds, with 1,000's of stocks over 10-year plus time periods (40 quarters). A solution like this with multiple sorts is expensive in time, disk space, and disk input/output activity. It has four steps which read the full data set 4 times, writes a full data set 3 times, and invokes a proc sort 2 times.
Given that the original sort order of the data set can be utilized as is, I find it hard to imagine a situation in which this solution should be suggested for a production environment.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
infile cards dlm=",";
input date : mmddyy10. fund_id : $8. stock_id : $16. number : 8. price :8.;
format date date9.;
cards;
3/31/2005,A,8916108,500,45
3/31/2005,A,226406106,20000,13.73
3/31/2005,A,25243Q205,9000,56.9
3/31/2005,A,292505104,15000,70.42
3/31/2005,A,633067103,4000,43.305
3/31/2005,A,767204100,1500,129.75
3/31/2005,A,874039100,700,32
3/31/2005,A,5458954,900,85
3/31/2005,A,874039100,1000,100
6/30/2005,A,25243Q205,2000,120
6/30/2005,A,292505104,500,150
6/30/2005,A,87403645,400,12
6/30/2005,A,645646,3000,11
6/30/2005,A,5454545,1200,13
3/31/2005,B,633067103,5000,43.305
3/31/2005,B,767204100,1500,129.75
3/31/2005,B,874039100,700,15
3/31/2005,B,77778,900,20
3/31/2005,B,121212,1000,40
3/31/2005,B,131313,2000,88
6/30/2005,B,44456,3200,45
6/30/2005,B,767204100,10000,36
6/30/2005,B,874039100,3333,100
6/30/2005,B,466666,20,14
;
data want;
_n=0;
do n=1 by 1 until(last.fund_id);
set have;
by fund_id date;
number1=0;price1=0;
if first.date then _n+1;
array _t(25) $20;/*arbitrary subscript value*/
array _k(25,3);
if n=1 then call missing(of _t(*), of _k(*));
if _n>1 then do;
_f=whichc(stock_id,of _t(*));
if _f>0 then do;
if intnx('qtr',date,-1,'end')=_k(_f,3) then do;
number1=_k(_f,1);
price1=_k(_f,2);
end;
end;
end;
_t(n)=stock_id;
_k(n,1)=number;
_k(n,2)=price;
_k(n,3)=date;
output;
end;
drop _: n;
run;