BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
bera00
Obsidian | Level 7

Hello all,

I would appreciate your help because I am a bit stuck in the sas programming. I have a data that involves the holdings composition of different funds (fund_id) at each quarter (stock_id , number and  prices) 

For each fund_id :

I would like to know for each stock held by a fund_id  at each quarter , if the stock existed on a previous quarter or not (i.e. if the fund held the same stock in a previous quarter). If it is the case ( i.e. if the stock existed on a previous quarter for the same fund), I should recuperate the number and price of that stock  in the previous quarter and creates for them new variables. If it's not the case  I do nothing or I put zeros for the newly created variables.

 

my variables are as follow ( what I have):

 

datefund_idstock_idnumberprice
3/31/2005A891610850045
3/31/2005A2264061062000013.73
3/31/2005A25243Q205900056.9
3/31/2005A2925051041500070.42
3/31/2005A633067103400043.305
3/31/2005A7672041001500129.75
3/31/2005A87403910070032
3/31/2005A545895490085
3/31/2005A8740391001000100
6/30/2005A25243Q2052000120
6/30/2005A292505104500150
6/30/2005A8740364540012
6/30/2005A645646300011
6/30/2005A5454545120013
3/31/2005B633067103500043.305
3/31/2005B7672041001500129.75
3/31/2005B87403910070015
3/31/2005B7777890020
3/31/2005B121212100040
3/31/2005B131313200088
6/30/2005B44456320045
6/30/2005B7672041001000036
6/30/2005B8740391003333100
6/30/2005B4666662014

 

I posted an excel sample data

 

What I want :

datefund_idstock_idnumberpricenumber1price1
3/31/2005A89161085004500
3/31/2005A2264061062000013.7300
3/31/2005A25243Q205900056.900
3/31/2005A2925051041500070.4200
3/31/2005A633067103400043.30500
3/31/2005A7672041001500129.7500
3/31/2005A8740391007003200
3/31/2005A54589549008500
3/31/2005A874039100100010000
6/30/2005A25243Q2052000120900056.9
6/30/2005A2925051045001501500070.42
6/30/2005A874036454001200
6/30/2005A64564630001100
6/30/2005A545454512001300
3/31/2005B633067103500043.30500
3/31/2005B7672041001500129.7500
3/31/2005B8740391007001500
3/31/2005B777789002000
3/31/2005B12121210004000
3/31/2005B1313132000880

0

 

 

 

Thank you in advance for your precious help.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Your data is sorted by date/fundid/stockid (edited change: it's sorted by date/fundid, but not stockid within date/fundid.  However that is not required for my suggested program to work.  So don't think you need to sort the data any further than its current status).

 

If you have a large dataset of holdings, I would NOT sort by fundid/stock/date  (only to re-sort back to original order).  Instead you can use a hash object to dynamically hold the most recent date, price, and number of shares for each fund/stock combination.  This code is untested:

 

data want (drop=_:);
  if _n_=1 then do;
    if 0 then set have 
                  have (rename=(date=_date1 number=number1 price=price1));
    declare hash h ();
      h.definekey('fund_id','stock_id');
      h.definedata('_date1','number1','price1');
      h.definedone();
  end;

  retain _last_qtr_date .;

  set have;
  by date ;
  if first.date then _last_qtr_date=intnx('qtr',date,-1,'end');

  _rc=h.find();
  if NOT(_rc=0 and _date1=_last_qtr_date) then do;
    number1=0;
    price1=.;
  end;

_rc=h.replace(key:fund_id,key:stock_id,data:date,data:number,data:price); run;

 

Edited at 2:56PM,  just now put in the essential  _rc=h.replace ....    statement.

 

Edited at 4:45PM.  I think you will find this much faster than the PROC SQL, because the latter requires a Cartesian comparison of all dates within each fund/stock group.  The advantage of PROC SQL is that it doesn't require pre-sorted data sets.  But that also means it often will not take advantage of data that are pre-sorted.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
Reeza
Super User

1. Sort by STOCK, date

2. Use LAG to retreive the previous value and/or date.

3. Check if the date is previous quarter, if so, flag as in previous quarter

4. Otherwise, set flag to 0. 

 

Spoiler

@bera00 wrote:

Hello all,

I would appreciate your help because I am a bit stuck in the sas programming. I have a data that involves the holdings composition of different funds (fund_id) at each quarter (stock_id , number and  prices) 

For each fund_id :

I would like to know for each stock held by a fund_id  at each quarter , if the stock existed on a previous quarter or not (i.e. if the fund held the same stock in a previous quarter). If it is the case ( i.e. if the stock existed on a previous quarter for the same fund), I should recuperate the number and price of that stock  in the previous quarter and creates for them new variables. If it's not the case  I do nothing or I put zeros for the newly created variables.

 

my variables are as follow ( what I have):

 

date fund_id stock_id number price
3/31/2005 A 8916108 500 45
3/31/2005 A 226406106 20000 13.73
3/31/2005 A 25243Q205 9000 56.9
3/31/2005 A 292505104 15000 70.42
3/31/2005 A 633067103 4000 43.305
3/31/2005 A 767204100 1500 129.75
3/31/2005 A 874039100 700 32
3/31/2005 A 5458954 900 85
3/31/2005 A 874039100 1000 100
6/30/2005 A 25243Q205 2000 120
6/30/2005 A 292505104 500 150
6/30/2005 A 87403645 400 12
6/30/2005 A 645646 3000 11
6/30/2005 A 5454545 1200 13
3/31/2005 B 633067103 5000 43.305
3/31/2005 B 767204100 1500 129.75
3/31/2005 B 874039100 700 15
3/31/2005 B 77778 900 20
3/31/2005 B 121212 1000 40
3/31/2005 B 131313 2000 88
6/30/2005 B 44456 3200 45
6/30/2005 B 767204100 10000 36
6/30/2005 B 874039100 3333 100
6/30/2005 B 466666 20 14

 

I posted an excel sample data

 

What I want :

date fund_id stock_id number price number1 price1
3/31/2005 A 8916108 500 45 0 0
3/31/2005 A 226406106 20000 13.73 0 0
3/31/2005 A 25243Q205 9000 56.9 0 0
3/31/2005 A 292505104 15000 70.42 0 0
3/31/2005 A 633067103 4000 43.305 0 0
3/31/2005 A 767204100 1500 129.75 0 0
3/31/2005 A 874039100 700 32 0 0
3/31/2005 A 5458954 900 85 0 0
3/31/2005 A 874039100 1000 100 0 0
6/30/2005 A 25243Q205 2000 120 9000 56.9
6/30/2005 A 292505104 500 150 15000 70.42
6/30/2005 A 87403645 400 12 0 0
6/30/2005 A 645646 3000 11 0 0
6/30/2005 A 5454545 1200 13 0 0
3/31/2005 B 633067103 5000 43.305 0 0
3/31/2005 B 767204100 1500 129.75 0 0
3/31/2005 B 874039100 700 15 0 0
3/31/2005 B 77778 900 20 0 0
3/31/2005 B 121212 1000 40 0 0
3/31/2005 B 131313 2000 88 0

0

 

 

 

Thank you in advance for your precious help.


 

BrunoMueller
SAS Super FREQ

You can use Proc SQL and a self referencing join, where you compute the date of the previous quarter using the INTNX function.

Based on the information returned, you can fill the new variables with values from the previous quarter.

 

See here an example:

data have;
  infile cards dlm=",";
  seqNr + 1;
  input
    date : mmddyy10.
    fund_id	: $8.
    stock_id : $16.
    number : 8.
    price :8.
  ;
  format date date9.;
cards;
3/31/2005,A,8916108,500,45
3/31/2005,A,226406106,20000,13.73
3/31/2005,A,25243Q205,9000,56.9
3/31/2005,A,292505104,15000,70.42
3/31/2005,A,633067103,4000,43.305
3/31/2005,A,767204100,1500,129.75
3/31/2005,A,874039100,700,32
3/31/2005,A,5458954,900,85
3/31/2005,A,874039100,1000,100
6/30/2005,A,25243Q205,2000,120
6/30/2005,A,292505104,500,150
6/30/2005,A,87403645,400,12
6/30/2005,A,645646,3000,11
6/30/2005,A,5454545,1200,13
3/31/2005,B,633067103,5000,43.305
3/31/2005,B,767204100,1500,129.75
3/31/2005,B,874039100,700,15
3/31/2005,B,77778,900,20
3/31/2005,B,121212,1000,40
3/31/2005,B,131313,2000,88
6/30/2005,B,44456,3200,45
6/30/2005,B,767204100,10000,36
6/30/2005,B,874039100,3333,100
6/30/2005,B,466666,20,14
;

proc sql feedback;
  select
    A.seqNr
    , A.date
    , A.fund_id
    , A.stock_id
    , A.number
    , A.price
    , B.seqNr
    , B.date
    , B.fund_id
    , B.stock_id
    , B.number
    , B.price
    , case
        when b.stock_id not is missing then b.number else 0
      end as previousNumner
    , case
        when b.stock_id not is missing then b.price else 0
      end as previousprice

  from
    have as a
    left outer join
    have as b
    on
      a.fund_id = b.fund_ID
      and a.stock_id = b.stock_id
      and b.date = intnx("qtr", a.date, -1, "E")
  order by
    a.date
    , a.fund_id
    , a.stock_id
  ;
quit;
    
bera00
Obsidian | Level 7

Thank you for your resonse. this method works

mkeintz
PROC Star

Your data is sorted by date/fundid/stockid (edited change: it's sorted by date/fundid, but not stockid within date/fundid.  However that is not required for my suggested program to work.  So don't think you need to sort the data any further than its current status).

 

If you have a large dataset of holdings, I would NOT sort by fundid/stock/date  (only to re-sort back to original order).  Instead you can use a hash object to dynamically hold the most recent date, price, and number of shares for each fund/stock combination.  This code is untested:

 

data want (drop=_:);
  if _n_=1 then do;
    if 0 then set have 
                  have (rename=(date=_date1 number=number1 price=price1));
    declare hash h ();
      h.definekey('fund_id','stock_id');
      h.definedata('_date1','number1','price1');
      h.definedone();
  end;

  retain _last_qtr_date .;

  set have;
  by date ;
  if first.date then _last_qtr_date=intnx('qtr',date,-1,'end');

  _rc=h.find();
  if NOT(_rc=0 and _date1=_last_qtr_date) then do;
    number1=0;
    price1=.;
  end;

_rc=h.replace(key:fund_id,key:stock_id,data:date,data:number,data:price); run;

 

Edited at 2:56PM,  just now put in the essential  _rc=h.replace ....    statement.

 

Edited at 4:45PM.  I think you will find this much faster than the PROC SQL, because the latter requires a Cartesian comparison of all dates within each fund/stock group.  The advantage of PROC SQL is that it doesn't require pre-sorted data sets.  But that also means it often will not take advantage of data that are pre-sorted.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
bera00
Obsidian | Level 7

Thank you so much

this method works for my data

novinosrin
Tourmaline | Level 20
data have;
infile cards dlm=",";
input date : mmddyy10. fund_id	: $8. stock_id : $16.  number : 8.  price :8.;
format date date9.;
cards;
3/31/2005,A,8916108,500,45
3/31/2005,A,226406106,20000,13.73
3/31/2005,A,25243Q205,9000,56.9
3/31/2005,A,292505104,15000,70.42
3/31/2005,A,633067103,4000,43.305
3/31/2005,A,767204100,1500,129.75
3/31/2005,A,874039100,700,32
3/31/2005,A,5458954,900,85
3/31/2005,A,874039100,1000,100
6/30/2005,A,25243Q205,2000,120
6/30/2005,A,292505104,500,150
6/30/2005,A,87403645,400,12
6/30/2005,A,645646,3000,11
6/30/2005,A,5454545,1200,13
3/31/2005,B,633067103,5000,43.305
3/31/2005,B,767204100,1500,129.75
3/31/2005,B,874039100,700,15
3/31/2005,B,77778,900,20
3/31/2005,B,121212,1000,40
3/31/2005,B,131313,2000,88
6/30/2005,B,44456,3200,45
6/30/2005,B,767204100,10000,36
6/30/2005,B,874039100,3333,100
6/30/2005,B,466666,20,14
;

proc sort data=have out=_have;
by fund_id stock_id date;
run;

data temp;
drop _:;
do until(last.stock_id);
set _have;
by fund_id stock_id date;
if first.stock_id then do;
number1=0;
price1=0;
end;
else if intnx('qtr',date,-1,'end')=_date1 then  do;
number1=_number1;
price1=_price1;
end;
_number1=number;
_price1=price;
_date1=date;
end;
run;

data want;
merge _have temp;
by fund_id stock_id date;
if missing(number1) and missing(price1) then do;
number1=0;
price1=0;
end;
run;

proc sort data=want out=final_want;
by fund_id date;
run;
mkeintz
PROC Star

Most mutual fund financial research projects I have seen cover 1,000's of funds, with 1,000's of stocks over 10-year plus time periods (40 quarters).  A  solution like this with multiple sorts is expensive in time, disk space, and disk input/output activity.  It has four steps which read the full data set 4 times, writes a full data set 3 times, and invokes a proc sort 2 times. 

 

Given that the original sort order of the data set can be utilized as is, I find it hard to imagine a situation in which this solution should be suggested for a production environment.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20
data have;
infile cards dlm=",";
input date : mmddyy10. fund_id	: $8. stock_id : $16.  number : 8.  price :8.;
format date date9.;
cards;
3/31/2005,A,8916108,500,45
3/31/2005,A,226406106,20000,13.73
3/31/2005,A,25243Q205,9000,56.9
3/31/2005,A,292505104,15000,70.42
3/31/2005,A,633067103,4000,43.305
3/31/2005,A,767204100,1500,129.75
3/31/2005,A,874039100,700,32
3/31/2005,A,5458954,900,85
3/31/2005,A,874039100,1000,100
6/30/2005,A,25243Q205,2000,120
6/30/2005,A,292505104,500,150
6/30/2005,A,87403645,400,12
6/30/2005,A,645646,3000,11
6/30/2005,A,5454545,1200,13
3/31/2005,B,633067103,5000,43.305
3/31/2005,B,767204100,1500,129.75
3/31/2005,B,874039100,700,15
3/31/2005,B,77778,900,20
3/31/2005,B,121212,1000,40
3/31/2005,B,131313,2000,88
6/30/2005,B,44456,3200,45
6/30/2005,B,767204100,10000,36
6/30/2005,B,874039100,3333,100
6/30/2005,B,466666,20,14
;


data want;
_n=0;
do n=1 by 1 until(last.fund_id);
set have;
by fund_id date;
number1=0;price1=0;
if first.date then _n+1;
array _t(25) $20;/*arbitrary subscript value*/
array _k(25,3);
if n=1 then call missing(of _t(*), of _k(*));
if _n>1 then do;
_f=whichc(stock_id,of _t(*));
	if _f>0 then do;
	if intnx('qtr',date,-1,'end')=_k(_f,3) then do;
	number1=_k(_f,1);
	price1=_k(_f,2);
	end;
end;
end;
_t(n)=stock_id;
_k(n,1)=number;
_k(n,2)=price;
_k(n,3)=date;
output;
end;
drop _: n; 
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1309 views
  • 7 likes
  • 5 in conversation