Hallo, I would like to do some calculation using the lag(1) value. I have something like this, but lag function did not work here
Proc sql noprint;
Create table want as
Select date, id, sum(value1) as value1,
sum(value2*weight)/sum(weight) as value2,
sum(lag1(weight)*return)/sum(lag1(weight)) as return
from have group by id, date;
quit;
How can I make this work? Thank you
You can't use the lag function with proc sql.
The lag function is not available in proc SQL. Proc SQL performs operations on unordered sets of data.
By LAG1 did you mean the previous observation or the previous date?
PG
yes, by lag I mean the previous date (t-1).
If you want to match the value with the return on the previous day then try this (untested) :
Proc sql;
Create table want as
select t.id, t.date, t.value1, t.value2, r.return
from
(Select id, date,
sum(value1) as value1,
sum(weight*value2)/sum(weight) as value2
from have group by id, date) as t
left join
(Select id, date,
sum(weight*return)/sum(weight) as return
from have group by id, date) as r
on t.id=r.id and t.date=intnx('DAY',r.date,1);
quit;
This assumes that date is a SAS date. If date is a datetime variable then replace 'DAY' by 'DTDAY'.
PG
the left join - part did not work well. because the weight and return are all from previous day. The data I have are monthly data I should say previous obs above I think.
The return is this month return and the weight is last month weight of different assets
return (t) = sum(weight (t-1)*return(t))/sum(weight(t-1))
Post some sample data and the result you want from that data.
The data look like this.
ID date value return portfolio
I want to group them into the portfolio level so the monthly return of the portfolio would be the weighted return of the individual asset.
The result table would be:
portfolio date value return
200075 01/31/1997 138.449 .
200075 | 02/28/1997 | 138.249 | (0.149*0.005199 + 138.3*0.006061)/(138.3+0.149). |
...
And
1. where does the 'weight' variable come from?
2. is the one value for return, R, a missing value?
3. will those always be the two ids?
4. will there always be two and only two ids?
1. the weight is the value column
2. the R is the missing return, missing value.
3. No there are more different ids (about 15000 of them)
4. and the number of ids varies per portfolio. there are about 2500 different portfolios
Then I'm confused. Your formula indicates variables value1, value2, weight and return.
From your example, value1 and value2 are the two records for a given date. The formula takes the sum of those, divided by another variable labeled weight. I don't understand the difference between value and weight. Similarly, each value has a separate return.
Please reshow your example using the variable names from the file, namely: id, date, value, return and portfolio
You refers to the first post?
it would be this:
Proc sql noprint;
Create table want as
Select date, portfolio, sum(value) as value,
sum(lag1(value)*return)/sum(lag1(value)) as return
from have group by portfolio, date;
quit;
return (t) = sum(value(t-1)*return(t))/sum(value(t-1))
so if I don't take the lag grouping those individual assets into portfolio works fine. But only for the return I have to take the lag ( previous month) value
Still not sure I correctly understand what you want, but the following appears to match what you want. Someone more proficient than me with proc sql can consolidate the code if it is in the right direction:
data have;
informat date mmddyy10.;
format date date9.;
input id date value return portfolio;
cards;
029946 1/31/1997 0.149 . 200075
029947 1/31/1997 138.300 0.010499 200075
029946 2/28/1997 0.276 0.005199 200075
029947 2/28/1997 137.973 0.006061 200075
029946 3/31/1997 0.376 -0.029741 200075
029947 3/31/1997 132.154 -0.030120 200075
029946 4/30/1997 0.481 0.012467 200075
029947 4/30/1997 133.955 0.014311 200075
029946 5/30/1997 0.595 0.054529 200075
029947 5/30/1997 141.063 0.054674 200075
029946 6/30/1997 0.693 0.030025 200075
029947 6/30/1997 143.641 0.030518 200075
;
proc sql;
create table need as
select l.portfolio,l.id, l.date, l.return as lreturn,
l.value as lvalue, r.return as rreturn,
r.value as rvalue
from
(select portfolio,id, date,return,value
from have ) as l
left join
(select portfolio,id, date, value,return
from have) as r
on l.portfolio=r.portfolio and
l.id=r.id and
intnx('month',r.date,1,'b')<=l.date<=
intnx('month',r.date,1,'e')
order by portfolio,date,id
;
create table want as
select portfolio, date,
sum(rvalue*lreturn)/sum(rvalue) as return
from need
group by portfolio,date
;
quit;
To make the calculation you want you need to get the basis value (what you were calling weight) onto the record with the return value. SQL is not a good tool for this since it does not natively support the concept of ordered records. Assuming that the ID variable identifies particular asset you need to process you data sorted by PORTFOLIO ID DATE to be able to take the previous value of the asset.
data returns ;
set have;
by portfolio id date ;
basis = lag(value);
if first.id then basis = .;
run;
Now you can do calculations using variables that are on the same observation.
With the example data your code results in the same values as the first of the two proc sql runs I suggested. Specifically, my lreturn=your return, my rvalue=your basis, and my lvalue= your value.
I agree with you that SQL shouldn't be used if one needs to know the order of one's data but, in this case, I don't think order wasn't necessary .. only the values of date, id and porfolio.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.