Hi everyone,
I have a daily data as following
Permno Date (t) Return (r)
10006 31/12/1926 0.00322
10006 01/01/1927 0.00322
10006 02/01/1927 0.00322
10006 03/01/1927 0.00322
20010 31/12/1926 0.00322
20010 01/01/1927 0.00322
20010 02/01/1927 0.00322
20010 03/01/1927 0.00322
78900 31/12/1926 0.00322
78900 01/01/1927 0.00322
78900 02/01/1927 0.00322
78900 03/01/1927 0.00322
I have to filter the data return if rt or rt-1 > 100% and (1 + rt-1)(1+ rt) - 1 < 20%, then both rt and rt-1 are set equal to a missing value. Additionally, any daily return greater than 200% is set to missing.
I really have no idea how I can run if ... then when 2 variables in the same column (rt and rt-1) and combine "or" and " and". Maybe the question is stupid but could you please help me with this.
Thank you very much,
Ha
Instead of using lag, you might be better off using a merge with a firstobs=2 parameter to look ahead at the next obs. If the current and next obs fullfill the criterion, set N_MISS to 2. Before decrementing N_MISS by 1, check if it is positive. If so set the current r to missing.
data have;
input permno date ddmmyy10. r percent7. ;
format date date9. r percent8.2;
datalines;
10006 31/12/1926 101%
10006 01/01/1927 -50%
10006 02/01/1927 0.32%
10006 03/01/1927 200%
run;
data want (drop=next_: n_miss);
merge have
have (firstobs=2 keep=permno r rename=(permno=next_p r=next_r));
if permno=next_p and (r>1 or next_r>1) and (((1+r)*(1+next_r)-1)<.2) then n_miss=2;
if n_miss>0 or r>=2 then r=.;
n_miss+(-1);
run;
rt-1 is a variable here? What does rt-1 represent?
Hi daycut,
Thank you for your question.
rt-1 is the return of the date right before t.
For example: For 10006, return of 1/1/1927 is rt and return of 31/12/1926 is r t-1.
It will help if you can also show what the result of this would look like.
"Filter" often means to subset the data but the way you say "set to missing" makes it seem that you may not actually want to subset the data.
Also since you have a requirement involving rt > 100% and 200% you should show an example in the input and what the output looks like with those.
Hi @ballardw,
Maybe it was a mistake of origin, I try to replicate this for screening data. Let me explain what I want to do here with one ID.
ID Date (t) Return (r) in percentage
10006 31/12/1926 101% (rt-1)
10006 01/01/1927 -50% (rt)
10006 02/01/1927 0.32%
10006 03/01/1927 200%
Then rt-1=101% > 100% and (1 + rt-1)(1+ rt) - 1=0.5%< 20% => rt and rt-1 are set to be missing.
Another is return at date 03/01/1927 = 200% --> set to be missing.
The result I expect is:
ID Date (t) Return (r) in percentage
10006 31/12/1926 .(rt-1)
10006 01/01/1927 . (rt)
10006 02/01/1927 0.32%
10006 03/01/1927 .
@yotsuba88 in your example data R is a constant (always=.0032), i assume that this is not actually the case. perhaps if it varied a bit you could include an actual calculation of how you arrive at the 101%. I think that you want to look at the use of the LAG function to 'remember' the previous value of R.
is the percentage calculated something like this????
value=(1 + lagr)*(1+ r);
I assume not since adding 1 to these small will tend to yield a value over 1. Help us with the calculation please.
@ArtC Thank you for your suggestion of using lag.
1. For R, my mistake when I typed all R is 0.0032., but I change a little bit as below. 101% is an example for my table (it is already in data), my task is to screen the outlier as formula or rt is greater than 200%.
ID Date (t) Return (r) in percentage
10006 31/12/1926 101% (rt-1)
10006 01/01/1927 -50% (rt)
10006 02/01/1927 0.32%
10006 03/01/1927 200%
2. And the value is
value= (1+lagr)*(1+r)-1, so it can not be more than 1.
However, I still dont know how to set r and rt-1 missing if they met the condition.
Thank you very much.
Instead of using lag, you might be better off using a merge with a firstobs=2 parameter to look ahead at the next obs. If the current and next obs fullfill the criterion, set N_MISS to 2. Before decrementing N_MISS by 1, check if it is positive. If so set the current r to missing.
data have;
input permno date ddmmyy10. r percent7. ;
format date date9. r percent8.2;
datalines;
10006 31/12/1926 101%
10006 01/01/1927 -50%
10006 02/01/1927 0.32%
10006 03/01/1927 200%
run;
data want (drop=next_: n_miss);
merge have
have (firstobs=2 keep=permno r rename=(permno=next_p r=next_r));
if permno=next_p and (r>1 or next_r>1) and (((1+r)*(1+next_r)-1)<.2) then n_miss=2;
if n_miss>0 or r>=2 then r=.;
n_miss+(-1);
run;
I came up with an array solution, but I like the one by @mkeintz much better.
Thank you for your consideration, I also tried @mkeintz solution and it can run well.
@mkeintz I can run screening for my data now. I really appreciate it. Best regards.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.