BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
yotsuba88
Quartz | Level 8

Hi everyone,

I have a daily data as following

Permno    Date (t)       Return (r)

10006    31/12/1926   0.00322

10006    01/01/1927   0.00322

10006    02/01/1927   0.00322

10006    03/01/1927   0.00322

20010    31/12/1926   0.00322

20010    01/01/1927   0.00322

20010    02/01/1927   0.00322

20010    03/01/1927   0.00322 

 

 

78900    31/12/1926   0.00322

78900    01/01/1927   0.00322

78900    02/01/1927   0.00322

 

78900   03/01/1927    0.00322

 

I have to filter the data return if rt or rt-1 > 100% and (1 + rt-1)(1+ rt) - 1 < 20%, then both rt and rt-1 are set equal to a missing value. Additionally, any daily return greater than 200% is set to missing. 

I really have no idea how I can run if ... then when 2 variables in the same column (rt and rt-1) and combine "or" and " and". Maybe the question is stupid but could you please help me with this. 

 

Thank you very much,

Ha

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Instead of using lag, you might be better off using a merge with a firstobs=2 parameter to look ahead at the next obs.  If the current and next obs fullfill the criterion, set N_MISS to 2.  Before decrementing N_MISS by 1, check if it is positive.  If so set the current r to missing.

 

data have;
  input permno date ddmmyy10. r percent7. ;
  format date date9.  r percent8.2;
datalines;
10006 31/12/1926 101%
10006 01/01/1927 -50%
10006 02/01/1927 0.32%
10006 03/01/1927 200%
run;

data want (drop=next_: n_miss);
  merge have 
        have (firstobs=2 keep=permno r rename=(permno=next_p r=next_r));

  if permno=next_p and (r>1 or next_r>1) and (((1+r)*(1+next_r)-1)<.2) then n_miss=2;
  if n_miss>0 or r>=2 then r=.;
  n_miss+(-1);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20

rt-1 is a variable here? What does rt-1 represent?

yotsuba88
Quartz | Level 8

Hi daycut,

 

Thank you for your question. 

 

rt-1 is the return of the date right before t. 

For example: For 10006, return of 1/1/1927 is rt and return of 31/12/1926 is r t-1. 

ballardw
Super User

It will help if you can also show what the result of this would look like.

"Filter" often means to subset the data but the way you say "set to missing" makes it seem that you may not actually want to subset the data.

Also since you have a requirement involving rt > 100% and 200% you should show an example in the input and what the output looks like with those.

yotsuba88
Quartz | Level 8

Hi @ballardw,

 

Maybe it was a mistake of origin, I try to replicate this for screening data. Let me explain what I want to do here with one ID. 

    ID          Date (t)       Return (r) in percentage

10006    31/12/1926   101% (rt-1)

10006    01/01/1927   -50% (rt)

10006    02/01/1927   0.32%

10006    03/01/1927   200%

 

Then rt-1=101% > 100% and (1 + rt-1)(1+ rt) - 1=0.5%< 20% => rt and rt-1 are set to be missing.

Another is return at date 03/01/1927 = 200% --> set to be missing. 

The result I expect is: 

    ID          Date (t)       Return (r) in percentage

10006    31/12/1926   .(rt-1)

10006    01/01/1927   . (rt)

10006    02/01/1927   0.32%

10006    03/01/1927  .

 

ArtC
Rhodochrosite | Level 12

@yotsuba88 in your example data R is a constant (always=.0032), i assume that this is not actually the case.  perhaps if it varied a bit you could include an actual calculation of how you arrive at the 101%. I think that you want to look at the use of the LAG function to 'remember' the previous value of R.

 

is the percentage calculated something like this????

value=(1 + lagr)*(1+ r);

I assume not since adding 1 to these small will tend to yield a value over 1.  Help us with the calculation please.

yotsuba88
Quartz | Level 8

@ArtC Thank you for your suggestion of using lag.

1. For R, my mistake when I typed all R is 0.0032., but I change a little bit as below. 101% is an example for my table (it is already in data), my task is to screen the outlier as formula or rt is greater than 200%. 

 

    ID          Date (t)       Return (r) in percentage

10006    31/12/1926   101% (rt-1)

10006    01/01/1927   -50% (rt)

10006    02/01/1927   0.32%

10006    03/01/1927   200%

 

2. And the value is

value= (1+lagr)*(1+r)-1, so it can not be more than 1. 

However, I still dont know how to set r and rt-1 missing if they met the condition. 

 

Thank you very much.

mkeintz
PROC Star

Instead of using lag, you might be better off using a merge with a firstobs=2 parameter to look ahead at the next obs.  If the current and next obs fullfill the criterion, set N_MISS to 2.  Before decrementing N_MISS by 1, check if it is positive.  If so set the current r to missing.

 

data have;
  input permno date ddmmyy10. r percent7. ;
  format date date9.  r percent8.2;
datalines;
10006 31/12/1926 101%
10006 01/01/1927 -50%
10006 02/01/1927 0.32%
10006 03/01/1927 200%
run;

data want (drop=next_: n_miss);
  merge have 
        have (firstobs=2 keep=permno r rename=(permno=next_p r=next_r));

  if permno=next_p and (r>1 or next_r>1) and (((1+r)*(1+next_r)-1)<.2) then n_miss=2;
  if n_miss>0 or r>=2 then r=.;
  n_miss+(-1);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ArtC
Rhodochrosite | Level 12

I came up with an array solution, but I like the one by @mkeintz much better.

yotsuba88
Quartz | Level 8

Thank you for your consideration, I also tried @mkeintz solution and it can run well. 

 

yotsuba88
Quartz | Level 8

@mkeintz I can run screening for my data now. I really appreciate it. Best regards. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1292 views
  • 3 likes
  • 5 in conversation