turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Daily return filters

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-27-2017 05:54 AM - edited 05-01-2017 01:05 AM

Hi everyone,

I have a daily data as following

Permno Date (t) Return (r)

10006 31/12/1926 0.00322

10006 01/01/1927 0.00322

10006 02/01/1927 0.00322

10006 03/01/1927 0.00322

20010 31/12/1926 0.00322

20010 01/01/1927 0.00322

20010 02/01/1927 0.00322

20010 03/01/1927 0.00322

78900 31/12/1926 0.00322

78900 01/01/1927 0.00322

78900 02/01/1927 0.00322

78900 03/01/1927 0.00322

I have to filter the data return if rt or rt-1 > 100% and (1 + rt-1)(1+ rt) - 1 < 20%, then both rt and rt-1 are set equal to a missing value. Additionally, any daily return greater than 200% is set to missing.

I really have no idea how I can run if ... then when 2 variables in the same column (rt and rt-1) and combine "or" and " and". Maybe the question is stupid but could you please help me with this.

Thank you very much,

Ha

Accepted Solutions

Solution

05-01-2017
04:55 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to yotsuba88

05-01-2017 01:03 AM

Instead of using lag, you might be better off using a merge with a firstobs=2 parameter to look ahead at the next obs. If the current and next obs fullfill the criterion, set N_MISS to 2. Before decrementing N_MISS by 1, check if it is positive. If so set the current r to missing.

```
data have;
input permno date ddmmyy10. r percent7. ;
format date date9. r percent8.2;
datalines;
10006 31/12/1926 101%
10006 01/01/1927 -50%
10006 02/01/1927 0.32%
10006 03/01/1927 200%
run;
data want (drop=next_: n_miss);
merge have
have (firstobs=2 keep=permno r rename=(permno=next_p r=next_r));
if permno=next_p and (r>1 or next_r>1) and (((1+r)*(1+next_r)-1)<.2) then n_miss=2;
if n_miss>0 or r>=2 then r=.;
n_miss+(-1);
run;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to yotsuba88

04-27-2017 05:57 AM

rt-1 is a variable here? What does rt-1 represent?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to draycut

04-27-2017 06:10 AM

Hi daycut,

Thank you for your question.

rt-1 is the return of the date right before t.

For example: For 10006, return of 1/1/1927 is rt and return of 31/12/1926 is r t-1.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to yotsuba88

04-27-2017 10:01 AM

It will help if you can also show what the result of this would look like.

"Filter" often means to subset the data but the way you say "set to missing" makes it seem that you may not actually want to subset the data.

Also since you have a requirement involving rt > 100% and 200% you should show an example in the input and what the output looks like with those.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ballardw

04-27-2017 10:57 PM - edited 05-01-2017 12:53 AM

Hi @ballardw,

Maybe it was a mistake of origin, I try to replicate this for screening data. Let me explain what I want to do here with one ID.

ID Date (t) Return (r) in percentage

**10006 31/12/1926 101% (rt-1)**

**10006 01/01/1927 -50% (rt)**

10006 02/01/1927 0.32%

__10006 03/01/1927 200%__

Then rt-1=101% > 100% and (1 + rt-1)(1+ rt) - 1=0.5%< 20% => rt and rt-1 are set to be missing.

Another is return at date 03/01/1927 = 200% --> set to be missing.

The result I expect is:

ID Date (t) Return (r) in percentage

**10006 31/12/1926 .(rt-1)**

**10006 01/01/1927 . (rt)**

10006 02/01/1927 0.32%

__10006 03/01/1927 .__

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to yotsuba88

04-28-2017 01:56 AM - edited 04-28-2017 02:12 AM

@yotsuba88 in your example data R is a constant (always=.0032), i assume that this is not actually the case. perhaps if it varied a bit you could include an actual calculation of how you arrive at the 101%. I think that you want to look at the use of the LAG function to 'remember' the previous value of R.

is the percentage calculated something like this????

`value=(1 + lagr)*(1+ r);`

I assume not since adding 1 to these small will tend to yield a value over 1. Help us with the calculation please.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ArtC

04-30-2017 10:52 PM

@ArtC Thank you for your suggestion of using lag.

1. For R, my mistake when I typed all R is 0.0032., but I change a little bit as below. 101% is an example for my table (it is already in data), my task is to screen the outlier as formula or rt is greater than 200%.

ID Date (t) Return (r) in percentage

**10006 31/12/1926 101% (rt-1)**

**10006 01/01/1927 -50% (rt)**

10006 02/01/1927 0.32%

__10006 03/01/1927 200%__

2. And the value is

value= (1+lagr)*(1+r)-1, so it can not be more than 1.

However, I still dont know how to set r and rt-1 missing if they met the condition.

Thank you very much.

Solution

05-01-2017
04:55 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to yotsuba88

05-01-2017 01:03 AM

Instead of using lag, you might be better off using a merge with a firstobs=2 parameter to look ahead at the next obs. If the current and next obs fullfill the criterion, set N_MISS to 2. Before decrementing N_MISS by 1, check if it is positive. If so set the current r to missing.

```
data have;
input permno date ddmmyy10. r percent7. ;
format date date9. r percent8.2;
datalines;
10006 31/12/1926 101%
10006 01/01/1927 -50%
10006 02/01/1927 0.32%
10006 03/01/1927 200%
run;
data want (drop=next_: n_miss);
merge have
have (firstobs=2 keep=permno r rename=(permno=next_p r=next_r));
if permno=next_p and (r>1 or next_r>1) and (((1+r)*(1+next_r)-1)<.2) then n_miss=2;
if n_miss>0 or r>=2 then r=.;
n_miss+(-1);
run;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to mkeintz

05-01-2017 02:02 AM - edited 05-01-2017 02:03 AM

I came up with an array solution, but I like the one by @mkeintz much better.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ArtC

05-01-2017 04:56 AM

Thank you for your consideration, I also tried @mkeintz solution and it can run well.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to mkeintz

05-01-2017 04:57 AM

@mkeintz I can run screening for my data now. I really appreciate it. Best regards.