BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Nici
Obsidian | Level 7

Hello everyone,

 

I'm on SAS version 9.4. I've had troubles with a (I assume) very common problem when analyzing returns. However, I wasn't able to find an appropriate solution for my problem. The problem is that both datasets have multiple repeating variables.

 

I have two datasets:

 

Data A)

ID          Permno           Date        

---------------------------------------

A                 1               19900205

A                 2               19900205

B                 3               20000305

B                 3               20050505

.......

 

(The point is that some IDs have multiple Permnos and thus, different permnos might have the same dates)

 

Data B) This set includes the daily returns for all Permnos

 

Permno             Date                 Ret

-----------------------------------------------------------

      1               19900202           0,01

      1               19900203           0,01

      1               19900204           0,01

      1               19900205           0,02

      1               19900206           0,03

      1               19900207           0,01

     ....                  ....                    ....

      2               20000810           0,01

      2               20000811           0,01

      2               20000812           0,01

      2               20000813           0,01

      2               20000814           0,01

      3               20000302           0,04

      3               20000303           0,04

      3               20000304           0,04

      3               20000305           0,02

      3               20000306           0,01

    .....                  ......                  .....

      3               20050501            0,01

      3               20050502            0,01

      3               20050503            0,01

      3               20050504            0,01

    ....                  .....                     .....

 

I would like to receive the following output:

 

 ID                  PERMNO                    Date                     Ret

  A                        1                        19900204                0,01

  A                        1                        19900205                0,02

  A                        1                        19900206                0,03

  B                        3                        20000304                0,04

  B                        3                        20000305                0,02

  B                        3                        20000306                0,01

  B                        3                        20050504                0,01

  B                        3                        20050505                0,01

  B                        3                        20050505                0,01

 

 

So this would be much easier if I could just merge the two sets based on Permno and then just select the dates I actually want. This is not working however since the merging process is not possible when both sets have repetitions of the by variable. I could just merge for permno and date but then I would not receive the surrounding two dates as well. Is there an easy way to do this or do I have merge for permno and date and then find another way to get the surrounding dates.

 

I'm sorry, I'm pretty new to SAS and thus do not know what the best way of visualizing my problems is. If you need any more information please let me new.

 

Else, thank you very much in advance.

 

Have a great day.

 

BR

Nici

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Your earlier answers indicate that your dates are true SAS dates.  As you have seen, it's not necessary for them to have the same formats in order to merge them  I justed wanted to be sure that if I added 1 to a date value, that it would get me to the next day (regardless of crossing month boundaries or year boundaries).

 

Here's a mildly intuitive approach that should be flexible enough.  It turns the second data set into a format, and then runs through the first data set using the format to find matching values.

 

Starting with the second data set:

 

data fmt;
   retain fmtname '$returns';
   set B end=done;
   start = catx(' ', permo, date);
   label = ret;
output;
if done;
hlo='O';
label=' ';
output; run; proc format cntlin=fmt; run;

This gives you a format that translates the combination of PERMO and DATE into a value for RET.  The last piece of the format translates unrecognized values into a blank.  Depending on what is actually contained in RET, you may have to deal with commas later.  That's a relatively small puzzle piece.  The bigger piece is that the results from a format always return a character string.  So we will need to convert those to numeric.  So ....

 

data want;
   set A;
   text = put( catx(' ', permo, date), $returns.);
   ret_today = input(text, 8.);   /* might need to change 8. if there are commas! */
   * Get return from next available date, limited to 4 days of search;
   do k = 1 to 4 until (text > ' ');
      text = put( catx(' ', permo, date + k), $returns.);
   end;
   if text > ' ' then ret_tomorrow = input(text, 8.);
   * Get return from previous available date, limited to 4 days of search;
   do k = 1 to 4 until (text > ' ');
      text = put( catx(' ', permo, date - k), $returns.);
   end;
   if text > ' ' then ret_yesterday = input(text, 8.);
   drop k;
run;

This is untested code, so it might need some tweaking.  It's probably a good strategy to get just the first four lines of this DATA step working first (getting RET_TODAY), before adding the rest.  Also note, there's probably a way to do this using hashing, but my hashing skills aren't good enough!  Good luck.

 

 

View solution in original post

6 REPLIES 6
Nici
Obsidian | Level 7

Thank you for your fast reply.

 

Unfortunately this is not the solution I was looking for. The thread you referenced is about matching the closest date in two datasets where dates are not matching.

Probably I did not explain my problem understandingly.

In my datasets the dates are matching, the matching itself is not the problem. Rather would I want the merge process to not only include the matching dates but also include the observation with the preceeding and subsequent date as well.

Basically:

 

merge by permno, date, date+1, date-1.

 

Hopefully I have explained my problem sufficiently this time.

 

Thank you for your help

Astounding
PROC Star

First, you've done a pretty good job so far realizing what the issues are.

 

Now the questions. 

 

What is in the DATE variable?  Is it a true SAS date, or is it an 8-digit number?  (I'm aware that as a newbie, you may not know.  If that's the case, run a PROC CONTENTS and report back on the variable's format.)

 

Do you always want to go one day forward and one day back, or could there be breaks in the data (such as going back from a Monday to the previous Friday)?

 

Would it be just as convenient for you to get 3 variables on just one observation (INT_TODAY, INT_PREVIOUS, INT_NEXT)?

 

The reference that @VDD provided shows how SQL can be a good approach.  But it's mildly complex and isn't the only approach.  So clarifying the problem can only help.

Nici
Obsidian | Level 7

Thank you for your reply.

 

date format from the second Dataset is (CRSP Data):

L: 8

Format: YYMMDDN8.

Informat: YYMMDD6.

[Numeric]

 

from the first one the format is (my sample, imported from excel):

L: 8

Format: Date9.

Informat: 8.

[Numeric]

 

in a previous step, I've already tried to change the second format to YYMMDDN8. which worked but then I always received a weird Error when I "opened" that dataset. The Error was something like "Error: There was a problem with the format and it was changed to Best."

Thus I tried to just merge without adjusting the formats and it seemed to work properly.

Is it not possible to merge by date without matching the format?

 

You brought up an amazing point that I did not think about before. Since I'm working with stock data, there will be no entries for the weekend. Thus it will not be able to go by Date+1, Date-1 since for a Monday, the date-1 will not be there. Therefore, the step you described of going back from a Monday to a previous Friday is inevitable.

 

If the variables INT_Today, INT_Tomorrow, INT_Yesterday contained the return for the respective dates this would be just fine as well I guess.

 

To bring my problem in context a little bit. I basically have one dataset which is my sample, with companies and a respective cut-off date. Then I have another dataset that contains stock data for all companies. I now want the stock data for the respective companies around the cut-off date.

 

I have an idea on how to do this but it seams extremely inconvenient.

 

I could just merge the whole two sets based on the company identifier and the date. Then have a dummy variable which shows 1 if the company has been in my sample (I don't really know yet how to do this but I think I might be able to find out). Afterwards delete all observations where dummy<>1 (or =0). This would lead to a dataset where I have all the stock data for just my sample companies. Afterwards I would have to put another dummy for the observation that equals the cut-off date and from there try to include another dummy that marks the previous and subsequent dates as +1 or -1.

As I said this seams extremely inconvenient however or do you think it would be a viable approach?

 

Thank you again for your time and help.

 

BR

Nici

Astounding
PROC Star

Your earlier answers indicate that your dates are true SAS dates.  As you have seen, it's not necessary for them to have the same formats in order to merge them  I justed wanted to be sure that if I added 1 to a date value, that it would get me to the next day (regardless of crossing month boundaries or year boundaries).

 

Here's a mildly intuitive approach that should be flexible enough.  It turns the second data set into a format, and then runs through the first data set using the format to find matching values.

 

Starting with the second data set:

 

data fmt;
   retain fmtname '$returns';
   set B end=done;
   start = catx(' ', permo, date);
   label = ret;
output;
if done;
hlo='O';
label=' ';
output; run; proc format cntlin=fmt; run;

This gives you a format that translates the combination of PERMO and DATE into a value for RET.  The last piece of the format translates unrecognized values into a blank.  Depending on what is actually contained in RET, you may have to deal with commas later.  That's a relatively small puzzle piece.  The bigger piece is that the results from a format always return a character string.  So we will need to convert those to numeric.  So ....

 

data want;
   set A;
   text = put( catx(' ', permo, date), $returns.);
   ret_today = input(text, 8.);   /* might need to change 8. if there are commas! */
   * Get return from next available date, limited to 4 days of search;
   do k = 1 to 4 until (text > ' ');
      text = put( catx(' ', permo, date + k), $returns.);
   end;
   if text > ' ' then ret_tomorrow = input(text, 8.);
   * Get return from previous available date, limited to 4 days of search;
   do k = 1 to 4 until (text > ' ');
      text = put( catx(' ', permo, date - k), $returns.);
   end;
   if text > ' ' then ret_yesterday = input(text, 8.);
   drop k;
run;

This is untested code, so it might need some tweaking.  It's probably a good strategy to get just the first four lines of this DATA step working first (getting RET_TODAY), before adding the rest.  Also note, there's probably a way to do this using hashing, but my hashing skills aren't good enough!  Good luck.

 

 

Nici
Obsidian | Level 7

Wow!!! This is crazy. You are truly an amazing human being. Thank you so much.

 

To be honest, I don't know how my university expected me to come up with something like this.

 

Little side note, I've posted in this forum only once before today. Back then I needed to use SAS for a seminar and was struggling. When I checked my old forum post (in the hope that I could make use of the results) I saw that you were actually the one who helped me back in the day as well :).

It is amazing how helpful you are.

 

Thank you again and have a great evening

 

 

 

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 3249 views
  • 4 likes
  • 3 in conversation