Hi everyone,
I have a dataset like this (sample data). This dataset is organized by permno (stock identification), last date, and date. Variable "date" are the dates within the 2-year window from last date. For example, for the last date=31/7/2001, dates are ranging from 31/7/1999 to 31/7/2001. Symbol are stock symbols for each date
Permno | Last date | Date | SYMBOL |
12345 | 31/07/2001 | 31/07/1999 | A |
12345 | 31/07/2001 | 01/08/1999 | A |
12345 | 31/07/2001 | 02/08/1999 | A |
12345 | 31/07/2001 | ………… | A |
12345 | 31/07/2001 | 15/02/2000 | A |
12345 | 31/07/2001 | 16/02/2000 | A1 |
12345 | 31/07/2001 | ……… | A1 |
12345 | 31/07/2001 | 21/04/2001 | A1 |
12345 | 31/07/2001 | 22/04/2001 | A2 |
12345 | 31/07/2001 | ………. | A2 |
12345 | 31/07/2001 | 26/06/2001 | A2 |
12345 | 31/07/2001 | …… | …… |
12356 | 15/09/2009 | ………. | B |
Now I would like a dataset that summarizes the begining dates and enddates of a stock symbol. The output should look like this.
It shows that permno 12345 has a symbol = A from 31/7/1999 to 15/2/2000, and A1 from 16/2/2000 to 21/4/2001 and so on
Permno | Last date | Begdates | Enddates | SYMBOL |
12345 | 31/07/2001 | 31/07/1999 | 15/02/2000 | A |
12345 | 31/07/2001 | 16/02/2000 | 21/04/2001 | A1 |
12345 | 31/07/2001 | 22/04/2001 | 26/06/2001 | A2 |
12356 | 15/09/2009 | ……… | ………… | ……… |
Could anyone show me how to do it? Thank you very much
Maybe adding "notsorted" option to the by-statement does the trick.
Anyone has some ideas?
Post some example data as data-step, so that we have something to work with.
That is untested code, assuming that the source-dataset is sorted by permno and symbol:
data work.want;
set work.source;
by permno symbol;
length BegDate EndDate 8;
retain BegDate;
/*assuming taht Date is a sas date */
format BegDate Enddate ddmmyys10.;
if first.symbol then do;
BegDate = Date;
end;
if last.symbol then do;
EndDate = Date;
output;
end;
drop date;
run;
Thanks. But different permno could have the same symbol. For example, the first observation of permno 12356 could be the same as the last observation of permno 12345. So how to deal with it?
Sorry, don't get it. Can you post example data?
Have you already destroyed the data by blanking out most values for PERMNO and LAST_DATE, or do you have values for those on every observation?
Hi. I posted a new sample data for clarification. Hope it clarifies now
And again the interesting part is blanked out.
Hi. So the data is like this
Permno | Last date | Date | SYMBOL |
12345 | 31/07/2001 | 31/07/1999 | A |
12345 | 31/07/2001 | 01/08/1999 | A |
12345 | 31/07/2001 | 02/08/1999 | A |
12345 | 31/07/2001 | 03/08/1999 | Missing |
12345 | 31/07/2001 | 15/02/2000 | Missing |
12345 | 31/07/2001 | 16/02/2000 | A1 |
12345 | 31/07/2001 | 17/02/2000 | A1 |
12345 | 31/07/2001 | 21/04/2001 | A1 |
12345 | 31/07/2001 | 22/04/2001 | A2 |
12345 | 31/07/2001 | 23/04/2001 | A2 |
12345 | 31/07/2001 | 26/06/2001 | Missing |
12345 | 31/07/2001 | 27/06/2001 | Missing |
12345 | 31/07/2002 | 28/06/2001 | A2 |
12345 | 31/07/2002 | 31/07/2001 | A2 |
98513 | 15/09/2009 | 15/09/2007 | A2 |
98513 | 16/09/2009 | 16-09-07 | A2 |
98513 | 17/09/2009 | 21-05-08 | B |
98513 | 18/09/2009 | 22-05-08 | B |
98513 | 19/09/2009 | 15-09-09 | B |
Notice that there are 2 features that make it more difficult (I highlight above):
- There are random observations with missing symbols
- 2 different permnos could have the same symbol.
So when I use the code with first.symbol (I have to sort data with permno, symbol). The output is like this:
Permno | Begdate | Enddate | Symbol |
12345 | 31/07/1999 | 02/08/1999 | A |
12345 | 03/08/1999 | 27/06/2001 | Missing |
12345 | 28/06/2001 | 15/09/2007 | A2 |
Which is not right, because Permno 12345 does not have missing symbol all the way from 3/8/1999 to 27/6/2001. And it could not differentiate symbol A2 of permno 12345 with A2 of 98513.
The output should look like this
Permno | Begdate | Enddate | Symbol |
12345 | 31/07/1999 | 02/08/1999 | A |
12345 | 03/08/1999 | 15/02/2000 | Missing |
12345 | 16-02-00 | 21-04-01 | A1 |
12345 | 22-04-01 | 23-04-01 | A2 |
12345 | 26-06-01 | 27-06-01 | Missing |
12345 | 28-06-01 | 31-07-01 | A2 |
98513 | 15-09-07 | 16-09-07 | A2 |
98513 | 21-05-08 | 15-09-09 | B |
Providing data as data-step makes it easier to work on a solution. I don't think anybody is so bored that she/he enjoys copying a table and making the necessary changes so that it can be used as dataset.
Maybe adding "notsorted" option to the by-statement does the trick.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.