Hi SAS community,
Have an awesome week!
I am facing an issue when dealing with extreme data, as described below.
My data look like this:
Type date p_us up_us nosh trading_vol amihud
134495 01JAN1987 . . . 0 .
134495 02JAN1987 . . . . .
134495 05JAN1987 . . . . .
.
.
134495 27DEC2019 1.02 1.02 14632 . .
134495 30DEC2019 0.99 0.99 14632 . .
134495 31DEC2019 0.99 0.99 14632 . .
.
.
992438 01JAN1987 . . . 0 .
992438 02JAN1987 . . . . .
992438 05JAN1987 . . . . .
.
.
992438 27DEC2019 0.01 0.01 9000 . .
992438 30DEC2019 0.01 0.01 9000 . .
992438 31DEC2019 0.01 0.01 9000 . .
Type is the stock identifier (a character variable; around 8,000 firms), date is a daily SAS date displayed with the DATE9. format (daily data from 1JAN1987 to 31DEC2019), and the other variables are numeric.
Here is my code for calculating the average of the "amihud" variable at the firm-year level:
*I got year by using the code year = year(date);
proc sort data=amihud_;
   by Type year;
run;

proc means data=amihud_ mean noprint;
   var amihud;
   by Type year;
   output out=amihud_f n=obs mean=amihud;
run;
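For completeness, the comment above refers to a simple derivation step; a minimal sketch, assuming date is a numeric SAS date variable, would be:

data amihud_;
   set amihud_;
   year = year(date);   /* extract the calendar year from the SAS date */
run;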
And I have results as below:
Type year _TYPE_ _FREQ_ obs amihud
134495 1987 0 261 0 .
134495 1988 0 261 0 .
134495 1989 0 260 0 .
134495 1990 0 261 0 .
.
.
134495 2016 0 261 15 0.0002618879
134495 2017 0 260 20 0.0002209701
134495 2018 0 261 4 0.0002583137
134495 2019 0 261 0 .
.
.
992438 1987 0 261 0 .
992438 1988 0 261 0 .
992438 1989 0 260 0 .
992438 1990 0 261 0 .
992438 1991 0 261 0 .
992438 1992 0 262 0 .
992438 1993 0 261 72 0.0000721906
.
.
992438 2017 0 260 0 .
992438 2018 0 261 0 .
992438 2019 0 261 0 .
But now I need to add three more conditions to my calculation: (1) remove a stock-year observation if the stock's p_us at the end of the previous year is in the top or bottom 1% of the cross section; (2) drop a stock-year observation if the stock has no data on the variable "nosh" at the end of the previous year; and (3) exclude any year with fewer than 10 stocks (column Type).
Many thanks and best regards.
So if that's your input, what would you want as the output? Does your shown data have all the use cases you've described below?
(1) removing stock-year observations if this stock has p_us at the end of the previous year in the top or bottom 1% of the cross section (what defines a 'cross section'?);
(2) drop stock-year observations if this stock does not have data on variable "nosh" at the end of the previous year; and (3) excluding any year with fewer than 10 stocks (column Type).
Hi @Reeza ,
Thank you for your comment. I'll try to make it clearer below:
1. "Cross section" means that " we examine the values of different subjects (distinct Types in my case) at the same time (year in my required output)". For example, in my case, in a specific year, for example, regarding year 1988, I will sort the p_us on 31 Dec of the year 1987 of all stocks. And then, in the year 1988, I will exclude the stocks that have the p_us in year 1987 in top or bottom 1% of p_us values on 31dec of year 1987 of the whole sample. And doing similar progress to all other years (1987-2019).
The data above do not contain the use cases I described; they simply illustrate my dataset structure.
I hope it is a little bit clearer now.
@Phil_NZ wrote:
But now I need to add three more conditions to my calculation: (1) remove a stock-year observation if the stock's p_us at the end of the previous year is in the top or bottom 1% of the cross section; (2) drop a stock-year observation if the stock has no data on the variable "nosh" at the end of the previous year; and (3) exclude any year with fewer than 10 stocks (column Type).
So you want (is this right?) the three conditions as stated, noting that I am interpreting your phrase "end of previous year" as 31DEC of year t-1, even if the stock does not trade on 31DEC.
And by "excluding any year with fewer than 10 stocks", do you mean fewer than 10 distinct values of TYPE?
Hi @mkeintz and @Reeza .
Sorry for the confusion.
My aim is to produce the same firm-year averages of "amihud" as the code above, from the given dataset, but with the three additional conditions applied.
Regarding conditions (1) and (2), it is similar to what @mkeintz mentioned, but I need to adjust the order of the variables a little:
1. Exclude a stock-year observation if the stock's P_US on 31DEC of year t-1 is in the top or bottom 1% of all stocks' P_US values on 31DEC of year t-1, or if the stock's "amihud" in this year is in the top 1% of all stocks' "amihud" values in this year.
2. Exclude a stock-year observation if the stock has a missing NOSH on 31DEC of year t-1.
3. Regarding condition (3): exclude any year in which fewer than 10 stocks (10 distinct values of TYPE, as @mkeintz clarified, thank you again) satisfy the two conditions above. A rough sketch of how I imagine these filters is shown after this list.
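To show what I mean in code, here is a rough, untested sketch (dataset and variable names are placeholders; it assumes a firm-year table want0 that already carries the prior year-end values as lag_p_us and lag_nosh, and the current-year mean as amihud):

proc sort data=want0;
   by year;
run;

/* Cross-sectional 1% / 99% cutoffs of prior year-end p_us, per year */
proc univariate data=want0 noprint;
   by year;
   var lag_p_us;
   output out=cuts pctlpts=1 99 pctlpre=p;   /* creates p1 and p99 */
run;

data want1;
   merge want0 cuts;
   by year;
   /* (1) drop extreme prior year-end prices (a missing lag_p_us also fails) */
   /* (2) require a non-missing prior year-end nosh                          */
   if p1 < lag_p_us < p99 and not missing(lag_nosh);
run;

/* (3) keep only years that still have at least 10 distinct stocks */
proc sql;
   create table want as
   select *
   from want1
   group by year
   having count(distinct Type) >= 10;
quit;

The additional screen on the top 1% of "amihud" within each year would follow the same PROC UNIVARIATE pattern.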
I also intended to upload my whole dataset containing all the cases @Reeza mentioned, but the forum allows a maximum of 5 MB per attachment, so I am not able to. The structure of the data is quite similar to what I describe above. I wonder whether we can post a Google Drive link or the like to the forum.
Many thanks and best regards.
We don't need (and wouldn't want) the complete data. You can easily satisfy @Reeza 's request by providing a subset of, say, 30 TYPEs for 2 or 3 years. It's certainly a lot easier to help someone with coding when there is data to use for testing our responses.
Thank you very much @mkeintz and @Reeza for your suggestions and explanation,
Following your explanation, I attached a sample of 80 distinct firms from my dataset, with daily data from 1JAN2016 to 31DEC2019 (I chose this period rather than an earlier one because it has more observations, which gives a more visible look at the dataset).
I attached it as a sas7bdat file; I am not sure whether that suits you, so please let me know if any adjustments are required.
Hi @Reeza , thank you for your concern. This dataset is not from WRDS, and it is okay to share. In any case, it is only a very small part of my dataset.
Going off track a little: based on the data given above, do you know how I can adjust the code to get three more columns in the output, called p_us, up_us, and nosh, holding the values of p_us, up_us, and nosh on the last day of the year associated with amihud?
To be clearer: from my data input (file data_almihud_, attached below), when running the code below:
*I got year by using the code year = year(date);
proc sort data=amihud_;
   by Type year;
run;

proc means data=amihud_ mean noprint;
   var amihud;
   by Type year;
   output out=amihud_f n=obs mean=amihud;
run;
I have the result as below:
Type year _TYPE_ _FREQ_ obs amihud
134495 2016 0 261 15 0.0002618879
134495 2017 0 260 20 0.0002209701
134495 2018 0 261 4 0.0002583137
134495 2019 0 261 0 .
.
.
992438 2017 0 260 0 .
992438 2018 0 261 0 .
992438 2019 0 261 0 .
I am wondering what I should add to this code to get three more columns in the output, called p_us, up_us, and nosh, holding the values of p_us, up_us, and nosh on the last day of each year from the input data (associated with the variable amihud).
I mean, I want the output as below
Type year _TYPE_ _FREQ_ obs amihud p_us up_us nosh
134495 2016 0 261 15 0.0002618879
134495 2017 0 260 20 0.0002209701
134495 2018 0 261 4 0.0002583137
134495 2019 0 261 0 .
.
.
992438 2017 0 260 0 .
992438 2018 0 261 0 .
992438 2019 0 261 0 .
I suggest this approach because, once the results contain the variables "amihud", "p_us", "up_us", and "nosh", I think it will be easier and clearer to write the code for the three conditions I mentioned above.
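For reference, one way I imagine this could work (an untested sketch, assuming the daily data can be sorted by Type, year, and date) is to keep the last observation of each Type-year group and merge it back onto the firm-year means:

proc sort data=amihud_;
   by Type year date;
run;

data lastday;
   set amihud_;
   by Type year;
   if last.year;                     /* last trading record in the Type-year */
   keep Type year p_us up_us nosh;
run;

data amihud_f2;
   merge amihud_f lastday;           /* both sorted by Type and year */
   by Type year;
run;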
Many thanks and crossing my fingers.
There is too much mission creep here.
Hi everyone, with your encouragement, I found a way to make my dataset clearer and obtained "clean" data as below.
So, my dataset "have" is attached below (containing 2,794 distinct companies in column TYPE).
A quick description is below:
Type year obs amihud up_us nosh vo p_us
13016K 1987 0 . . . . .
13016K 1988 0 . . . . .
13016K 1989 0 . . . . .
13016K 1990 0 . . . . .
13016K 1991 0 . . . . .
13016K 1992 0 . . . . .
13016K 1993 0 . . . . .
13016K 1994 0 . . . . .
13016K 1995 0 . . . . .
13016K 1996 0 . . . . .
13016K 1997 0 . . . . .
13016K 1998 0 . . . . .
13016K 1999 0 . . . . .
13016K 2000 0 . 0.2945 40120 0.3 42.9638
13016K 2001 0 . 0.1945 40120 . 28.3783
13016K 2002 0 . 0.1802 45558 0.7 26.2883
13016K 2003 0 . 0.2712 45908 . 39.5715
13016K 2004 0 . 0.1568 45908 . 22.8729
13016K 2005 0 . 0.0337 45908 . 4.9228
13016K 2006 0 . 0.0394 45908 1.4 5.7496
13016K 2007 0 . 0.0483 97085 . 7.4443
13016K 2008 0 . 0.0349 100233 . 5.3736
13016K 2009 0 . 0.0351 120065 . 5.5743
13016K 2010 0 . 0.0205 152711 . 3.2582
13016K 2011 0 . 0.0195 308196 0.5 3.8958
13016K 2012 0 . 0.0093 308196 . 1.8687
13016K 2013 0 . 0.0018 308196 . 0.3579
13016K 2014 0 . 0.0368 315410 8.9 0.3683
13016K 2015 0 . 0.0255 1077004 . 0.2546
13016K 2016 4 0.0005137639 0.2498 159721 20.3 0.2498
13016K 2017 9 0.0000546129 0.1799 161321 198.4 0.1799
13016K 2018 2 0.0002143573 0.1619 161321 50 0.1619
13016K 2019 2 0 0.1687 177464 . 0.1687
130347 1987 0 . . . . .
130347 1988 0 . . . . .
130347 1989 0 . . . . .
130347 1990 0 . . . . .
130347 1991 0 . . . . .
130347 1992 0 . . . . .
130347 1993 0 . . . . .
130347 1994 47 0.0000738433 0.7 43242 22.5 0.64
130347 1995 35 0.0000662663 0.65 43242 . 0.59
130347 1996 25 0.0000750143 1.08 43242 566.1 0.99
130347 1997 44 0.0001311861 1.11 66105 . 1.11
So now, with the helpful clarifications from @mkeintz and @Reeza , the question becomes: from the attached dataset "have", how do we apply the three conditions above and recompute the firm-year averages of "amihud"?
Thanks in advance and best regards.