Hi SAS community,
Today I faced a problem when trying to exclude the non-trading days from my dataset.
My dataset is described as below:
Type date R
131712 01JAN1987 .
131712 02JAN1987 40
131712 05JAN1987 80
131712 06JAN1987 60
131712 07JAN1987 .
.
.
131712 27DEC2019 .
131712 30DEC2019 .
131712 31DEC2019 .
.
.
28829X 01JAN1987 121.32
28829X 02JAN1987 32.37
28829X 05JAN1987 115.19
.
.
28829X 27DEC2019 .
28829X 30DEC2019 .
28829X 31DEC2019 .
.
.
The first column is the label of the companies (character type) (I have around 10,000 stocks sorted ascendingly), the second column is about the date (ddmmyyyy9.) (daily data from 1/1/1987 to 31/12/2019), and the third column is a numeric variable named "return index" (R)accordingly.
My concern here is that: a day on which more than 90% of stocks have zero returns is non-trading days, and then I need to exclude these non-trading days.
Many thanks in advance.
data have;
infile cards truncover;
informat ID $8. date date9.;
format date date9.;
input ID date value;
cards;
131712 01JAN1987 .
131712 02JAN1987 40
131712 05JAN1987 80
131712 06JAN1987 60
131712 07JAN1987 .
131712 27DEC2019 .
131712 30DEC2019 .
131712 31DEC2019 .
28829X 01JAN1987 121.32
28829X 02JAN1987 32.37
28829X 05JAN1987 115.19
28829X 27DEC2019 .
28829X 30DEC2019 .
28829X 31DEC2019 .
131713 01JAN1987 .
131714 01JAN1987 .
;;;;
run;
*count missing and not missing;
proc means data=have noprint nway;
class date;
var value;
output out=nMissing NMISS=NMISS;
run;
*convert to a percent;
data pctMissing;
set nMissing;
PCT_MISS = NMISS/_freq_;
run;
*get list of dates where percent missing is over 90%;
proc sql;
create table filteredData as
select *
from have where date not in (select date from pctMissing where PCT_MISS > 0.5);
quit;
/*slightly more efficient way to do the means and pctMissing data set
*creates a view with a flag as 0/1 for missing which can then be used in proc means to calculate the percentage*/
data temp / view=temp;
set have;
if missing(value) then flag = 1; else flag=0;
run;
proc means data=temp noprint nway;
class date;
var flag;
output out=pctMissing mean=PCT_MISS;
run;
*get list of dates where percent missing is over 90%;
proc sql;
create table filteredData as
select *
from have where date not in (select date from pctMissing where PCT_MISS > 0.5);
quit;
Hi @Reeza , thank you for your reply,
I am not sure about how to code for this question, but from the idea, I postulate one possible solution for this concern is to create a new column (obs_per_day_0) and count through the column R (+1 when R =0). Then after that, we compare obs_per_day_0 with the total number of companies in my dataset. If the number in each day of the column obs_per_day_0 is higher than 90% of the total number, I will exclude this day from the sample.
Many thanks.
data have;
infile cards truncover;
informat ID $8. date date9.;
format date date9.;
input ID date value;
cards;
131712 01JAN1987 .
131712 02JAN1987 40
131712 05JAN1987 80
131712 06JAN1987 60
131712 07JAN1987 .
131712 27DEC2019 .
131712 30DEC2019 .
131712 31DEC2019 .
28829X 01JAN1987 121.32
28829X 02JAN1987 32.37
28829X 05JAN1987 115.19
28829X 27DEC2019 .
28829X 30DEC2019 .
28829X 31DEC2019 .
131713 01JAN1987 .
131714 01JAN1987 .
;;;;
run;
*count missing and not missing;
proc means data=have noprint nway;
class date;
var value;
output out=nMissing NMISS=NMISS;
run;
*convert to a percent;
data pctMissing;
set nMissing;
PCT_MISS = NMISS/_freq_;
run;
*get list of dates where percent missing is over 90%;
proc sql;
create table filteredData as
select *
from have where date not in (select date from pctMissing where PCT_MISS > 0.5);
quit;
/*slightly more efficient way to do the means and pctMissing data set
*creates a view with a flag as 0/1 for missing which can then be used in proc means to calculate the percentage*/
data temp / view=temp;
set have;
if missing(value) then flag = 1; else flag=0;
run;
proc means data=temp noprint nway;
class date;
var flag;
output out=pctMissing mean=PCT_MISS;
run;
*get list of dates where percent missing is over 90%;
proc sql;
create table filteredData as
select *
from have where date not in (select date from pctMissing where PCT_MISS > 0.5);
quit;
Thank you @Reeza , it works successfully to me.
I learn quite a bit from your solution and your "more efficient" way to get there.
data want;
set have;
if not missing(R);
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.