Hi SAS community,
Today I faced a problem when trying to exclude the non-trading days from my dataset.
My dataset is described as below:
Type date R
131712 01JAN1987 .
131712 02JAN1987 40
131712 05JAN1987 80
131712 06JAN1987 60
131712 07JAN1987 .
.
.
131712 27DEC2019 .
131712 30DEC2019 .
131712 31DEC2019 .
.
.
28829X 01JAN1987 121.32
28829X 02JAN1987 32.37
28829X 05JAN1987 115.19
.
.
28829X 27DEC2019 .
28829X 30DEC2019 .
28829X 31DEC2019 .
.
.
The first column is the label of the companies (character type) (I have around 10,000 stocks sorted ascendingly), the second column is about the date (ddmmyyyy9.) (daily data from 1/1/1987 to 31/12/2019), and the third column is a numeric variable named "return index" (R)accordingly.
My concern here is that: a day on which more than 90% of stocks have zero returns is non-trading days, and then I need to exclude these non-trading days.
Many thanks in advance.
data have;
infile cards truncover;
informat ID $8. date date9.;
format date date9.;
input ID date value;
cards;
131712 01JAN1987 .
131712 02JAN1987 40
131712 05JAN1987 80
131712 06JAN1987 60
131712 07JAN1987 .
131712 27DEC2019 .
131712 30DEC2019 .
131712 31DEC2019 .
28829X 01JAN1987 121.32
28829X 02JAN1987 32.37
28829X 05JAN1987 115.19
28829X 27DEC2019 .
28829X 30DEC2019 .
28829X 31DEC2019 .
131713 01JAN1987 .
131714 01JAN1987 .
;;;;
run;
*count missing and not missing;
proc means data=have noprint nway;
class date;
var value;
output out=nMissing NMISS=NMISS;
run;
*convert to a percent;
data pctMissing;
set nMissing;
PCT_MISS = NMISS/_freq_;
run;
*get list of dates where percent missing is over 90%;
proc sql;
create table filteredData as
select *
from have where date not in (select date from pctMissing where PCT_MISS > 0.5);
quit;
/*slightly more efficient way to do the means and pctMissing data set
*creates a view with a flag as 0/1 for missing which can then be used in proc means to calculate the percentage*/
data temp / view=temp;
set have;
if missing(value) then flag = 1; else flag=0;
run;
proc means data=temp noprint nway;
class date;
var flag;
output out=pctMissing mean=PCT_MISS;
run;
*get list of dates where percent missing is over 90%;
proc sql;
create table filteredData as
select *
from have where date not in (select date from pctMissing where PCT_MISS > 0.5);
quit;
Hi @Reeza , thank you for your reply,
I am not sure about how to code for this question, but from the idea, I postulate one possible solution for this concern is to create a new column (obs_per_day_0) and count through the column R (+1 when R =0). Then after that, we compare obs_per_day_0 with the total number of companies in my dataset. If the number in each day of the column obs_per_day_0 is higher than 90% of the total number, I will exclude this day from the sample.
Many thanks.
data have;
infile cards truncover;
informat ID $8. date date9.;
format date date9.;
input ID date value;
cards;
131712 01JAN1987 .
131712 02JAN1987 40
131712 05JAN1987 80
131712 06JAN1987 60
131712 07JAN1987 .
131712 27DEC2019 .
131712 30DEC2019 .
131712 31DEC2019 .
28829X 01JAN1987 121.32
28829X 02JAN1987 32.37
28829X 05JAN1987 115.19
28829X 27DEC2019 .
28829X 30DEC2019 .
28829X 31DEC2019 .
131713 01JAN1987 .
131714 01JAN1987 .
;;;;
run;
*count missing and not missing;
proc means data=have noprint nway;
class date;
var value;
output out=nMissing NMISS=NMISS;
run;
*convert to a percent;
data pctMissing;
set nMissing;
PCT_MISS = NMISS/_freq_;
run;
*get list of dates where percent missing is over 90%;
proc sql;
create table filteredData as
select *
from have where date not in (select date from pctMissing where PCT_MISS > 0.5);
quit;
/*slightly more efficient way to do the means and pctMissing data set
*creates a view with a flag as 0/1 for missing which can then be used in proc means to calculate the percentage*/
data temp / view=temp;
set have;
if missing(value) then flag = 1; else flag=0;
run;
proc means data=temp noprint nway;
class date;
var flag;
output out=pctMissing mean=PCT_MISS;
run;
*get list of dates where percent missing is over 90%;
proc sql;
create table filteredData as
select *
from have where date not in (select date from pctMissing where PCT_MISS > 0.5);
quit;
Thank you @Reeza , it works successfully to me.
I learn quite a bit from your solution and your "more efficient" way to get there.
data want;
set have;
if not missing(R);
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.