BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phil_NZ
Barite | Level 11

Hi SAS community,

Today I faced a problem when trying to exclude the non-trading days from my dataset.

My dataset is described as below:

Type date R

131712	01JAN1987	.
131712	02JAN1987	40
131712	05JAN1987	80
131712	06JAN1987	60
131712	07JAN1987	.
.
.
131712  27DEC2019	.
131712  30DEC2019	.
131712  31DEC2019	.
.
.
28829X  01JAN1987   121.32
28829X  02JAN1987   32.37
28829X  05JAN1987   115.19
.
.
28829X  27DEC2019	.
28829X  30DEC2019	.
28829X  31DEC2019	.
.
.

The first column is the label of the companies (character type) (I have around 10,000 stocks sorted ascendingly), the second column is about the date (ddmmyyyy9.) (daily data from 1/1/1987 to 31/12/2019), and the third column is a numeric variable named "return index" (R)accordingly.

 

My concern here is that: a day on which more than 90% of stocks have zero returns is non-trading days, and then I need to exclude these non-trading days.

Many thanks in advance.

 
Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
data have;
infile cards truncover;
informat ID $8. date date9.;
format date date9.;
input ID  date   value;
cards;
131712 01JAN1987	.
131712 02JAN1987	40
131712 05JAN1987	80
131712 06JAN1987	60
131712 07JAN1987	.
131712 27DEC2019	.
131712 30DEC2019	.
131712 31DEC2019	.
28829X 01JAN1987   121.32
28829X 02JAN1987   32.37
28829X 05JAN1987   115.19
28829X 27DEC2019	.
28829X 30DEC2019	.
28829X 31DEC2019	.
131713 01JAN1987	.
131714 01JAN1987	.
;;;;
run;

*count missing and not missing;
proc means data=have noprint nway;
class date;
var value;
output out=nMissing NMISS=NMISS;
run;

*convert to a percent;
data pctMissing;
set nMissing;
PCT_MISS = NMISS/_freq_;
run;

*get list of dates where percent missing is over 90%;
proc sql;
create table filteredData as
select * 
from have where date not in (select date from pctMissing where PCT_MISS > 0.5);

quit;

/*slightly more efficient way to do the means and pctMissing data set
*creates a view with a flag as 0/1 for missing which can then be used in proc means to calculate the percentage*/
data temp / view=temp;
set have;

if missing(value) then flag = 1; else flag=0;
run;

proc means data=temp noprint nway;
class date;
var flag;
output out=pctMissing mean=PCT_MISS;
run;

*get list of dates where percent missing is over 90%;
proc sql;
create table filteredData as
select * 
from have where date not in (select date from pctMissing where PCT_MISS > 0.5);

quit;

View solution in original post

5 REPLIES 5
Reeza
Super User
Can you show what you've tried so far?
Phil_NZ
Barite | Level 11

Hi @Reeza , thank you for your reply,

I am not sure about how to code for this question, but from the idea, I postulate one possible solution for this concern is to create a new column (obs_per_day_0) and count through the column R (+1 when R =0). Then after that, we compare obs_per_day_0 with the total number of companies in my dataset. If the number in each day of the column obs_per_day_0 is higher than 90% of the total number, I will exclude this day from the sample.

 

Many thanks.

 
Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Reeza
Super User
data have;
infile cards truncover;
informat ID $8. date date9.;
format date date9.;
input ID  date   value;
cards;
131712 01JAN1987	.
131712 02JAN1987	40
131712 05JAN1987	80
131712 06JAN1987	60
131712 07JAN1987	.
131712 27DEC2019	.
131712 30DEC2019	.
131712 31DEC2019	.
28829X 01JAN1987   121.32
28829X 02JAN1987   32.37
28829X 05JAN1987   115.19
28829X 27DEC2019	.
28829X 30DEC2019	.
28829X 31DEC2019	.
131713 01JAN1987	.
131714 01JAN1987	.
;;;;
run;

*count missing and not missing;
proc means data=have noprint nway;
class date;
var value;
output out=nMissing NMISS=NMISS;
run;

*convert to a percent;
data pctMissing;
set nMissing;
PCT_MISS = NMISS/_freq_;
run;

*get list of dates where percent missing is over 90%;
proc sql;
create table filteredData as
select * 
from have where date not in (select date from pctMissing where PCT_MISS > 0.5);

quit;

/*slightly more efficient way to do the means and pctMissing data set
*creates a view with a flag as 0/1 for missing which can then be used in proc means to calculate the percentage*/
data temp / view=temp;
set have;

if missing(value) then flag = 1; else flag=0;
run;

proc means data=temp noprint nway;
class date;
var flag;
output out=pctMissing mean=PCT_MISS;
run;

*get list of dates where percent missing is over 90%;
proc sql;
create table filteredData as
select * 
from have where date not in (select date from pctMissing where PCT_MISS > 0.5);

quit;
Phil_NZ
Barite | Level 11

Thank you @Reeza , it works successfully to me.

 

I learn quite a bit from your solution and your "more efficient" way to get there.

 
 
Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Shmuel
Garnet | Level 18
data want;
 set have;
     if not missing(R);
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1279 views
  • 1 like
  • 3 in conversation