BookmarkSubscribeRSS Feed
NewSASuser2018
Obsidian | Level 7

Hello sas community!

 

Many thanks again for beeing very helpful regarding my last problem. The solution worked marvelously!

 

My issue is the following. I have a large dataset containing ultra high frequency data (tick data):

 

Time                                     RateBid               RateAsk .....
01.01.2015:17:12:12.445   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:13:32.565   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:13:40.685   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:14:59.895   1.32473               1.32487
01.01.2015:17:14:59.895   1.86743               1.97473

 

An example.csv is attached below. I have already removed many obvious data anomalies and now want to filter for outliers as suggested in the literature.

 

1) For quotes that have the same time stamp (in HH:MM: SS : SSS precision) I want to detect duplicates (see the last to lines of my illustration above). For each of these duplicates I want to keep the median of RateBid and RateAsk for the respective timestamps respectively. So something like that:

 

Time                                     RateBid               RateAsk .....
01.01.2015:17:12:12.445   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:13:32.565   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:13:40.685   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:14:59.895   medRateBid         medRateAsk
 

My attempt was this:

 

proc sql;
   title 'Duplicate Rows in DUPLICATES Table';
   select *, count(*) as Count
      from example
      group by Time
      having count(*) > 1;
run;

So I was able to get at least see what timestamps feature duplicates. Online I found many solutions for detecting and deleting duplicates using proc sort and e.g. the noupkey option, however, I did not find any that suits my purpose.

 

 

As a second step I want to perform a manupulation regarding which I struggle understanding sas's syntax.

 

For each day delete the entry for which the spread is more than 50 times the median spread on that day. My attempt is this:

 

data example;
set example;
spread = RateAsk - RateBid;
run;

data example;
set example;
t=datepart(Time);
t1=timepart(Time);
format t date9. t1 time.;
run;

data example;
set example;
by t;
medspread= median(spread);
run;

data example;
set example;
if spread > 50*medspread then delete;
run;

 

 

This approch is erroneous, however. I don't get an error message but sas does not coumpute the median of the variable "spread" for each day. I must have got the indexing wrong.

After this step, to make sure, I want to delete all entries for which the so called mid-quote deviated by more than 10 mean absolute deviations from a rolling centered median (excluding the observation under consideration) of the 50 observations around the one considered.  Here to be honest, I cannot figure out how to construct such a measure in sas. Generally I am used to working with matlab where I would just construct a loop and use the runnig index to index intro the matrix elements. I know, however, that this is not the way to go in sas (that I know).

 

I hope you can help me with my issue. Thank you very much in advance!

 

Kind regars

 

2 REPLIES 2
NewSASuser2018
Obsidian | Level 7
Any help with any Problem as they are technicall 3 is greatly appreciated!

Best
NewSASuser2018
Obsidian | Level 7

As an update,

 

I seem to have figured out my first issue:

 

proc sql;
create table want as
select *, median(RateBid) as median_bid, median(RateAsk) as median_ask
from work.example
group by Time;
quit;


data want;
  set want;
  by Time;
  if last.Time;   
run;

so calculate the median for all observations (which is just the observation if there are no duplicates) and then I select the last one of ech observation. So for all time stamps only one observation (either the observation itself or the median) remains. Feel free to correct this, if I overlook something!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 970 views
  • 0 likes
  • 1 in conversation