BookmarkSubscribeRSS Feed
NewSASuser2018
Obsidian | Level 7

Hello sas community!

 

Many thanks again for beeing very helpful regarding my last problem. The solution worked marvelously!

 

My issue is the following. I have a large dataset containing ultra high frequency data (tick data):

 

Time                                     RateBid               RateAsk .....
01.01.2015:17:12:12.445   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:13:32.565   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:13:40.685   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:14:59.895   1.32473               1.32487
01.01.2015:17:14:59.895   1.86743               1.97473

 

An example.csv is attached below. I have already removed many obvious data anomalies and now want to filter for outliers as suggested in the literature.

 

1) For quotes that have the same time stamp (in HH:MM: SS : SSS precision) I want to detect duplicates (see the last to lines of my illustration above). For each of these duplicates I want to keep the median of RateBid and RateAsk for the respective timestamps respectively. So something like that:

 

Time                                     RateBid               RateAsk .....
01.01.2015:17:12:12.445   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:13:32.565   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:13:40.685   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:14:59.895   medRateBid         medRateAsk
 

My attempt was this:

 

proc sql;
   title 'Duplicate Rows in DUPLICATES Table';
   select *, count(*) as Count
      from example
      group by Time
      having count(*) > 1;
run;

So I was able to get at least see what timestamps feature duplicates. Online I found many solutions for detecting and deleting duplicates using proc sort and e.g. the noupkey option, however, I did not find any that suits my purpose.

 

 

As a second step I want to perform a manupulation regarding which I struggle understanding sas's syntax.

 

For each day delete the entry for which the spread is more than 50 times the median spread on that day. My attempt is this:

 

data example;
set example;
spread = RateAsk - RateBid;
run;

data example;
set example;
t=datepart(Time);
t1=timepart(Time);
format t date9. t1 time.;
run;

data example;
set example;
by t;
medspread= median(spread);
run;

data example;
set example;
if spread > 50*medspread then delete;
run;

 

 

This approch is erroneous, however. I don't get an error message but sas does not coumpute the median of the variable "spread" for each day. I must have got the indexing wrong.

After this step, to make sure, I want to delete all entries for which the so called mid-quote deviated by more than 10 mean absolute deviations from a rolling centered median (excluding the observation under consideration) of the 50 observations around the one considered.  Here to be honest, I cannot figure out how to construct such a measure in sas. Generally I am used to working with matlab where I would just construct a loop and use the runnig index to index intro the matrix elements. I know, however, that this is not the way to go in sas (that I know).

 

I hope you can help me with my issue. Thank you very much in advance!

 

Kind regars

 

2 REPLIES 2
NewSASuser2018
Obsidian | Level 7
Any help with any Problem as they are technicall 3 is greatly appreciated!

Best
NewSASuser2018
Obsidian | Level 7

As an update,

 

I seem to have figured out my first issue:

 

proc sql;
create table want as
select *, median(RateBid) as median_bid, median(RateAsk) as median_ask
from work.example
group by Time;
quit;


data want;
  set want;
  by Time;
  if last.Time;   
run;

so calculate the median for all observations (which is just the observation if there are no duplicates) and then I select the last one of ech observation. So for all time stamps only one observation (either the observation itself or the median) remains. Feel free to correct this, if I overlook something!

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 1085 views
  • 0 likes
  • 1 in conversation