DATA Step, Macro, Functions and more

Log differnces of time series pairs

Reply
Contributor
Posts: 29

Log differnces of time series pairs

To determine the log differences of all possible pairs of n time series I have to calc. n!/(n-2)!2! pairs. The time series dataset comes in the form of a data set with three columns: date, TimeSeriesID, Value

I store the number of times series in the macro var nseries. then i create the pairs data set:

data pairs(keep=stockSmiley Happy;
do i=1 to %eval(&nseries-1);
do j=i+1 to &nseries;
ID1=i;
ID2=j;
output;
end;
end;
run;

Using SQL i create the log differences, but the log complains about not being able to optimize because of the (intentionally) created Cartesian product.
proc sql noprint;
create table logp as
select a.*
,log(b.Value/c.Value) as y
from (
select a.*
,b.date
from pairs a
,(select distinct date from TimeSeriesDS) b
)a
left join TimeSeriesDS b
on a.ID1=b.TimeSeriesID
and a.date=b.date
left join TimeSeriesDS c
on a.ID2=c.TimeSeriesID
and a.date=c.date
order by ID1,ID2,date
;
quit;

This SQL code takes some time and the output data set is huge! Is there something better I can do? (tried transposing the time series dataset, but couldn't find a way to deal with the pairs calculation)
Respected Advisor
Posts: 4,173

Re: Log differnces of time series pairs

Hi

Not that I understand too much about time series but from what you describe:

If you have to calculate all possible pairs over all time series then is this close to a cartesian product and it doesn't take much for the result set becoming huge.

As for the number of pairs: Wouldn't that just be "n(n-1)"

The following code is based on this assumption:

data have;
format date date9.;
id+1;
do TimeSeriesID=1 to 3;
do date='01Mar2010'd to '02Mar2010'd;
value=floor(ranuni(1)*10);
output;
end;
end;
run;

proc sql;
select log(l.Value/r.Value) as y
,l.value as l_value, r.value as r_value
,l.TimeSeriesID as L_TimeSeriesID
,r.TimeSeriesID as R_TimeSeriesID
,l.date as l_date,r.date as r_date
from have as l,have as r
where not(l.TimeSeriesID = r.TimeSeriesID and l.date = r.date)
;
quit;


HTH
Patrick
Contributor
Posts: 29

Re: Log differnces of time series pairs

The combination between Timeseries1 and Timerseries2 is in my case the same as the pair between Timeseries2 and Timeseries1. n(n-1) would give too many pairs ( in your counting the order of combination matters).

In order to not make the data set huge, I thought of storing all the time series in one data set with many columns (just one row for every date, and the all the time series as columns, ie. apply proc transpose to my original data set). The ugly thing about this seems the subsequent code, where you probably don't come around of a lot of marco loops... Maintaining SAS code with a lot of macro loops is in my experience more painful - (probably because I am not a good coder?).
Respected Advisor
Posts: 4,173

Re: Log differnces of time series pairs

Hi

"The combination between Timeseries1 and Timerseries2 is in my case the same as the pair between Timeseries2 and Timeseries1"

Think I understand. So it would be "n(n-1)/2".

I added a abs() function around the log() so that it won't matter which way round the pair is constructed.

Let me know if I got it right this time.

data have;
format date date9.;
do TimeSeriesID=1 to 3;
do date='01Mar2010'd to '02Mar2010'd;
id+1;
value=floor(ranuni(1)*10);
output;
end;
end;
run;

proc sql;
select
l.id as l_id,r.id as r_id,
abs(log(l.Value/r.Value)) as y
,l.value as l_value, r.value as r_value
,l.TimeSeriesID as L_TimeSeriesID
,r.TimeSeriesID as R_TimeSeriesID
,l.date as l_date,r.date as r_date
from have as l,have as r
where l.id < r.id
;
quit;


"In order to not make the data set huge, I thought of storing all the time series in one data set with many columns"

It's in the end the number of datapoints which counts and defines the storage requirements. If you organise them in a few rows but many columns or in many rows and only a few columns has not that much of an effect.

I would suggest that you organise your data in a way which suits your needs best.

HTH
Patrick
Respected Advisor
Posts: 4,173

Re: Log differnces of time series pairs

Creating "n(n-1)/2" pairs the following code eventually performs better than a SQL approach:

data have;
format date date9.;
do TimeSeriesID=1 to 3;
do date='01Mar2010'd to '02Mar2010'd;
id+1;
value=floor(ranuni(1)*10);
output;
end;
end;
run;

data want;
set have nobs=nobs;
do i=_n_+1 to nobs;
set have(keep=id value rename=(id=id2 value=value2)) point=i;
y=abs(log(value/value2));
output;
end;
run;

proc print data=want;
run;
Contributor
Posts: 29

Re: Log differnces of time series pairs

n(n-1)/2 is correct. I think there is a small misunderstanding. I want to calculate the log difference at every date for all pairs. So, for your last sql-example result I would put an "and l.date=r.date" in the WHERE statement of the sql code. At the end I want a log difference for all pairs (3) at every date (2), which should give me a data set with 6 rows.
With the additional code in the where statement I get the same result.

Since data steps still have some mysteries for me, I am still starring at your last submitted the code... thanks for all the effort you put in your answers!
Ask a Question
Discussion stats
  • 5 replies
  • 179 views
  • 0 likes
  • 2 in conversation