BookmarkSubscribeRSS Feed
niemannk
Calcite | Level 5
To determine the log differences of all possible pairs of n time series I have to calc. n!/(n-2)!2! pairs. The time series dataset comes in the form of a data set with three columns: date, TimeSeriesID, Value

I store the number of times series in the macro var nseries. then i create the pairs data set:

data pairs(keep=stock:);
do i=1 to %eval(&nseries-1);
do j=i+1 to &nseries;
ID1=i;
ID2=j;
output;
end;
end;
run;

Using SQL i create the log differences, but the log complains about not being able to optimize because of the (intentionally) created Cartesian product.
proc sql noprint;
create table logp as
select a.*
,log(b.Value/c.Value) as y
from (
select a.*
,b.date
from pairs a
,(select distinct date from TimeSeriesDS) b
)a
left join TimeSeriesDS b
on a.ID1=b.TimeSeriesID
and a.date=b.date
left join TimeSeriesDS c
on a.ID2=c.TimeSeriesID
and a.date=c.date
order by ID1,ID2,date
;
quit;

This SQL code takes some time and the output data set is huge! Is there something better I can do? (tried transposing the time series dataset, but couldn't find a way to deal with the pairs calculation)
5 REPLIES 5
Patrick
Opal | Level 21
Hi

Not that I understand too much about time series but from what you describe:

If you have to calculate all possible pairs over all time series then is this close to a cartesian product and it doesn't take much for the result set becoming huge.

As for the number of pairs: Wouldn't that just be "n(n-1)"

The following code is based on this assumption:

data have;
format date date9.;
id+1;
do TimeSeriesID=1 to 3;
do date='01Mar2010'd to '02Mar2010'd;
value=floor(ranuni(1)*10);
output;
end;
end;
run;

proc sql;
select log(l.Value/r.Value) as y
,l.value as l_value, r.value as r_value
,l.TimeSeriesID as L_TimeSeriesID
,r.TimeSeriesID as R_TimeSeriesID
,l.date as l_date,r.date as r_date
from have as l,have as r
where not(l.TimeSeriesID = r.TimeSeriesID and l.date = r.date)
;
quit;


HTH
Patrick
niemannk
Calcite | Level 5
The combination between Timeseries1 and Timerseries2 is in my case the same as the pair between Timeseries2 and Timeseries1. n(n-1) would give too many pairs ( in your counting the order of combination matters).

In order to not make the data set huge, I thought of storing all the time series in one data set with many columns (just one row for every date, and the all the time series as columns, ie. apply proc transpose to my original data set). The ugly thing about this seems the subsequent code, where you probably don't come around of a lot of marco loops... Maintaining SAS code with a lot of macro loops is in my experience more painful - (probably because I am not a good coder?).
Patrick
Opal | Level 21
Hi

"The combination between Timeseries1 and Timerseries2 is in my case the same as the pair between Timeseries2 and Timeseries1"

Think I understand. So it would be "n(n-1)/2".

I added a abs() function around the log() so that it won't matter which way round the pair is constructed.

Let me know if I got it right this time.

data have;
format date date9.;
do TimeSeriesID=1 to 3;
do date='01Mar2010'd to '02Mar2010'd;
id+1;
value=floor(ranuni(1)*10);
output;
end;
end;
run;

proc sql;
select
l.id as l_id,r.id as r_id,
abs(log(l.Value/r.Value)) as y
,l.value as l_value, r.value as r_value
,l.TimeSeriesID as L_TimeSeriesID
,r.TimeSeriesID as R_TimeSeriesID
,l.date as l_date,r.date as r_date
from have as l,have as r
where l.id < r.id
;
quit;


"In order to not make the data set huge, I thought of storing all the time series in one data set with many columns"

It's in the end the number of datapoints which counts and defines the storage requirements. If you organise them in a few rows but many columns or in many rows and only a few columns has not that much of an effect.

I would suggest that you organise your data in a way which suits your needs best.

HTH
Patrick
Patrick
Opal | Level 21
Creating "n(n-1)/2" pairs the following code eventually performs better than a SQL approach:

data have;
format date date9.;
do TimeSeriesID=1 to 3;
do date='01Mar2010'd to '02Mar2010'd;
id+1;
value=floor(ranuni(1)*10);
output;
end;
end;
run;

data want;
set have nobs=nobs;
do i=_n_+1 to nobs;
set have(keep=id value rename=(id=id2 value=value2)) point=i;
y=abs(log(value/value2));
output;
end;
run;

proc print data=want;
run;
niemannk
Calcite | Level 5
n(n-1)/2 is correct. I think there is a small misunderstanding. I want to calculate the log difference at every date for all pairs. So, for your last sql-example result I would put an "and l.date=r.date" in the WHERE statement of the sql code. At the end I want a log difference for all pairs (3) at every date (2), which should give me a data set with 6 rows.
With the additional code in the where statement I get the same result.

Since data steps still have some mysteries for me, I am still starring at your last submitted the code... thanks for all the effort you put in your answers!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 895 views
  • 0 likes
  • 2 in conversation