turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Log differnces of time series pairs

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-23-2010 04:23 AM

To determine the log differences of all possible pairs of n time series I have to calc. n!/(n-2)!2! pairs. The time series dataset comes in the form of a data set with three columns: date, TimeSeriesID, Value

I store the number of times series in the macro var nseries. then i create the pairs data set:

data pairs(keep=stock;

do i=1 to %eval(&nseries-1);

do j=i+1 to &nseries;

ID1=i;

ID2=j;

output;

end;

end;

run;

Using SQL i create the log differences, but the log complains about not being able to optimize because of the (intentionally) created Cartesian product.

proc sql noprint;

create table logp as

select a.*

,log(b.Value/c.Value) as y

from (

select a.*

,b.date

from pairs a

,(select distinct date from TimeSeriesDS) b

)a

left join TimeSeriesDS b

on a.ID1=b.TimeSeriesID

and a.date=b.date

left join TimeSeriesDS c

on a.ID2=c.TimeSeriesID

and a.date=c.date

order by ID1,ID2,date

;

quit;

This SQL code takes some time and the output data set is huge! Is there something better I can do? (tried transposing the time series dataset, but couldn't find a way to deal with the pairs calculation)

I store the number of times series in the macro var nseries. then i create the pairs data set:

data pairs(keep=stock;

do i=1 to %eval(&nseries-1);

do j=i+1 to &nseries;

ID1=i;

ID2=j;

output;

end;

end;

run;

Using SQL i create the log differences, but the log complains about not being able to optimize because of the (intentionally) created Cartesian product.

proc sql noprint;

create table logp as

select a.*

,log(b.Value/c.Value) as y

from (

select a.*

,b.date

from pairs a

,(select distinct date from TimeSeriesDS) b

)a

left join TimeSeriesDS b

on a.ID1=b.TimeSeriesID

and a.date=b.date

left join TimeSeriesDS c

on a.ID2=c.TimeSeriesID

and a.date=c.date

order by ID1,ID2,date

;

quit;

This SQL code takes some time and the output data set is huge! Is there something better I can do? (tried transposing the time series dataset, but couldn't find a way to deal with the pairs calculation)

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-25-2010 12:49 AM

Hi

Not that I understand too much about time series but from what you describe:

If you have to calculate all possible pairs over all time series then is this close to a cartesian product and it doesn't take much for the result set becoming huge.

As for the number of pairs: Wouldn't that just be "n(n-1)"

The following code is based on this assumption:

data have;

format date date9.;

id+1;

do TimeSeriesID=1 to 3;

do date='01Mar2010'd to '02Mar2010'd;

value=floor(ranuni(1)*10);

output;

end;

end;

run;

proc sql;

select log(l.Value/r.Value) as y

,l.value as l_value, r.value as r_value

,l.TimeSeriesID as L_TimeSeriesID

,r.TimeSeriesID as R_TimeSeriesID

,l.date as l_date,r.date as r_date

from have as l,have as r

where not(l.TimeSeriesID = r.TimeSeriesID and l.date = r.date)

;

quit;

HTH

Patrick

Not that I understand too much about time series but from what you describe:

If you have to calculate all possible pairs over all time series then is this close to a cartesian product and it doesn't take much for the result set becoming huge.

As for the number of pairs: Wouldn't that just be "n(n-1)"

The following code is based on this assumption:

data have;

format date date9.;

id+1;

do TimeSeriesID=1 to 3;

do date='01Mar2010'd to '02Mar2010'd;

value=floor(ranuni(1)*10);

output;

end;

end;

run;

proc sql;

select log(l.Value/r.Value) as y

,l.value as l_value, r.value as r_value

,l.TimeSeriesID as L_TimeSeriesID

,r.TimeSeriesID as R_TimeSeriesID

,l.date as l_date,r.date as r_date

from have as l,have as r

where not(l.TimeSeriesID = r.TimeSeriesID and l.date = r.date)

;

quit;

HTH

Patrick

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-26-2010 03:11 AM

The combination between Timeseries1 and Timerseries2 is in my case the same as the pair between Timeseries2 and Timeseries1. n(n-1) would give too many pairs ( in your counting the order of combination matters).

In order to not make the data set huge, I thought of storing all the time series in one data set with many columns (just one row for every date, and the all the time series as columns, ie. apply proc transpose to my original data set). The ugly thing about this seems the subsequent code, where you probably don't come around of a lot of marco loops... Maintaining SAS code with a lot of macro loops is in my experience more painful - (probably because I am not a good coder?).

In order to not make the data set huge, I thought of storing all the time series in one data set with many columns (just one row for every date, and the all the time series as columns, ie. apply proc transpose to my original data set). The ugly thing about this seems the subsequent code, where you probably don't come around of a lot of marco loops... Maintaining SAS code with a lot of macro loops is in my experience more painful - (probably because I am not a good coder?).

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-26-2010 06:30 AM

Hi

"The combination between Timeseries1 and Timerseries2 is in my case the same as the pair between Timeseries2 and Timeseries1"

Think I understand. So it would be "n(n-1)/2".

I added a abs() function around the log() so that it won't matter which way round the pair is constructed.

Let me know if I got it right this time.

data have;

format date date9.;

do TimeSeriesID=1 to 3;

do date='01Mar2010'd to '02Mar2010'd;

id+1;

value=floor(ranuni(1)*10);

output;

end;

end;

run;

proc sql;

select

l.id as l_id,r.id as r_id,

abs(log(l.Value/r.Value)) as y

,l.value as l_value, r.value as r_value

,l.TimeSeriesID as L_TimeSeriesID

,r.TimeSeriesID as R_TimeSeriesID

,l.date as l_date,r.date as r_date

from have as l,have as r

where l.id < r.id

;

quit;

"In order to not make the data set huge, I thought of storing all the time series in one data set with many columns"

It's in the end the number of datapoints which counts and defines the storage requirements. If you organise them in a few rows but many columns or in many rows and only a few columns has not that much of an effect.

I would suggest that you organise your data in a way which suits your needs best.

HTH

Patrick

"The combination between Timeseries1 and Timerseries2 is in my case the same as the pair between Timeseries2 and Timeseries1"

Think I understand. So it would be "n(n-1)/2".

I added a abs() function around the log() so that it won't matter which way round the pair is constructed.

Let me know if I got it right this time.

data have;

format date date9.;

do TimeSeriesID=1 to 3;

do date='01Mar2010'd to '02Mar2010'd;

id+1;

value=floor(ranuni(1)*10);

output;

end;

end;

run;

proc sql;

select

l.id as l_id,r.id as r_id,

abs(log(l.Value/r.Value)) as y

,l.value as l_value, r.value as r_value

,l.TimeSeriesID as L_TimeSeriesID

,r.TimeSeriesID as R_TimeSeriesID

,l.date as l_date,r.date as r_date

from have as l,have as r

where l.id < r.id

;

quit;

"In order to not make the data set huge, I thought of storing all the time series in one data set with many columns"

It's in the end the number of datapoints which counts and defines the storage requirements. If you organise them in a few rows but many columns or in many rows and only a few columns has not that much of an effect.

I would suggest that you organise your data in a way which suits your needs best.

HTH

Patrick

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-26-2010 09:46 AM

Creating "n(n-1)/2" pairs the following code eventually performs better than a SQL approach:

data have;

format date date9.;

do TimeSeriesID=1 to 3;

do date='01Mar2010'd to '02Mar2010'd;

id+1;

value=floor(ranuni(1)*10);

output;

end;

end;

run;

data want;

set have nobs=nobs;

do i=_n_+1 to nobs;

set have(keep=id value rename=(id=id2 value=value2)) point=i;

y=abs(log(value/value2));

output;

end;

run;

proc print data=want;

run;

data have;

format date date9.;

do TimeSeriesID=1 to 3;

do date='01Mar2010'd to '02Mar2010'd;

id+1;

value=floor(ranuni(1)*10);

output;

end;

end;

run;

data want;

set have nobs=nobs;

do i=_n_+1 to nobs;

set have(keep=id value rename=(id=id2 value=value2)) point=i;

y=abs(log(value/value2));

output;

end;

run;

proc print data=want;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-26-2010 11:35 AM

n(n-1)/2 is correct. I think there is a small misunderstanding. I want to calculate the log difference at every date for all pairs. So, for your last sql-example result I would put an "and l.date=r.date" in the WHERE statement of the sql code. At the end I want a log difference for all pairs (3) at every date (2), which should give me a data set with 6 rows.

With the additional code in the where statement I get the same result.

Since data steps still have some mysteries for me, I am still starring at your last submitted the code... thanks for all the effort you put in your answers!

With the additional code in the where statement I get the same result.

Since data steps still have some mysteries for me, I am still starring at your last submitted the code... thanks for all the effort you put in your answers!