turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- loop on multiple tables

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 11:46 AM

Hello,

I have a problem that might be solved be a loop (I guess).

I must calculate the square root of the sommation of the differences (square) between the weights invested between two banks. The application of the equation 1 at the page 5: http://www.ccfr.org.cn/cicf2010/papers/20091210013047.pdf

I have 20 banks. I have 20 tables with 11 columns. One to identify the years and ten others to identify the weights invested in each of the ten industries (the sommation on one row of the ten weights is equal to 1, of course).

The structure of 20 tables is the same. I must take each pair of banks to apply the equation. My result should be a number for each year for two banks compared.

Can you help me with the code or its structure?

Thanks a lot.

Accepted Solutions

Solution

10-30-2013
01:17 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-30-2013 01:17 PM

In this case, a slight tweak to the example code i've listed in post#7 will get you the desired output.

Sadly, proc distance requires a character variable for ID and doesn't compute square root so it will have to be done in 3 steps (would be 4 if your single dataset wasn't sorted by year/id)

step1 create a character id

data have;

set have;

bank_id="_"||put(ultimateparentid, best8.);

run;

step2 calculate distance

proc distance data=have out=want method=sqeuclid;

var interval (w_; /* use w_: to represent the group of all variables starting with w_*/

id bank_id;

by year;

run;

step3 calculate square root

data want;

set want;

array dist {*} _: ; /* as proc distance creates a square matrix with variables named after bank_id and we used _ to preceed all numeric bank ID, we can use _: to index all such created variables through an array */

do over dist;

if dist NE . then dist=sqrt(dist); /* I believe this is the appropriate syntax for do over but its been a while, at worse, you can use do i=1 to dim(dist) and index dist{i} inside the loop */

end;

run;

You will notice a variable containing all _id and 20 variables _id with a lower triangular matrix of calculated distances. it's up to you to convert that in a different format if you want.

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 11:49 AM

How many observations per table?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 11:49 AM

You should also post a sample of what your data looks like and the sample output...I can read the paper but don't actually want to

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 11:58 AM

I will.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 11:55 AM

18 rows, 11 columns - the column of the years is just to identify.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 12:36 PM

At first I thought "IML JOB" but since you don't want it output in a matrix format and just want the lower(or upper) triangle of calculations, it is probably not worth learning proc iml. Esp. since a big chunk of the work is data manipulation and not quite the element-wise squared difference of vectors. Anyway, here's how I would tackle the problem:

STEP 1. Preprocess your 20 per bank files to obtain 18 per year files and change the year variable to a bank code or name variable. This can be done reasonably easily using the IN= dataset option in a single data step like

data y1995 y1996 y1997 /* ... */ y2013;

set bank1(IN=b1) bank2(in=b1) /* ... */ bank20(in=b20);

array bcode {*} b1-b20;

case (yearvariable);

when (1995) then output y1995;

/*...*/

when (2013 then output y2013;

/* you can use an otherwise statement for outstanding year data for validation */

end;

run;

STEP 2 - Calculate each statistic and output that to a new dataset with bank-indicator variables

/* use macro to loop over over the 18 new datasets, providing the logic example below */

data want1995(keep=b1-b20 outputstat);

array outer {30} _temporary_;

array dv {*} b1-b20 w1-w10 /* there should be exactly 30 variables, the 20 bank indicators and the 10 weights adjust weight variable names accordingly */

do i=1 to outerobs;

set y1995(nobs=outerobs drop=yearvariable);

do k=1 to 30;

outer{k}=dv{k}; /*store outer record in a temporary array to have it available while looping over all subsequent rows in the file */

end;

do j=(i+1) to innerobs;

set y1995(nobs=innerobs);

do k=1 to 20;

dv{k}=max(dv{k}, outer{k}); /*replace one column of 0 with a 1 to indicate which 2 banks contribute to the output calculation */

end;

temp=0; /*initialize Sum of Squares*/

do k=21 to 30;

temp=temp+(dv{k}-outer{k})**2;

end;

outputstat=sqrt(temp);

output;

end;

end;

run;

This is all untested but it provides a logical path on how to handle the looping and data manipulation. The resulting wantYYYY dataset will have n(n-1)/2 rows for each possible pairs with 20 "bank indicators" variables, 18 of which are set to 0, 2 of which are set to 1 and a single output statistic.

Hope this helps

Vince

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 12:54 PM

Wouldn't there be a way to set all the datasets together and then use by groups and proc distance with squared euclidean distance?

This would avoid loops and makes it much easier to understand.

Need data though

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 01:52 PM

I didn't know about that procedure but it appears you can. I haven't figured out a way to have the square root calculated directly in the procedure but it can calculate the SS easily for the lower triangle by group.

Something such as:

data bankset1;

input year w1 w2 w3 w4 w5;

datalines;

2000 0.1 0.2 0.2 0.4 0.1

2001 0.2 0.1 0.2 0.4 0.1

2002 0.5 0.1 0.1 0.1 0.2

;

run;

data bankset2;

input year w1 w2 w3 w4 w5;

datalines;

2000 0.2 0.1 0.1 0.5 0.1

2001 0.2 0.2 0.2 0.2 0.2

2002 0.3 0.3 0.1 0.1 0.2

;

run;

data bankset3;

input year w1 w2 w3 w4 w5;

datalines;

2000 0.2 0.2 0.2 0.3 0.1

2001 0.7 0.1 0.1 0.0 0.1

2002 0.5 0.1 0.3 0.0 0.1

;

run;

%macro m();

data have;

%do i=1 %to 3;

set bankset&i.;

output;

%end;

%mend;

%m();

proc sort data=have;

by year bank_id;

run;

proc distance data=have out=want method=sqeuclid;

var interval (w1-w5);

id bank_id;

by year;

run;

just gotta calculate square root!

Thanks Reeza for the lesson

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 02:02 PM

You don't need the macro either Vince, assuming SAS 9.2+

data have;

set bankset1-bankset3 indsname=source;

bank_id=source;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-30-2013 12:33 PM

Thanks Vince and Reeza for your answers.

Sorry for the delay. I rebuild a new table with all data.

I take a printscreen to show you what it looks like.

My table has 360 rows (20 banks x 18 years) and 12 columns.

I create a column of ids (UltimateParentID). So, my code must do this:

(Legend: Variable, UltimateParentID, year)

square root [((W_AG,7052,1995)-(W_AG,7135,1995))^2+ ((W_MIN,7052,1995)-(W_MIN,7135,1995))^2+((W_CON,7052,1995)-(W_CON,7135,1995))^2+((W_TRAN,7052,1995)-(W_TRAN,7135,1995))^2+

((W_WHOLE,7052,1995)-(W_WHOLE,7135,1995))^2+((W_RE,7052,1995)-(W_RE,7135,1995))^2+((W_SER,7052,1995)-(W_SER,7135,1995))^2+((W_FIN,7052,1995)-(W_FIN,7135,1995))^2+

((W_PUBLIC,7052,1995)-(W_PUBLIC,7135,1995))^2+((W_Autre,7052,1995)-(W_Autre,7135,1995))^2)]

This calculation must be repeated between all group of two banks for each year.

n(n-1)/2=190 results per year

I made clarifications. I think my problem was incorrectly formulated.

Solution

10-30-2013
01:17 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-30-2013 01:17 PM

In this case, a slight tweak to the example code i've listed in post#7 will get you the desired output.

Sadly, proc distance requires a character variable for ID and doesn't compute square root so it will have to be done in 3 steps (would be 4 if your single dataset wasn't sorted by year/id)

step1 create a character id

data have;

set have;

bank_id="_"||put(ultimateparentid, best8.);

run;

step2 calculate distance

proc distance data=have out=want method=sqeuclid;

var interval (w_; /* use w_: to represent the group of all variables starting with w_*/

id bank_id;

by year;

run;

step3 calculate square root

data want;

set want;

array dist {*} _: ; /* as proc distance creates a square matrix with variables named after bank_id and we used _ to preceed all numeric bank ID, we can use _: to index all such created variables through an array */

do over dist;

if dist NE . then dist=sqrt(dist); /* I believe this is the appropriate syntax for do over but its been a while, at worse, you can use do i=1 to dim(dist) and index dist{i} inside the loop */

end;

run;

You will notice a variable containing all _id and 20 variables _id with a lower triangular matrix of calculated distances. it's up to you to convert that in a different format if you want.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-04-2013 10:54 AM

Thanks a lot Vince.

I tried your codes. I have a problem with the step 3 like you seem to expect.

''over'' is underlined. The number 184 appears below and it tells me that ''The instruction DO OVER cannot be operated with tables that have explicit inferior indexes''.

I tried to figure out by myself. It's my first complex code so I didn't find an answer yet. If you have an idea to make it work, thank you again!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-04-2013 11:02 AM

I'm sorry about it. Oddly enough the short documentation I have about the DO OVER statement requires an array defined through explicit variables range which is what _: is supposed to define. Hopefully it is not an issue with _: picking up variable classes like _numeric_ or _character_ (but I would doubt it). Nonetheless, you can simply replace the entire do; end; block by the equivalent block below where all array referencing has been explicitely done with variable i. You may want to add a drop i; statement at the end of your datastep to remove it from your final dataset.

do i=1 to dim(dist);

if dist{i} NE . then dist{i}=sqrt(dist{i});

end;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-04-2013 11:14 AM

Thank you so much! It works!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-04-2013 12:51 PM

I think if you use Method=euclid rather than method=sqeuclid; you can remove that last step.

Euclidean Distance measurement: