turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- sum of columns in data set multiplied by indexed m...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

05-31-2012 09:30 AM

I need to calculate sum of columns multiplied by proper macro variable.

columns are a, x, b, m etc (names can be different and the number of columns may change)

each of columns have corresponding macro value &m1, &m2, &m3 .... &mn (n- number of columns = number of macro vairables)

i need to calculate sum for each row: a*&m1 + x*&m2 + b*&m3 + m*&m4 + ..... and write results to result table

Can anybody help.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to tom12122

05-31-2012 10:10 AM

One way would be by using arrays.

Zafer

data new;

a = 1;

b = 2;

c = 3;

d = 4;

e = 5;

output;

a = 2;

b = 3;

c = 5;

d = 6;

e = 1;

output;

run;

%let M1=1;

%let M2=2;

%let M3=3;

%let M4=4;

%let M5=5;

%let vars = a b c d e;

data x;

set new;

array _a_

dim = DIM(_a_);

do i=1 to dim;

Result = SUM(Result, _a_**SYMGET(CATS('M',i)));*

end;

drop dim i;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to tom12122

05-31-2012 01:16 PM

This is what PROC SCORE can do. This examples uses the data macro vars from Alpay's example.

**data** coef;

if **0** then set new(keep=&vars);

array _n

length _TYPE_ _NAME_ $**8**;

retain _TYPE_ 'SCORE' _NAME_ 'RESULT';

do i = **1** to dim(_n);

_n* = symgetN(cats(*'M',i));

end;

output;

stop;

drop i;

**run**;

**Proc** **print**;

**run**;

**Proc** **score** data=new score=coef out=scored;

var &vars;

**run**;

**proc** **print**;

**run**;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to data_null__

06-01-2012 02:01 AM

NULL, symgetN() is very low efficient , that means your code is very slow . I prefer to make a marco varible to hold such expression .

data _null_; set sashelp.vcolumn(keep=libname name memname type where=(libname='SASHELP' and memname='CLASS' and type='num')) end=last; length list $ 4000; retain list; list=catx(',',list,cats(name,'*','&m ',_n_)); if last then call symputx('list',list); run; %put %superq(list); %let M1=1; %let M2=2; %let M3=3; data want; set sashelp.class; sum=sum(&list ); run;

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

06-01-2012 09:35 AM

Ksharp wrote:

NULL, symgetN() is very low efficient , that means your code is very slow . I prefer to make a marco varible to hold such expression .

The argument that my program is inefficient appears to be based on your misunderstanding of the data step to create COEF. I believe if you study it more closely you will see SYMGETN would need to be very slow indeed to have any significant impact on performance.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to data_null__

06-03-2012 11:27 PM

NULL,

No, I am not misunderstanding . I know how slow the symgetn() would be at any situation .

You put it on every single variable and every single observation which make your code very slow .

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

06-04-2012 06:28 AM

Yes it is obvious you don't understand because you don't know what you're talking about.

How many observations do you think COEF has?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

06-04-2012 08:27 AM

Excerpts from above:

%let M1=1;

%let M2=2;

%let M3=3;

%let M4=4;

%let M5=5;

%let vars = a b c d e;

This is what PROC SCORE can do. This examples uses the data macro vars from Alpay's example.

datacoef;if

0then set new(keep=&vars);array _n

length _TYPE_ _NAME_ $**8**;

retain _TYPE_ 'SCORE' _NAME_ 'RESULT';

do i = **1** to dim(_n);

_n* = symgetN(cats(*'M',i));

end;

output;

stop;

drop i;

**run**;

So, reviewing the data coef datastep:

1) No records are read from set new. It's only used to set the PDV variable attributes for &vars (a b c d e).

2) Create an array for a b c d e.

3) Create additional variables _TYPE_='SCORE' and _NAME_='RESULT'

4) Loop over the 5 element array, setting a b c d e to &m1, &m2, ..., &m5 respectively.

5) Output the record

6) Stop the data step (actually not needed but makes the code crystal clear).

So, there is only one loop through the data step, and symgetn is called 5 times. No records are read from an input dataset.

I gotta agree with Data: where do you think the poor performance would lie? Your statement that symgetn is slow is in fact correct, in principle, but not in this instance due to the construction of the data step.

On the other hand, your code:

data _null_;

set sashelp.vcolumn(keep=libname name memname type where=(libname='SASHELP' and memname='CLASS' and type='num')) end=last;

length list $ 4000;

retain list;

list=catx(',',list,cats(name,'*','&m ',_n_));

if last then call symputx('list',list);

run;

will need to process *every column* from *every dataset* in *every allocated library*! And if some of those libraries are RDBMS libraries it will perform even more poorly. Even though you're applying a where clause, the sashelp view will not use indexes, and in general will perform badly. Honestly, allocate 20 libraries or so, with say 100 datasets per library, with say an average of 20 variables per dataset, and see how sashelp.vcolumn will perform.

If you're going to use your approach, at least consider using PROC CONTENTS against your desired dataset, i.e. proc contents data=sashelp.class out=columns, then use your approach.

But of the two approaches shown, yours and Data's, I believe Data's will perform much faster.

Regards,

Scott