Hello, I would like to multiply two datasets A and B that has equal dimensions T X N which can both can be a large number.
- N can be very large (as of 100.000 randomized numbers). The first variable name is equation which is the only variable. Variable name 2:N has numerical name and its not necessarily 1 to 100.000 - it could be 2, 10, 200,... so on.
- Both datasets are similarly structured. They have the same ordering of variable names and observations.
- I am currently using SAS 9.4.
An example of the two datasets is:
DATASET A
Variable names Equation 2 10
1 0 20
2 10 1
3 2 5
Dataset B
Variable names Equation 2 10
1 0 1
2 1 0
3 0 1
Dataset want
Variable names Equation 2 10
1 0 20
2 10 0
3 0 5
Thank you for your time.
If
then there is a relatively compact way to do this in a DATA step:
data a b;
input equation $1. v1 v2 @; output a;
input v1 v2; output b;
datalines;
1 0 20 0 1
2 10 1 1 0
3 2 5 0 1
run;
data prod;
set a b;
by equation;
array vars _numeric_;
do over vars;
vars=vars*lag(vars);
end;
if last.equation=1 ;
run;
This program assumes that each equation is present in both data sets. But if any equation is present in only one data set, then you have to protect against outputting false results for a single-member equation, contaminated by a lagged value from the prior equation:
data prod;
set a b;
by equation;
array vars _numeric_;
do over vars;
vars=vars*lag(vars);
end;
if first.equation=1 and last.equation=1 then call missing(of vars{*});
if last.equation=1 ;
run;
By the way, if equation is also a numeric variable it's relatively straightforward to exclude it from the product calculation (i.e. to exclude if from the array declaration.
Do you have PROC IML?
Is the data really in SAS? You can't have variable names that are the number 2.
I'm thinking this could be done easily in Microsoft Excel.
I do have proc IML. I believe the variables are written as e.g. '10'n (but its not what comes out when i use proc contents). It could be done in Excel, however dataset A and B change daily so it would be tedious to repeat it in excel. I would rather have a code in SAS
proc iml;
use a var {n2 n10};
read all into a;
use b var {n2 n10};
read all into b;
mult=a#b;
create out from mult;
append from mult;
quit;
This is sloppy but it should work if your variables are in order:
I added the prefix n to your variables, because like @PaigeMiller said, you can't start variables with a numeric.
%let firstV = n2;
%let lastV = n10;
data a;
Equation = 1; n2 = 0; n10 = 20; output;
Equation = 2; n2 = 10; n10 = 1; output;
Equation = 3; n2 = 5; n10 = 5; output;
run;
data b;
Equation = 1; n2 = 0; n10 = 1; output;
Equation = 2; n2 = 1; n10 = 0; output;
Equation = 3; n2 = 0; n10 = 1; output;
run;
proc sql noprint ;
select cats(name,'=new_',name) into :renames separated by ' '
from dictionary.columns
where libname='WORK' and memname='B'
;
quit;
proc datasets nolist library=work;
modify b;
rename &renames;
run;
quit;
data want (drop=new_Equation i new_&firstV.--new_&lastV.);
merge a b;
array colA{*} &firstV.--&lastV.;
array colB{*} new_&firstV.--new_&lastV.;
do i=1 to dim(colA);
colA{i} = colA{i}*colB{i};
end;
run;
just change the first and last variables in the beginning to the first and last in your data.
If your datasets are strictly the same size, you could go this way:
data a;
equation = 1; '2'n = 10; '10'n = 20; output;
equation = 2; '2'n = 15; '10'n = 24; output;
equation = 3; '2'n = 20; '10'n = 40; output;
run;
data b;
equation = 1; '2'n = 1; '10'n = 0; output;
equation = 2; '2'n = 0; '10'n = 1; output;
equation = 3; '2'n = 1; '10'n = 0; output;
run;
proc sql;
select num_numeric into :vars
from dictionary.tables
where libname="WORK" and memname="A";
quit;
data c;
set a;
array v _numeric_;
array x {&vars.} _temporary_;
do i = 1 to &vars.;
x{i} = v{i};
end;
set b;
do i = 2 to &vars.;
v{i} = v{i}*x{i};
end;
drop i;
run;
proc print data=c noobs; run;
equation 2 10 1 10 0 2 0 24 3 20 0
If
then there is a relatively compact way to do this in a DATA step:
data a b;
input equation $1. v1 v2 @; output a;
input v1 v2; output b;
datalines;
1 0 20 0 1
2 10 1 1 0
3 2 5 0 1
run;
data prod;
set a b;
by equation;
array vars _numeric_;
do over vars;
vars=vars*lag(vars);
end;
if last.equation=1 ;
run;
This program assumes that each equation is present in both data sets. But if any equation is present in only one data set, then you have to protect against outputting false results for a single-member equation, contaminated by a lagged value from the prior equation:
data prod;
set a b;
by equation;
array vars _numeric_;
do over vars;
vars=vars*lag(vars);
end;
if first.equation=1 and last.equation=1 then call missing(of vars{*});
if last.equation=1 ;
run;
By the way, if equation is also a numeric variable it's relatively straightforward to exclude it from the product calculation (i.e. to exclude if from the array declaration.
Cheers - works exactly as intended!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.