BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Tricky
Calcite | Level 5

Hello, I would like to multiply two datasets A and B that has equal dimensions T X N which can both can be a large number.

 

- N can be very large (as of 100.000 randomized numbers). The first variable name is equation which is the only variable. Variable name 2:N has numerical name and its not necessarily 1 to 100.000 - it could be 2, 10, 200,... so on.

 

- Both datasets are similarly structured. They have the same ordering of variable names and observations.

 

- I am currently using SAS 9.4.

 

An example of the two datasets is:

 

DATASET A                                                                                  

  Variable names     Equation     2        10                          

                                 1               0        20                                            

                                 2               10        1                                             

                                 3               2          5

 

Dataset B

  Variable names     Equation     2        10                           

                                 1               0          1                                           

                                 2               1          0                                            

                                 3               0          1

                                     

 And i wish to get a dataset that has multiplied by elements:

 

Dataset want

  Variable names     Equation     2         10                           

                                 1               0         20                                         

                                 2               10         0                                           

                                 3               0           5

 

Thank you for your time.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If

  1. The variables to be multiplied comprise all the numeric variables (let's say EQUATION is a character variable), and
  2. They are identically named in A and B

then there is a relatively compact way to do this in a DATA step:

 

data a b;
  input equation $1. v1 v2 @;   output a;
  input v1 v2;  output b;   
datalines;
1  0 20  0 1
2 10  1  1 0
3  2  5  0 1
run;


data prod;
  set a b;
  by equation;
  array vars _numeric_;
  do over vars;
    vars=vars*lag(vars);
  end;
  if last.equation=1 ;
run;

  1. The SET statement with a BY statement interleaves observations from A and B.  I.e. the equation 1 from A is followed by equation 1 from B, etc, then A2, B2, A3, B3, etc.

  2. Multiplying vars*lag(vars) effectively produces products of values from A by values from the corresponding B.  Just be sure to output only the last record (the "B" record) for each equation.  This prevents outputting the A value for one equation multiplied by the B value from the prior equation.

 

This program assumes that each equation is present in both data sets.  But if any equation is present in only one data set, then you have to protect against outputting false results for a single-member equation, contaminated by a lagged value from the prior equation:

 

data prod;
  set a b;
  by equation;
  array vars _numeric_;
  do over vars;
    vars=vars*lag(vars);
  end;
  if first.equation=1 and last.equation=1 then call missing(of vars{*});
  if last.equation=1 ;
run;

By the way, if equation is also a numeric variable it's relatively straightforward to exclude it from the product calculation (i.e. to exclude if from the array declaration.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

Do you have PROC IML?

 

Is the data really in SAS? You can't have variable names that are the number 2.

 

I'm thinking this could be done easily in Microsoft Excel.

--
Paige Miller
Tricky
Calcite | Level 5

I do have proc IML. I believe the variables are written as e.g. '10'n (but its not what comes out when i use proc contents). It could be done in Excel, however dataset A and B change daily so it would be tedious to repeat it in excel. I would rather have a code in SAS Smiley Happy

 

PaigeMiller
Diamond | Level 26
proc iml;
    use a var {n2 n10};
	read all into a;
	use b var {n2 n10};
	read all into b;
	mult=a#b;
	create out from mult;
	append from mult;
quit;

 

--
Paige Miller
JeffMaggio
Obsidian | Level 7

This is sloppy but it should work if your variables are in order:

 

I added the prefix n to your variables, because like @PaigeMiller said, you can't start variables with a numeric. 

 

%let firstV = n2;
%let lastV = n10;
data a;
Equation = 1; n2 = 0; n10 = 20; output;
Equation = 2; n2 = 10; n10 = 1; output;
Equation = 3; n2 = 5; n10 = 5; output;
run;

data b;
Equation = 1; n2 = 0; n10 = 1; output;
Equation = 2; n2 = 1; n10 = 0; output;
Equation = 3; n2 = 0; n10 = 1; output;
run;

proc sql noprint ;
select cats(name,'=new_',name) into :renames separated by ' '
  from dictionary.columns
  where libname='WORK' and memname='B'
  ;
quit;
 
proc datasets nolist library=work;
  modify b;
    rename &renames;
  run;
quit;

data want (drop=new_Equation i new_&firstV.--new_&lastV.);
merge a b;
array colA{*} &firstV.--&lastV.;
array colB{*} new_&firstV.--new_&lastV.;
do i=1 to dim(colA);
  colA{i} = colA{i}*colB{i};
end;
run;

just change the first and last variables in the beginning to the first and last in your data.

PGStats
Opal | Level 21

If your datasets are strictly the same size, you could go this way:

 

data a;
equation = 1; '2'n = 10; '10'n = 20; output;
equation = 2; '2'n = 15; '10'n = 24; output;
equation = 3; '2'n = 20; '10'n = 40; output;
run;

data b;
equation = 1; '2'n = 1; '10'n = 0; output;
equation = 2; '2'n = 0; '10'n = 1; output;
equation = 3; '2'n = 1; '10'n = 0; output;
run;

proc sql;
select num_numeric into :vars
from dictionary.tables 
where libname="WORK" and memname="A";
quit;

data c;
set a;
array v _numeric_;
array x {&vars.} _temporary_;
do i = 1 to &vars.;
	x{i} = v{i};
	end;
set b;
do i = 2 to &vars.;
	v{i} = v{i}*x{i};
	end;
drop i;
run;

proc print data=c noobs; run;
equation 	2 	10
1 	10 	0
2 	0 	24
3 	20 	0
PG
mkeintz
PROC Star

If

  1. The variables to be multiplied comprise all the numeric variables (let's say EQUATION is a character variable), and
  2. They are identically named in A and B

then there is a relatively compact way to do this in a DATA step:

 

data a b;
  input equation $1. v1 v2 @;   output a;
  input v1 v2;  output b;   
datalines;
1  0 20  0 1
2 10  1  1 0
3  2  5  0 1
run;


data prod;
  set a b;
  by equation;
  array vars _numeric_;
  do over vars;
    vars=vars*lag(vars);
  end;
  if last.equation=1 ;
run;

  1. The SET statement with a BY statement interleaves observations from A and B.  I.e. the equation 1 from A is followed by equation 1 from B, etc, then A2, B2, A3, B3, etc.

  2. Multiplying vars*lag(vars) effectively produces products of values from A by values from the corresponding B.  Just be sure to output only the last record (the "B" record) for each equation.  This prevents outputting the A value for one equation multiplied by the B value from the prior equation.

 

This program assumes that each equation is present in both data sets.  But if any equation is present in only one data set, then you have to protect against outputting false results for a single-member equation, contaminated by a lagged value from the prior equation:

 

data prod;
  set a b;
  by equation;
  array vars _numeric_;
  do over vars;
    vars=vars*lag(vars);
  end;
  if first.equation=1 and last.equation=1 then call missing(of vars{*});
  if last.equation=1 ;
run;

By the way, if equation is also a numeric variable it's relatively straightforward to exclude it from the product calculation (i.e. to exclude if from the array declaration.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tricky
Calcite | Level 5

Cheers - works exactly as intended! 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 930 views
  • 2 likes
  • 5 in conversation