BookmarkSubscribeRSS Feed
arsenalist
Calcite | Level 5

I have two data sets which look like this:

id1   id2   Z


1   1   z11

1   2   z12

1   3   z13

2   1   z21

2   2   z22

2   3   z23

3   1   z31

3   2   z32

3   3   z33

and

id   X   Y


1   x1   y1

2   x2   y2

3   x3   y3

I am new to SAS but my programming skills are very good when it comes to Stata and R.

Can someone please help me write a code in SAS that will create a new data set as such:

id   W


1   w1

2   w2

3   w3

where w1 = (z11*x1+z12*x2+z13*x3) / (z11*y1+z12*y2+z13*y3)

Forget about the fact that you can factor out the z's. The weighting doesn't exactly work like that, I've just simplified it.

Thank you!

ps. I have asked the exact same question on stackexchange before realizing there is a SAS community here. I hope that's okay.

9 REPLIES 9
Reeza
Super User

AFAIK there is no issue with cross posting. But if you do receive an answer in one it would be polite to mark the question answered in both and put the answer in both locations.

Here's one way using temporary arrays. I'm not sure it's correct given your SO code which makes me wonder if the process is more involved, but perhaps it will help you get started. Although Table2 has an ID variable it doesn't appear to be an ID, more of a row number, is that correct?

data t1;

input id1 id2 Z ;

cards;

1   1   8

1   2   9

1   3   10

2   1   11

2   2   12

2   3   13

3   1   14

3   2   15

3   3   16

;

run;

data t2;

input id x  y ;

cards;

1   1   4

2   2   5

3   3   5

;

run;

proc transpose data=t1 out=t1_f prefix=z;

by id1;

var z;

id id2;

run;

data want;

*load factors into temporary arrays to multiply;

if _n_=1 then do i=1 to 3;

set t2;

array _x(3) _temporary_;

array _y(3) _temporary_;

_x(i)=x;

_y(i)=y;

end;

*Start with actual data flipped;

set t1_f;

array z(3) z1-z3;

*initialize to 0;

num=0;

den=0;

*Calculate numerator/denominator;

do i=1 to 3;

num=num+z(i)*_x(i);

den=den+z(i)*_y(i);

end;

*Calculate final weight;

w=num/den;

run;

arsenalist
Calcite | Level 5

Thank you very much Reeza for such a quick response.

Would this wirj if the data set is very large? I have 20,000 panels with about 20,000 obs in each panel.

arsenalist
Calcite | Level 5

By the way, this is the code I already have but I don't think it's optimized for a large data set (20,000 x 20,000) since it creates a temporary table for each panel. I did that because I didn't want to lose the work already done in case the computer shuts down or something I can start where I left off basically.

arsenalist
Calcite | Level 5

Here is the code:

%macro ratio_m(id);
proc sql;
select id1, sum(x*z) as wx
  from table2 as P right join table1 as N
  on input(N.id2,8.)=P.id
  group by id1;
create table temp2
as
select orig_id, sum(y*z) as wy
  from table2 as P right join table1 as N
  on input(N.id2,8.)=P.id
  group by id1;
%if %sysfunc(exist(final)) %then %do;
  drop table final;
%end;
create table final
as
select &i as indexRow, one.*, two.*, (wx/wy) as ratio
  from temp1 as one right join temp2 as two
  on one.id1=two.id1;
quit;
%mend;


%macro main;
  %if %sysfunc(exist(Index_dsn))=1 %then %do;
  %if %sysfunc(exist(final_overall))=1 %then %do;
  proc sql noprint;
  select max(indexRow) into :indexRow from final_overall
  ;%let indexRow=%sysfunc(sum(&indexRow,1)); %put &indexRow;

  select max(indexRow) into :indexFinal from Index_dsn
  ;%let indexFinal=&indexFinal; %put &indexFinal;
  ;quit;
  %put Previous calculation ends on row &indexRow ;

  %end;
  %else %if %sysfunc(exist(final_overall))=0 %then %do;
  %put "First Iteration: Dataset final_overall does not exist";

  proc sql noprint;
  select max(indexRow) into :indexFinal from Index_dsn;
  ;
  create table final_overall
  (indexRow num
  ,id1 char(10)
  ,ratio num);
  ;quit;
  %let indexFinal=&indexFinal; %let indexRow=1;
  %end;
  %end;
  %else %if %sysfunc(exist(Index_dsn))=0 %then %do;
  %put "Datasets Index_dsn and final_overall dont exist";

  proc sql noprint;
  create table Index_dsn AS
  select distinct id1 from table1;
  ;quit;
  data Index_dsn; set Index_dsn; indexRow+1; run;
  proc sql noprint;
  select max(indexRow) into :indexFinal from Index_dsn;
  ;
  %let indexFinal=&indexFinal; %let indexRow=1;
  create table final_overall
  (indexRow num
  ,id1 char(10)
  ,ratio num);
  ;quit;
  %end;

  %do i=&indexRow %to &indexFinal;
  proc sql;
  select id1 into :id from Index_dsn where indexRow=&i
  ; %let id=&id;
  quit;
  %put &id;
  %ratio_m (&id);
  proc append base=.final_overall data=final force; run;
  %end;

%mend;


options nomlogic;
%main

Reeza
Super User

Perhaps post what you have originally and desire instead, that looks overly complicated.

For weight calculations you may be better off using a combination of proc means with the weight statement or if you really need matrix math then going into IML.

Ksharp
Super User

Yeah. Agree with Reeza.

Since you have good programming skill about R , you should take a look at SAS/IML code, because They are all Matrix language .

And Better Post it at IML forum . I saw you are doing some INNER product of matrix , that is matrix operation . Not good for data step, but good for IML and R. The result is in table WANT .

Code: Program

data t1;
input id1 id2 Z ;
cards;
1 1 8
1 2 9
1 3 10
2 1 11
2 2 12
2 3 13
3 1 14
3 2 15
3 3 16
;
run;


data t2;
input id x  y ;
cards;
1 1 4
2 2 5
3 3 5
;
run;
proc iml;
use t1;read all var _all_ into z;close;
use t2;read all var {x y} into xy;close;
zz=full(z[,{3 1 2}]);
inner_product=zz*xy;
id=1:nrow(inner_product);
want=id`||(inner_product[,1]/inner_product[,2]);
create want from want[c={id w}];
append from want;
close;
quit;

Xia Keshan

arsenalist
Calcite | Level 5

Thank you all. I don't think the IML solutions work with the size of my data set which is not very wide(4-5 variables) but is very long (20,000 panels and 20,000 obs in each roughly, not balanced).

Ksharp
Super User

Did you run my IML code. This code is designed for

"(4-5 variables) but is very long (20,000 panels and 20,000 obs "

Reeza
Super User

One thing that's nice about SAS is that the size of datasets, minus mismatched matrices, don't matter too match.  The same code that works on 100 obs works on 1000000,

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2367 views
  • 0 likes
  • 3 in conversation