Statistical programming, matrix languages, and more

Populating a new column in an existing dataset

Reply
Contributor
Posts: 24

Populating a new column in an existing dataset

I need to write the contents of a 1-column matrix into a new column in an existing dataset. Is there a way of doing this in IML such that the values being written into the new column line up with the corresponding content in the old data? (The matrix and the old dataset have the same number of rows).

Trusted Advisor
Posts: 1,277

Re: Populating a new column in an existing dataset

You write the contents of the 1-column matrix to a SAS data set (IML commands to use are CREATE and APPEND). Then in a SAS data step, you merge the original data set and the data set created from PROC IML.

SAS Super FREQ
Posts: 3,232

Re: Populating a new column in an existing dataset

I agree with @PaigeMiller. SAS stores data sets rowwise, so to add a column you actually need to create a new data set.  You can append rows to an existing data set, but not columns.

 

You could do this operation entirely in IML, but it will be more efficient to use the MERGE statement in the DATA step to combine the data (unless all the data are already in IML vectors or matrices).

Contributor
Posts: 24

Re: Populating a new column in an existing dataset

In other words, with the  MERGE statement, the rows in two datasets will line up? (The new dataset is calculated based on the old one, so has the same number of rows, but does not share a common field with it.)

Sorry for these kinds of questions, I'm utterly new to SAS and IML.

SAS Super FREQ
Posts: 3,232

Re: Populating a new column in an existing dataset

Yes. Here is a simple example that you can study and play with.

data Have;
do i = 1 to 10;
   x = i + 1;
   y = -2*i -4;
   output;
end;
run;

proc iml;
newVar = T(10:1);  /* 10, 9, 8,..., 1 */
create Calc var "newVar";
append;
close Calc;
quit;

data Want;
merge Have Calc;
run;
 
proc print data=Want;
run;
Contributor
Posts: 24

Re: Populating a new column in an existing dataset

Into the Calc dataset, I also want to read the variable "key_num" (or whatever I'll call the common field from the Have dataset) for use in the "BY" statement in the later MERGE. How do I make sure that the values in "newVar" line up with the corresponding values in "key_num"?

Trusted Advisor
Posts: 1,277

Re: Populating a new column in an existing dataset

You create your matrix in IML from the two columns in your original data set, key_num and the variable of interest.

 

Then when you use IML to create a new SAS data set, it will have key_num and the computed column. If there is any need to sort in IML, you'd want to sort both key_num and the data.

Contributor
Posts: 24

Re: Populating a new column in an existing dataset

How do I perform a computation (for example, elementwise multiplication) on only that column of the matrix which holds the variable of interest? In other words, how do I avoid changing the key value during computations?

Trusted Advisor
Posts: 1,277

Re: Populating a new column in an existing dataset

It is the programmers responsibility to make sure the rows line up. SAS will not guarantee this. If you have done nothing in either IML or outside of IML to change the orders of the rows, then they should line up.

 

You can guarantee that the rows will line up if you have some sort of row identifier variable, then you can merge the datasets using a BY statement and including this row identifier varialbe in the BY statement. For this to work, you need to have the row identifier in IML as well as in the original data set.

Contributor
Posts: 24

Re: Populating a new column in an existing dataset

[ Edited ]

So, to guarantee that MERGE works properly, do I need the new dataset to have two columns: the new content and the common field (to use in the "BY" statement) from the old dataset?

Trusted Advisor
Posts: 1,277

Re: Populating a new column in an existing dataset

[ Edited ]

Either that, or by confirming that your code has not changed the order of the data in IML and has not changed the order of the data in your data step.

SAS Super FREQ
Posts: 3,232

Re: Populating a new column in an existing dataset

Are you sure that you need IML for this problem?  Perhaps the DATA step would be easier for you,

 

If you want to learn IML, here are some tips for getting started with IML programming.  I think after you complete steps 1-3 you should be able to make a good first attempt at this problem.

Contributor
Posts: 24

Re: Populating a new column in an existing dataset

I, too, am wondering whether IML is how this is done, since I'm not really doing anything that necessitates matrices (all my calculations are elementwise). What I need to do is multiply every value in a column by a certain number; raise every value in a column to a certain power; multiply each value by the corresponding value in another column; and obtain a sum of all the values in a column. Can all of these be done with the Data step?

Super User
Super User
Posts: 5,975

Re: Populating a new column in an existing dataset


dvtarasov wrote:

I, too, am wondering whether IML is how this is done, since I'm not really doing anything that necessitates matrices (all my calculations are elementwise). What I need to do is multiply every value in a column by a certain number; raise every value in a column to a certain power; multiply each value by the corresponding value in another column; and obtain a sum of all the values in a column. Can all of these be done with the Data step?


So let's place the two constants into macro variables to make it easier to modify later.

%let coeffient=1.5;
%let power=3 ;

Now it sounds like you want to do something like this:

data want ;
  set have ;
  new_var = &coefficient*VAR1 * (VAR2 ** &power);
run;

proc means data=want sum;
  var new_var ;
run;
Contributor
Posts: 24

Re: Populating a new column in an existing dataset

Thanks! I've experimented with this way of doing things and it seems to work for me. I am wondering, however: is it possible, for the OUTPUT statement of PROC MEANS, to produce, not a new dataset, but just a new variable that stores the sum?
Post a Question
Discussion Stats
  • 17 replies
  • 960 views
  • 0 likes
  • 4 in conversation