Hi All,
An hour and a half on Google Search didn't get me anywhere with this basic question
on referencing data elements:
I want to modify a variable in a sas dataset based on the values
of another variable.
Lets call the vars myA and myB which are both numeric and are 5000x1 cloumns.
The following program shows what i need but doesn't do the job.
Data new_data;
set old_data;
Do i=2 to 5000;
If MyB(i)=0 then MyA(i)=MyA(i-1);
end;
drop i;
run;
I appreciate any help!
Thanks Lawrence for pointing out the lag function, LinLin for your timely response and art297 for the detailed procedure.
art297's answer would probably help but since the actual number of
records in my dataset was +72000 I wasn't sure if it was the best thing to do. I also needed something more
straightforward.
After a long search on the web, I figured out how
to create a matrix from the data set, do my operations and
transform it back into a data set which does the thing i needed to do.
Here's my code:
Proc IML;
use old_dataset var{myB, myA};
read all into my_matrix;
Do i=2 to 72676;
If my_matrix[i,1]=0 then my_matrix[i,2]=my_matrix[i-1,2];
end;
Create new_dataset from my_matrix[colname={'myB' 'myA'}];
append from my_matrix;
Quit;
the above code uses IML. To check if you have it as part of your
SAS package run the code below:
* determine what SAS compoentes are installed;
Proc Setinit;
Run;
this only works if "5000x1 cloumns" means 5000 records.
Data new_data;
set old_data;
If MyB=0 then MyA=MyA-1;
run;
Thanks for your reply, but it doesn't seem to work. and yes i meant 5000 records.
The data looks like this:
MyA MyB
------- ------
4 5
2 5
. 0
. 0
. 0
1 3
1 2
1 2
The operation is supposed to replace the missing values with 2.
I also tried if MyA=. then MyA=MyA-1;
It didn't work either.
Moein,
This is a classic problem for using the LAG function (available in the DATA step, but not in SQL). Search for
lag function
in Documentaion on support.sas.com.
It goes something like this (untested):
DATA new;
SET old_data;
prevA=lag(mya);
if myb=0 then mya=preva;
RUN;
There are a lot of nuances to the LAG statement, so be sure to read all the documention.
Doc Muhlbaier
Duke
EDIT: Opps. I missed one of the nuances. This will get the first missing, but not the subsequent ones. You will probably need to conditionally RETAIN some values to get it to work.
Message was edited by: Lawrence Muhlbaier
data old;
input a myb;
cards;
4 5
2 5
. 0
. 0
. 0
1 3
1 2
1 2
;
run;
data new(drop=b:) ;
set old;
b1=lag1(a);b2=lag2(a);b3=lag3(a) ;
if a eq . and b1 ne . then a=b1;
else if a eq . and b1 eq . and b2 ne . then a=b2;
else if a eq . and b1 eq . and b2 eq . and b3 ne . then a=b3;
run;
Linlin, I didn't use that approach as we don't know the maximum number of consecutive missing values in the file.
Thank you Art!
Since you only have 5,000 records, I would use a rather brute force approach:
/*use proc summary to transpose the records to one long record*/
proc summary nway data=old_data missing;
output out = transposed (drop=_type_ _freq_)
idgroup(out[5000](MyA MyB)=);
run;
data new_data (keep=MyA MyB);
set transposed;
/*put all of your data into 2 arrays*/
array aMyA(*) MyA:;
array aMyB(*) MyB:;
/*do all of your computations*/
do i=2 to dim(aMyA);
if aMyB(i)=0 then aMyA(i)=aMyA(i-1);
end;
/*output your new records*/
do i=1 to dim(aMyA);
MyA=aMyA(i);
MyB=aMyB(i);
output;
end;
run;
Thanks Lawrence for pointing out the lag function, LinLin for your timely response and art297 for the detailed procedure.
art297's answer would probably help but since the actual number of
records in my dataset was +72000 I wasn't sure if it was the best thing to do. I also needed something more
straightforward.
After a long search on the web, I figured out how
to create a matrix from the data set, do my operations and
transform it back into a data set which does the thing i needed to do.
Here's my code:
Proc IML;
use old_dataset var{myB, myA};
read all into my_matrix;
Do i=2 to 72676;
If my_matrix[i,1]=0 then my_matrix[i,2]=my_matrix[i-1,2];
end;
Create new_dataset from my_matrix[colname={'myB' 'myA'}];
append from my_matrix;
Quit;
the above code uses IML. To check if you have it as part of your
SAS package run the code below:
* determine what SAS compoentes are installed;
Proc Setinit;
Run;
Definitely go with proc IML. It will do the task in a fraction of a second. The method I proposed wouldn't work anyhow, as it can only handle 100 records. Proc transpose, in itself, would have taken over a second just to do one transposition, and two would be needed.
I have to review your code (since I'm not sufficiently familiar with IML), but it definitely appears to be a nice way of accomplishing the task.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.