In the scenario where in my SAS dataset have series of same consecutive(count > 2) column values on the same row, I want to keep first and the last column values and replace with blank the ones between first and last. . Is there a way to do this using ARRAYS?
example
id var1 var2 var3 var4 var5 var6
xx 0 1 1 1 1 0
Desired Output
id var1 var2 var3 var4 var5 var6
xx 0 1 1 0
Here is a simple approach.
data have;
input id $ var1-var6;
datalines;
xx 0 1 1 1 1 0
;
data want(drop=i c);
set have;
array var {*} var1-var6;
do i=1 to dim(var);
if var[i]=1 then c+1;
else if var[i]=0 then c=0;
if c>2 then var[i-1]=.;
end;
run;
I might suggest use of a special missing such as .R to indicate the values you removed instead of the generic missing. Then if you have other values that are missing for another reason such as not collected or calculated and not available you can tell latter why specific values are missing.
There are 27 special missing values available: .A to .Z and ._
So you could keep track of a number of different reasons. The special missing are excluded from calculations unless using an option that indicates using the missing values.
Please try:
data have;
input id $ var1-var6;
datalines;
xx 0 1 1 1 1 0
xy 1 2 1 1 1 0
xz 0 0 0 0 0 0
;
data want;
set have;
array x var1-var6;
do i = 1 to dim(x) - 2;
if not missing(x{i}) then do j = i + 2 to dim(x);
if x{i} = x{j} and x{j-1} = x{j} then call missing(x{j-1});
else leave;
end;
end;
drop i j;
run;
proc print data=want noobs; run;
id var1 var2 var3 var4 var5 var6
xx 0 1 . . 1 0
xy 1 2 1 . 1 0
xz 0 . . . . 0
Hi @anurak Prodigy Genius stats aka @PGStats code made me think to find an alternative. See if this helps
data have;
input id $ var1-var6;
datalines;
xx 0 1 1 1 1 0
xy 1 2 1 1 1 0
xz 0 0 0 0 0 0
;
data want;
set have;
array x var1-var6;
_c=1;
do _n_=2 to dim(x);
if x(_n_)=x(_n_-1) then _c=sum(_c,1);
else _c=1;
if _c>2 then x(_n_-1)=.;
end;
drop _c;
run;
@anurak wrote:
In the scenario where in my SAS dataset have series of same consecutive(count > 2) column values on the same row, I want to keep first and the last column values and replace with blank the ones between first and last. . Is there a way to do this using ARRAYS?
Please note that numeric variables can't be blank, missing numeric variables contain a single dot. You can change the way missing numeric values are displayed by using
options missing= " ";
I recommend to use this only for data export, to avoid confusion.
Assuming there is no missing value in original data.
data have;
input id $ var1-var6;
datalines;
xx 0 1 1 1 1 0
xy 1 2 1 1 1 0
xz 0 0 0 0 0 0
;
data want;
set have;
array x{*} var:;
do i=2 to dim(x)-1;
if x{i-1} in (. 1) and x{i+1} in (. 1) and x{i}=1 then x{i}=.;
end;
drop i;
run;
proc print;run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.