Remove series of values using arrays

anurak · Posted 03-24-2020 06:01 PM

In the scenario where in my SAS dataset have series of same consecutive(count > 2) column values on the same row, I want to keep first and the last column values and replace with blank the ones between first and last. . Is there a way to do this using ARRAYS?

example

id var1 var2 var3 var4 var5 var6

xx 0 1 1 1 1 0

Desired Output

id var1 var2 var3 var4 var5 var6

xx 0 1 1 0

PeterClemmensen · Posted 03-24-2020 06:10 PM

Here is a simple approach.

data have;
input id $ var1-var6;
datalines;
xx 0 1 1 1 1 0
;

data want(drop=i c);
   set have;
   array var {*} var1-var6;
   do i=1 to dim(var);
      if      var[i]=1 then c+1;
      else if var[i]=0 then c=0;
      if c>2 then var[i-1]=.;
   end;
run;

The DATA to DATA Step Macro
Blog: SASnrd

ballardw · Posted 03-25-2020 11:07 AM

I might suggest use of a special missing such as .R to indicate the values you removed instead of the generic missing. Then if you have other values that are missing for another reason such as not collected or calculated and not available you can tell latter why specific values are missing.

There are 27 special missing values available: .A to .Z and ._

So you could keep track of a number of different reasons. The special missing are excluded from calculations unless using an option that indicates using the missing values.

PGStats · Posted 03-24-2020 06:34 PM

Please try:

data have;
input id $ var1-var6;
datalines;
xx 0 1 1 1 1 0
xy 1 2 1 1 1 0
xz 0 0 0 0 0 0
;

data want;
set have;
array x var1-var6;
do i = 1 to dim(x) - 2;
	if not missing(x{i}) then do j = i + 2 to dim(x);
		if x{i} = x{j} and x{j-1} = x{j} then call missing(x{j-1});
		else leave;
		end;
	end;
drop i j;
run;

proc print data=want noobs; run;

id 	var1 	var2 	var3 	var4 	var5 	var6
xx 	0 	1 	. 	. 	1 	0
xy 	1 	2 	1 	. 	1 	0
xz 	0 	. 	. 	. 	. 	0

PG

novinosrin · Posted 03-24-2020 07:47 PM

Hi @anurak Prodigy Genius stats aka @PGStats code made me think to find an alternative. See if this helps

data have;
input id $ var1-var6;
datalines;
xx 0 1 1 1 1 0
xy 1 2 1 1 1 0
xz 0 0 0 0 0 0
;

data want;
 set have;
 array x var1-var6;
 _c=1;
 do _n_=2 to dim(x);
  if x(_n_)=x(_n_-1) then _c=sum(_c,1);
  else _c=1;
  if _c>2 then x(_n_-1)=.;
 end;
 drop _c;
run;

andreas_lds · Posted 03-25-2020 02:12 AM

@anurak wrote:

In the scenario where in my SAS dataset have series of same consecutive(count > 2) column values on the same row, I want to keep first and the last column values and replace with blank the ones between first and last. . Is there a way to do this using ARRAYS?

Please note that numeric variables can't be blank, missing numeric variables contain a single dot. You can change the way missing numeric values are displayed by using

options missing= " ";

I recommend to use this only for data export, to avoid confusion.

Ksharp · Posted 03-25-2020 06:59 AM

Assuming there is no missing value in original data.

data have;
input id $ var1-var6;
datalines;
xx 0 1 1 1 1 0
xy 1 2 1 1 1 0
xz 0 0 0 0 0 0
;
data want;
 set have;
 array x{*} var:;
 do i=2 to dim(x)-1;
  if x{i-1} in (. 1) and x{i+1} in (. 1) and x{i}=1 then x{i}=.;
 end;
 drop i;
run;
proc print;run;

Remove series of values using arrays

Re: Remove series of values using arrays

Re: Remove series of values using arrays

Re: Remove series of values using arrays

Re: Remove series of values using arrays

Re: Remove series of values using arrays

Re: Remove series of values using arrays

SAS Innovate 2025: Call for Content

Classroom Training Available!