BookmarkSubscribeRSS Feed
anurak
Fluorite | Level 6

In the scenario where in my SAS dataset have series of same consecutive(count > 2) column values on the same row, I want to keep first and the last column values and replace with blank the ones between first and last. . Is there a way to do this using ARRAYS?

 

example

 

id var1 var2 var3 var4 var5 var6

xx 0     1       1      1      1      0

 

Desired Output

 

id var1 var2 var3 var4 var5 var6

xx 0     1                        1      0

 

 

 

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20

Here is a simple approach.

 

data have;
input id $ var1-var6;
datalines;
xx 0 1 1 1 1 0
;

data want(drop=i c);
   set have;
   array var {*} var1-var6;
   do i=1 to dim(var);
      if      var[i]=1 then c+1;
      else if var[i]=0 then c=0;
      if c>2 then var[i-1]=.;
   end;
run;
ballardw
Super User

I might suggest use of a special missing such as .R to indicate the values you removed instead of the generic missing. Then if you have other values that are missing for another reason such as not collected or calculated and not available you can tell latter why specific values are missing.

There are 27 special missing values available: .A to .Z and ._ 

So you could keep track of a number of different reasons. The special missing are excluded from calculations unless using an option that indicates using the missing values.

PGStats
Opal | Level 21

Please try:

 

data have;
input id $ var1-var6;
datalines;
xx 0 1 1 1 1 0
xy 1 2 1 1 1 0
xz 0 0 0 0 0 0
;

data want;
set have;
array x var1-var6;
do i = 1 to dim(x) - 2;
	if not missing(x{i}) then do j = i + 2 to dim(x);
		if x{i} = x{j} and x{j-1} = x{j} then call missing(x{j-1});
		else leave;
		end;
	end;
drop i j;
run;

proc print data=want noobs; run;

 

id 	var1 	var2 	var3 	var4 	var5 	var6
xx 	0 	1 	. 	. 	1 	0
xy 	1 	2 	1 	. 	1 	0
xz 	0 	. 	. 	. 	. 	0

 

PG
novinosrin
Tourmaline | Level 20

Hi @anurak  Prodigy Genius stats aka @PGStats  code made me think to find an alternative. See if this helps

data have;
input id $ var1-var6;
datalines;
xx 0 1 1 1 1 0
xy 1 2 1 1 1 0
xz 0 0 0 0 0 0
;

data want;
 set have;
 array x var1-var6;
 _c=1;
 do _n_=2 to dim(x);
  if x(_n_)=x(_n_-1) then _c=sum(_c,1);
  else _c=1;
  if _c>2 then x(_n_-1)=.;
 end;
 drop _c;
run;


andreas_lds
Jade | Level 19

@anurak wrote:

In the scenario where in my SAS dataset have series of same consecutive(count > 2) column values on the same row, I want to keep first and the last column values and replace with blank the ones between first and last. . Is there a way to do this using ARRAYS?

 


Please note that numeric variables can't be blank, missing numeric variables contain a single dot. You can change the way missing numeric values are displayed by using

options missing= " ";

I recommend to use this only for data export, to avoid confusion.

Ksharp
Super User

Assuming there is no missing value in original data.

 

data have;
input id $ var1-var6;
datalines;
xx 0 1 1 1 1 0
xy 1 2 1 1 1 0
xz 0 0 0 0 0 0
;
data want;
 set have;
 array x{*} var:;
 do i=2 to dim(x)-1;
  if x{i-1} in (. 1) and x{i+1} in (. 1) and x{i}=1 then x{i}=.;
 end;
 drop i;
run;
proc print;run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 888 views
  • 1 like
  • 7 in conversation