## Subset data based on the contents of group dummy variables

Solved
Super Contributor
Posts: 383



I'm trying to split the data set 'have' into

output_1 if any dummy variable in the z1-z7 series equals '1'

else

output_0 if the z1-z7 series are all '0'

as shown below.

However, my current array-based code gives wrong results, as you can see from the resulting wrong_0 and wrong_1 data sets.

Any hints? What am I doing wrong here? I placed 'end' both before and after 'output', with no success. Thanks in advance!

```
data have;
input id $ z1 z2 z3 z4 z5 z6 z7;
cards;
a 0 0 0 0 0 1 1
b 0 0 0 0 0 0 0
c 1 0 0 0 0 0 0
d 0 1 1 0 0 0 0
e 0 0 0 0 0 0 1
f 0 0 0 0 0 0 0
;

data output_0;
input id $ z1 z2 z3 z4 z5 z6 z7;
cards;
b 0 0 0 0 0 0 0
f 0 0 0 0 0 0 0
;

data output_1;
input id $ z1 z2 z3 z4 z5 z6 z7;
cards;
a 0 0 0 0 0 1 1
c 1 0 0 0 0 0 0
d 0 1 1 0 0 0 0
e 0 0 0 0 0 0 1
;

data wrong_1; set have; /*output data has N=6 rows instead of 4*/
array m z:;
do over m;
if m in ('1') then output; /*OUTPUT runs once per matching element, not once per row*/
end;
run;

data wrong_0; set have; /*output data has N=36 rows instead of 2*/
array m z:;
do over m;
if m in ('0') then output; /*OUTPUT runs once per matching element, not once per row*/
end;
run;
```
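The DO OVER loop itself is fine; the extra rows come from placing OUTPUT inside it, so it executes once for every element that matches. Row a, with two 1s, is written to wrong_1 twice, and row b, with seven 0s, is written to wrong_0 seven times. A minimal sketch that keeps the loop but only sets a flag inside it, then outputs each row exactly once after the loop. `any_one` is a hypothetical helper variable, and the sketch assumes the dummies hold only 0 or 1.

```sas
/* Recreate the sample data so this step runs on its own */
data have;
input id $ z1-z7;
cards;
a 0 0 0 0 0 1 1
b 0 0 0 0 0 0 0
c 1 0 0 0 0 0 0
d 0 1 1 0 0 0 0
e 0 0 0 0 0 0 1
f 0 0 0 0 0 0 0
;
run;

data output_1 output_0;
set have;
array m z1-z7;        /* non-indexed array, so DO OVER works as in the question */
any_one = 0;          /* hypothetical flag: set to 1 once any dummy equals 1 */
do over m;
  if m = 1 then any_one = 1;
end;
/* OUTPUT now runs once per row, after the loop has checked every dummy */
if any_one then output output_1;
else output output_0;
drop any_one;
run;
```

With the sample data this gives output_1 with rows a, c, d, e and output_0 with rows b, f.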

Accepted Solutions
Solution
07-16-2018 11:09 AM
Super User
Posts: 2,068

## Re: Subset data based on the contents of group dummy variables


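If you are using an array, your syntax is the issue:

if sum(of m)>1

should be

if sum(of m(*))>0

The (*) passes every element of the array to SUM. The comparison also becomes >0, because rows c and e contain a single 1: their sum is exactly 1, so >1 would send them to the wrong data set.

And if you are using the IN operator:

```
data wrong_1 wrong_0;
set have;
array m z1-z7;
/* true when at least one element of M equals 1 */
if 1 in m then output wrong_1;
else output wrong_0;
run;
```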
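An alternative that needs no array at all, not taken from the thread: the WHICHN function returns the position of the first value argument that equals its first argument, or 0 when none does. A sketch assuming SAS 9.2 or later, where WHICHN is available:

```sas
/* Recreate the sample data so this step runs on its own */
data have;
input id $ z1-z7;
cards;
a 0 0 0 0 0 1 1
b 0 0 0 0 0 0 0
c 1 0 0 0 0 0 0
d 0 1 1 0 0 0 0
e 0 0 0 0 0 0 1
f 0 0 0 0 0 0 0
;
run;

data output_1 output_0;
set have;
/* WHICHN(1, ...) is 0 only when none of z1-z7 equals 1 */
if whichn(1, of z1-z7) > 0 then output output_1;
else output output_0;
run;
```

Because WHICHN tests for the value 1 directly, it does not rely on the dummies being non-negative the way a MAX or SUM test does.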

All Replies
Super User
Posts: 2,068

## Re: Subset data based on the contents of group dummy variables

```
data have;
input id $ z1 z2 z3 z4 z5 z6 z7;
cards;
a 0 0 0 0 0 1 1
b 0 0 0 0 0 0 0
c 1 0 0 0 0 0 0
d 0 1 1 0 0 0 0
e 0 0 0 0 0 0 1
f 0 0 0 0 0 0 0
;

data output_0  output_1;
set have;
/* the row maximum is above 0 only when at least one of z1-z7 is 1 */
if max(of z1-z7)>0 then output output_1;
else output output_0;
run;
```
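If you prefer PROC SQL, the same row-maximum test can be sketched there. This relies on PROC SQL evaluating MAX across the row (like the DATA step function) when it is given more than one argument. PROC SQL does not accept DATA step variable lists such as z1-z7 in expressions, so the columns are spelled out.

```sas
/* Recreate the sample data so this step runs on its own */
data have;
input id $ z1-z7;
cards;
a 0 0 0 0 0 1 1
b 0 0 0 0 0 0 0
c 1 0 0 0 0 0 0
d 0 1 1 0 0 0 0
e 0 0 0 0 0 0 1
f 0 0 0 0 0 0 0
;
run;

proc sql;
  /* multi-argument MAX is a row-wise maximum, not a column summary */
  create table output_1 as
    select * from have
    where max(z1, z2, z3, z4, z5, z6, z7) > 0;
  create table output_0 as
    select * from have
    where max(z1, z2, z3, z4, z5, z6, z7) = 0;
quit;
```

Unlike the DATA step's IF/ELSE, these two WHERE clauses are independent, so a row whose dummies are all missing would land in neither table.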
Super User
Posts: 2,068

## Re: Subset data based on the contents of group dummy variables

Using the variable list z:

```
data output_0  output_1;
set have;
if max(of z:)>0 then output output_1;
else output output_0;
run;
```
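One caution with z: is that the colon prefix selects every variable whose name begins with z, not just z1-z7. The sketch below uses a hypothetical data set, have_zone, with an extra variable zone (an assumption, not part of the question) to show how that can change the split.

```sas
/* Hypothetical variant of HAVE with an extra variable ZONE that also starts with z */
data have_zone;
input id $ z1-z7 zone;
cards;
a 0 0 0 0 0 1 1 3
b 0 0 0 0 0 0 0 3
c 1 0 0 0 0 0 0 3
d 0 1 1 0 0 0 0 3
e 0 0 0 0 0 0 1 3
f 0 0 0 0 0 0 0 3
;
run;

/* z: also picks up ZONE, so every row has a maximum above 0 */
data prefix_1 prefix_0;
set have_zone;
if max(of z:) > 0 then output prefix_1;
else output prefix_0;
run;

/* the explicit range z1-z7 looks only at the dummy variables */
data range_1 range_0;
set have_zone;
if max(of z1-z7) > 0 then output range_1;
else output range_0;
run;
```

Here prefix_1 receives all six rows, while range_1 and range_0 reproduce the intended 4/2 split.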
Posts: 3,288

## Re: Subset data based on the contents of group dummy variables


Well, this is certainly confusing:

output_0 if any dummy variable in z1-z7 series contains '1'

else

output_1 if  z1-z7 series are all '0'

because your code does the exact opposite.

However, this should separate the data properly, apart from the naming confusion stated above.

```
data wrong_1 wrong_0;
set have;
/* the sum of 0/1 dummies counts the 1s; any count above 0 means at least one 1 */
if sum(of z:)>0 then output wrong_1;
else output wrong_0;
run;
```
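Because the dummies are 0/1, sum(of z:) simply counts the 1s in a row, so the comparison threshold matters. A small check, assuming the 'have' data from the question, that flags the rows on which >0 and >1 disagree. n_ones, gt0, and gt1 are hypothetical helper variables.

```sas
/* Recreate the sample data so this step runs on its own */
data have;
input id $ z1-z7;
cards;
a 0 0 0 0 0 1 1
b 0 0 0 0 0 0 0
c 1 0 0 0 0 0 0
d 0 1 1 0 0 0 0
e 0 0 0 0 0 0 1
f 0 0 0 0 0 0 0
;
run;

data check;
set have;
n_ones = sum(of z1-z7);   /* number of dummies equal to 1 */
gt0 = (n_ones > 0);       /* 1 if at least one dummy is 1 */
gt1 = (n_ones > 1);       /* 1 only if two or more dummies are 1 */
run;

proc print data=check;
var id n_ones gt0 gt1;
run;
```

Rows c and e each contain a single 1, so they are the only rows where gt0 and gt1 differ: a >1 test would send them to the all-zero data set.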
--
Paige Miller
Super Contributor
Posts: 383

## Re: Subset data based on the contents of group dummy variables

@PaigeMiller

Sorry for the confusion; I will correct that. Below is the error I got.

ERROR: Array subscript out of range at line 377 column 15.

```
374  data p.wrong_1 p.wrong_0;
375      set p.have;
376      array m z1-z7;
377      if sum(of m)>1 then output p.wrong_1;
378      else output p.wrong_0;
379  run;

ERROR: Array subscript out of range at line 377 column 15.
id=a z1=0 z2=0 z3=0 z4=0 z5=0 z6=1 z7=1 _I_=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set P.HAVE.
WARNING: The data set P.WRONG_1 may be incomplete.  When this step was stopped there were 0
observations and 8 variables.
WARNING: Data set P.WRONG_1 was not replaced because this step was stopped.
WARNING: The data set P.WRONG_0 may be incomplete.  When this step was stopped there were 0
observations and 8 variables.
WARNING: Data set P.WRONG_0 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time           0.42 seconds
cpu time            0.01 seconds
```
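The `_I_=.` in the log points at the cause: `array m z1-z7;` has no dimension, so it declares a non-indexed (implicit) array whose index variable is _I_. A bare `m` then refers to the single element selected by _I_, and since _I_ is still missing here (no DO OVER loop has run), the subscript is out of range. A sketch that declares an explicit array instead and passes all of its elements with m{*}; the output data set names follow the question.

```sas
/* Recreate the sample data so this step runs on its own */
data have;
input id $ z1-z7;
cards;
a 0 0 0 0 0 1 1
b 0 0 0 0 0 0 0
c 1 0 0 0 0 0 0
d 0 1 1 0 0 0 0
e 0 0 0 0 0 0 1
f 0 0 0 0 0 0 0
;
run;

data output_1 output_0;
set have;
array m{*} z1-z7;     /* {*}: explicit array, SAS counts the elements */
/* m{*} passes every element; > 0 also catches rows with a single 1 */
if sum(of m{*}) > 0 then output output_1;
else output output_0;
run;
```

An explicit array cannot be used with DO OVER, but for a whole-row test like this one no loop is needed.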


Posts: 3,288

## Re: Subset data based on the contents of group dummy variables

```
ERROR: Array subscript out of range at line 377 column 15.

374  data p.wrong_1 p.wrong_0;
375      set p.have;
376      array m z1-z7;
377      if sum(of m)>1 then output p.wrong_1;
378      else output p.wrong_0;
379  run;
```

I showed simpler code, which you changed, and that change caused the error. Run my exact code without changes.

--
Paige Miller
Super Contributor
Posts: 383