
Subset data based on the contents of group dummy variables

Super Contributor
Posts: 383

[ Edited ]

I'm trying to split the data set 'have' into two data sets:

output_1 if any of the dummy variables z1-z7 equals 1

else

output_0 if z1-z7 are all 0

as shown below.

However, my current array-based code gives the wrong results, as you can see from the row counts of the wrong_0 and wrong_1 data sets it produces.

Any hints? What am I doing wrong here? I tried placing END both before and after the OUTPUT statement, with no success. Thanks in advance!

 

data have;
input id $ z1 z2 z3 z4 z5 z6 z7;
cards;
a 0 0 0 0 0 1 1
b 0 0 0 0 0 0 0
c 1 0 0 0 0 0 0
d 0 1 1 0 0 0 0
e 0 0 0 0 0 0 1
f 0 0 0 0 0 0 0
; 

data output_0;
input id $ z1 z2 z3 z4 z5 z6 z7;
cards;
b 0 0 0 0 0 0 0
f 0 0 0 0 0 0 0
;

data output_1;
input id $ z1 z2 z3 z4 z5 z6 z7;
cards;
a 0 0 0 0 0 1 1
c 1 0 0 0 0 0 0
d 0 1 1 0 0 0 0
e 0 0 0 0 0 0 1
;

data wrong_1; set have; /*output data has N=6 rows instead of 4*/
array m z:; 
 do over m;
if m in ('1') then output; 
end; 
run;

data wrong_0; set have; /*output data has N=36 rows instead of 2*/
array m z:; 
 do over m;
if m in ('0') then output; 
end; 
run;
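The counts in those comments come from OUTPUT running inside the DO OVER loop: it fires once for every element that matches, so wrong_1 gets row a twice (two 1s) and wrong_0 gets row b seven times. Over the six rows that adds up to 2+0+1+2+1+0 = 6 and 5+7+6+5+6+7 = 36. Moving END on its own presumably does not help, because the IF still tests a single array element rather than the row as a whole. A minimal sketch of a loop that does work, assuming the HAVE data set above; the flag any_one and the counter i are names introduced here for illustration:

```sas
/* Sketch: scan all seven dummies first, then write the row once */
data output_1 output_0;
    set have;
    array m{*} z1-z7;                 /* explicit array over the dummies */
    any_one = 0;                      /* illustrative flag: 1 if any dummy equals 1 */
    do i = 1 to dim(m);
        if m{i} = 1 then any_one = 1; /* z1-z7 are numeric, so compare with the number 1 */
    end;
    if any_one then output output_1;  /* OUTPUT after END: one row out per row in */
    else output output_0;
    drop i any_one;                   /* keep only id and z1-z7 in the outputs */
run;
```

The flag separates "look at every element" (inside the loop) from "decide where the row goes" (after the loop), which is the step the original code was missing.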


 


Accepted Solutions
Solution
07-16-2018 11:09 AM
Super User
Posts: 2,068

Re: Subset data based on the contents of group dummy variables

[ Edited ]

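If you are using an array, your syntax is the issue:

     if sum(of m)>1

should be

     if sum(of m(*))>0

(using >0 rather than >1, so that rows with a single 1, such as c and e, are counted).

And if you are using the IN operator:

data wrong_1 wrong_0;
set have;
array m{*} z1-z7;
if 1 in m then output wrong_1; /* at least one dummy equals 1 */
else output wrong_0;           /* all dummies are 0 */
run;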



All Replies
Super User
Posts: 2,068

Re: Subset data based on the contents of group dummy variables

data have;
input id $ z1 z2 z3 z4 z5 z6 z7;
cards;
a 0 0 0 0 0 1 1
b 0 0 0 0 0 0 0
c 1 0 0 0 0 0 0
d 0 1 1 0 0 0 0
e 0 0 0 0 0 0 1
f 0 0 0 0 0 0 0
; 

data output_0 output_1;
set have;
if max(of z1-z7)>0 then output output_1; /* at least one 1 */
else output output_0;                    /* all 0 */
run;
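max(of z1-z7) > 0 is true exactly when at least one dummy is positive, which for 0/1 variables means at least one 1. If you would rather test for the value 1 specifically, a swapped-in alternative is the WHICHN function, which returns the position of the first argument after the first that equals it, and 0 when there is no match. A minimal sketch, assuming the HAVE data set above:

```sas
data output_1 output_0;
    set have;
    /* WHICHN returns the position of the first of z1-z7 equal to 1, or 0 if none is */
    if whichn(1, of z1-z7) > 0 then output output_1;
    else output output_0;
run;
```

Unlike the MAX test, this ignores any positive value other than 1, which only matters if the dummies are not strictly 0/1.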
Super User
Posts: 2,068

Re: Subset data based on the contents of group dummy variables

Posted in reply to novinosrin

Using the variable list z:

data output_0 output_1;
set have;
if max(of z:)>0 then output output_1; /* at least one 1 */
else output output_0;                 /* all 0 */
run;
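The z: prefix list picks up every variable whose name starts with z, not just z1-z7. That is convenient here, but a sketch with a hypothetical extra variable zone (not in the original data) shows the trade-off: it is swept into max(of z:) and sends every row to the first output, while the numbered range z1-z7 ignores it. The data set names have_extra, out_prefix_1/out_prefix_0 and out_range_1/out_range_0 are illustrative only, and HAVE is assumed to be the data set from the question:

```sas
/* Hypothetical: an unrelated variable whose name also starts with z */
data have_extra;
    set have;
    zone = 5;                                      /* not a dummy, but matched by z: */
run;

data out_prefix_1 out_prefix_0;
    set have_extra;
    if max(of z:) > 0 then output out_prefix_1;    /* zone=5 makes this true for every row */
    else output out_prefix_0;
run;

data out_range_1 out_range_0;
    set have_extra;
    if max(of z1-z7) > 0 then output out_range_1;  /* numbered range ignores zone */
    else output out_range_0;
run;
```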
Respected Advisor
Posts: 3,288

Re: Subset data based on the contents of group dummy variables

[ Edited ]

Well, this is certainly confusing:

output_0 if any dummy variable in z1-z7 series contains '1'

else

output_1 if z1-z7 series are all '0'

because your code does the exact opposite.

However, this should get the data separated properly, apart from the confusion noted above.

data wrong_1 wrong_0;
    set have;
    if sum(of z:)>0 then output wrong_1;  /* at least one 1 */
    else output wrong_0;                  /* all 0 */
run;
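Whether the cutoff is > 1 or > 0 matters here, because the dummies are 0/1 and a row with a single 1 sums to exactly 1. A short check, assuming the HAVE data set from the question (n_ones is a helper name introduced for illustration), prints the row sums that drive the split:

```sas
/* Row sums of the 0/1 dummies; n_ones is an illustrative helper name */
data check;
    set have;
    n_ones = sum(of z1-z7);
run;

proc print data=check noobs;
    var id n_ones;   /* expected sums: a 2, b 0, c 1, d 2, e 1, f 0 */
run;
```

With these sums, > 1 would send only a and d to wrong_1 and leave c and e in wrong_0, while > 0 (or >= 1) gives the four rows expected in output_1.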
--
Paige Miller
Super Contributor
Posts: 383

Re: Subset data based on the contents of group dummy variables

Posted in reply to PaigeMiller

@PaigeMiller

Sorry for the confusion; I will correct that. Below is the error I got.

ERROR: Array subscript out of range at line 377 column 15.

374  data p.wrong_1 p.wrong_0;
375      set p.have;
376      array m z1-z7;
377      if sum(of m)>1 then output p.wrong_1;
378      else output p.wrong_0;
379  run;

ERROR: Array subscript out of range at line 377 column 15.
id=a z1=0 z2=0 z3=0 z4=0 z5=0 z6=1 z7=1 _I_=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set P.HAVE.
WARNING: The data set P.WRONG_1 may be incomplete.  When this step was stopped there were 0
         observations and 8 variables.
WARNING: Data set P.WRONG_1 was not replaced because this step was stopped.
WARNING: The data set P.WRONG_0 may be incomplete.  When this step was stopped there were 0
         observations and 8 variables.
WARNING: Data set P.WRONG_0 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.42 seconds
      cpu time            0.01 seconds
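The `_I_=.` in this log hints at the cause. An ARRAY statement with no dimension in braces, such as `array m z1-z7;`, declares an implicit array (the kind DO OVER works with), and a bare reference to m appears to mean the single element selected by the index variable _I_, which is missing outside a DO OVER loop, hence the out-of-range subscript. A minimal sketch of the same step with an explicit array, written against the WORK copy of HAVE from the question rather than the P. library, and using > 0 so that rows with a single 1 also land in wrong_1:

```sas
/* Sketch: explicit array, so m{*} means all of z1-z7 */
data wrong_1 wrong_0;
    set have;
    array m{*} z1-z7;                         /* {*}: size taken from the variable list */
    if sum(of m{*}) > 0 then output wrong_1;  /* at least one dummy is 1 */
    else output wrong_0;                      /* all dummies are 0 */
run;
```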

Respected Advisor
Posts: 3,288

Re: Subset data based on the contents of group dummy variables

Posted in reply to novinosrin

ERROR: Array subscript out of range at line 377 column 15.

374  data p.wrong_1 p.wrong_0;
375      set p.have;
376      array m z1-z7;
377      if sum(of m)>1 then output p.wrong_1;
378      else output p.wrong_0;
379  run;

 

I showed simpler code that you have changed, causing the error. Run my exact code without changes.

--
Paige Miller
Super Contributor
Posts: 383

Re: Subset data based on the contents of group dummy variables

Posted in reply to PaigeMiller
I edited the question. Thanks for pointing that out.
☑ This topic is solved.
