DATA Step, Macro, Functions and more

convert all missing observations to zero by group

Accepted Solution Solved
Reply
Contributor
Posts: 32
Accepted Solution

convert all missing observations to zero by group

 

  Hello SAS users!

  Though this may be an easy task to those who have expertise with programming, I am struggling with this.

 

 So I got data 'have' below.


data have;
input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6
1 6 5 . 9 6 3 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . .
;
run;

 

 

And I want to convert 'have' to 'want' like below.

 

data want;
input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 0 1 1 3 2
1 3 5 9 0 3 9 6
1 4 5 8 0 7 3 2
1 5 5 8 2 0 6
1 6 5 0 9 6 3 9
2 1 3 0 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 0 5 . .
2 4 3 13 7 9 . .
;
run;

 

 

 

So when you look at data set 'have',

id=1 have 5 not-all-missing numeric variables x1, x2, x3, x4, x5

and id=2 have only 3 not-all-missing numeric variables x1, x2, x3.

 

That means id=1 have 5 variables and id=2 have 3 variables.

For id=1, I want to convert missing values in x1, x2, x3, x4, x5 to zero 

and For id=2, I want to convert missing values in x1, x2, x3 to zero but remains x4, x5 all missing.

 

I got thousands of id, so the array of variable x could be more than 100 (x1~x105)

How do I perform this EFFICIENTLY?

 

Any help will be appreciated.

 

 

 

 


Accepted Solutions
Solution
a week ago
PROC Star
Posts: 1,400

Re: convert all missing observations to zero by group

There is definitely a more efficient way, but here is a transpose/transpose back approach

 

data have;
input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6
1 6 5 . 9 6 3 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . .
;
run;

proc transpose data=have out=temp1;
   by id permno N;
run;

proc sql;
   create table temp2 as
   select *
         ,case when count(col1)=0 then 0 else 1 end as flag
   from temp1
   group by id, _NAME_;
quit;

data temp3;
   set temp2;
   if col1=. & flag=1 then col1=0;
   drop flag;
run;

proc sort data=temp3;
   by id permno;
run;

proc transpose data=temp3 out=want(drop=_NAME_);
   by id permno;
   id _NAME_;
   var col1;
run;

View solution in original post


All Replies
Super User
Super User
Posts: 9,840

Re: convert all missing observations to zero by group

Something like:

data have;
input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6
1 6 5 . 9 6 3 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . .
;
run;

data want;
set have;
array val x:;
do over val;
if val=. then val=0;
end;
run;

Would actually change the value.  However you might not need to.  If you just want missings to be displayed as zero then you could use:

options missing="0";

Note this afffects the whole session.

You might want to be careful doing that anyways, 0 is not missing, therefore you are imputing data - this can affect calculations such as proc means.

Contributor
Posts: 32

Re: convert all missing observations to zero by group

 

Dear RW9

What I want to do is to remain missing values of id=2's x4, x5 

not to change all missing values in the dataset to zero

Super User
Super User
Posts: 9,840

Re: convert all missing observations to zero by group

Ah, ok then a couple of if statements are need:

data have;
 input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6
1 6 5 . 9 6 3 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . .
;
run;

data want (drop=flag i);
 set have;
 array x(5);
 flag=0;
 do i=1 to 5;
   if flag=0 and x{i}=. then do;
     x{i}=0;
     flag=1;
   end;
   else if flag=1 and x{i}=. then x{i}=0;
   else if flag=1 and x{i} ne . then leave;
 end;
run; 
Contributor
Posts: 32

Re: convert all missing observations to zero by group

Thanks for your help
But I still find zeros in x4 and x5 in id=2, permno=2, 4
Could you give me an advice regarding your code?
Super User
Super User
Posts: 9,840

Re: convert all missing observations to zero by group

Ok, sorry, didn't see that.  Just work from the back to start:

data have;
 input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6
1 6 5 . 9 6 3 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . .
;
run;

data want (drop=flag i);
 set have;
 array x(5);
 flag=0;
 do i=5 to 1 by -1;
   if x{i} ne . then flag=1;
   if flag=1 and x{i}=. then x{i}=0;
 end;
run; 
PROC Star
Posts: 1,400

Re: convert all missing observations to zero by group

[ Edited ]

@RW9, I politely disagree that this would work in the general case. What if the data looks like this?

 

data have;
 input id permno N x1 x2 x3 x4 x5 x6;
cards;
1 1 5 4 3 2 2 1 1
1 2 5 . 1 1 3 2 2
1 3 5 9 . 3 9 6 3
1 4 5 8 . 7 3 2 4
1 5 5 8 2 . 6 5
1 6 5 . 9 6 3 9 6
2 1 3 . 13 8 . . 1
2 2 3 23 5 8 . . 2
2 3 3 11 . 5 . . 3
2 4 3 13 7 9 . . 4
;
run;

Then your code would convert the missing values for ID=2 in the variables x4 and x5 even though they are all missing in the by group and should remain so.

 

That is of course, if I understand your needs correctly @jkim197? Smiley Happy

Super User
Super User
Posts: 9,840

Re: convert all missing observations to zero by group

I don't disagree that taking different data, or different logic then my code would not work.  TBH its so hard to tell what the OP is trying to do...

First step would be to ascertain why there is a transposed format in the first place, and why there are missings in a given set of data, which would most likely be simpler to fix earlier in the process.  However we have only what is posted here.

Solution
a week ago
PROC Star
Posts: 1,400

Re: convert all missing observations to zero by group

There is definitely a more efficient way, but here is a transpose/transpose back approach

 

data have;
input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6
1 6 5 . 9 6 3 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . .
;
run;

proc transpose data=have out=temp1;
   by id permno N;
run;

proc sql;
   create table temp2 as
   select *
         ,case when count(col1)=0 then 0 else 1 end as flag
   from temp1
   group by id, _NAME_;
quit;

data temp3;
   set temp2;
   if col1=. & flag=1 then col1=0;
   drop flag;
run;

proc sort data=temp3;
   by id permno;
run;

proc transpose data=temp3 out=want(drop=_NAME_);
   by id permno;
   id _NAME_;
   var col1;
run;
Super User
Posts: 10,849

Re: convert all missing observations to zero by group

If you have many variable to impute , Make a macro .

 

data have;
input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6 .
1 6 5 . 9 6 3 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . .
;
run;
proc sql;
create table temp as
 select *,n(x1)=0 as _x1,n(x2)=0 as _x2,n(x3)=0 as _x3,n(x4)=0 as _x4,n(x5)=0 as _x5
  from have
   group by id
    order by id,permno;
quit;
data want;
 set temp;
 array x{*} x1-x5;
 array _x{*} _x1-_x5;
 do i=1 to dim(x);
  if missing(x{i}) and _x{i}=0 then x{i}=0;
 end;
 drop i _:;
run;
Super User
Posts: 2,051

Re: convert all missing observations to zero by group

HI @jkim197 just out of curiosity and if you have time, does the variable N in your HAVE denote the number of variables in starting/ascending sequence  in the array group X1-Xn series to check and impute? as N is 5 and 3 for ID1 and ID2 respectively?

 

If that is the case the following should do? nevertheless ,I am wary when somebody brilliant like Ksharp didn't offer this,that makes me think I am the one whose understanding is wrong. Can you clarify plz?

data want;
set have;
by id;
array t(*) x:;
do _n_=1 to n;
if missing(t(_n_)) then t(_n_)=0;
end;
run;
Contributor
Posts: 32

Re: convert all missing observations to zero by group

Posted in reply to novinosrin

Yes you are right.

your code works great and fast.. Thank you

Super User
Posts: 2,051

Re: convert all missing observations to zero by group

@jkim197  That made me chuckle.  Like @hashman pointed out, the description of the question explaining the contents of the dataset matters most in the first place for which a robust solution is required. I am glad, my peek into your dataset and the guesses or in other words intuition turned out to be correct. 

Last but not the least, irony in lack of clarity gave us a gift to receive the privilege of yet another beautiful APP class of PD that I copied to my notes Smiley Happy

Frequent Contributor
Posts: 112

Re: convert all missing observations to zero by group

[ Edited ]

@jkim197:

 

First, please do not include irrelevant variables (such as Permno and N, in this case) in your sample input/output: They muddy up the picture and distract from the task at hand. Your sample input had better look like this:

data have ;      
  input id x1-x5 
  cards ;        
1   4  3  2  2  1
1   .  1  1  3  2
1   9  .  3  9  6
1   8  .  7  3  2
1   8  2  .  6   
1   .  9  6  3  9
2   . 13  8  .  .
2  23  5  8  .  .
2  11  .  5  .  .
2  13  7  9  .  .
;                
run ;            

Second, you want an efficient solution. That means, to begin with, that you want to minimize the number of passes through the input data. In this case, you need at minimum 2 passes since you cannot do what you need to any BY group without pre-processing it beforehand. But any task involving the logic (1) read a BY group to discover something and (2) read it again to process it based on the discovery is a typical work for what is known as the double DoW-loop, whose extra advantage lies in processing the whole shebang in a single step.

 

Hence, below: (1) every BY group is read first to determine the maximum number of non-missing values in any row _NX, then (2) this very BY group is scanned again to set the nulls to zeros in each row only for the first _NX leftmost X-variables.

data want (drop = _:) ; 
do _ng = 1 by 1 until (last.id) ;
set have ;
by ID ;
_nx = _nx <> N (of x:) ;
end ;
do _ng = 1 to _ng ;
set have ;
array x [*] x: ;
do _i = 1 to _nx ;
if nmiss (x[_i]) then x[_i] = 0 ;
end ;
output ;
end ;
run ;

Even if you have 105 X-variables, scanning through the array is a pretty quick affair. However, if you want to avoid that in favor of using a potentially faster string search/replace function, you can do as below. Note that it is not to suggest that you code this way: APP tricks like this are reserved to extreme circumstances where performance gains are more than well worth the extra complexity; I just want to indicate that the possibility is there (plus perhaps it may pique someone's curiosity to learn more about the APP functions).

data want (drop = _:) ;                                             
  do _ng = 1 by 1 until (last.id) ;                                 
    set have ;                                                      
    by ID ;                                                         
    _nx = _nx <> N (of x:) ;                                        
  end ;                                                             
  retain _a _m _z ;                                                 
  if _n_ = 1 then do ;                                              
    _a  = addrlong (x1) ;                                           
    _m  = put (. , rb8.) ;                                          
    _z  = put (0 , rb8.) ;                                          
  end ;                                                             
  _nx = 8 * _nx ;                                                   
  do _ng = 1 to _ng ;                                               
    set have ;                                                      
    call pokelong (tranwrd (peekclong (_a, _nx), _m, _z), _a, _nx) ;
    output ;                                                        
  end ;                                                             
run ;                                                               

FWIW

Paul D.

  

Senior User
Posts: 1

Re: convert all missing observations to zero by group

I propose this solution:

data have;
input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6
1 6 5 . 9 6 . 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . 10
;
run;

data want;
set have;
array X(*) x:;
do i=1 to dim(x);
   if x(i)=. then do;
   rc=i;
   do j=rc to dim(x);
   if x(j) ne . then do;
   x(i)=0;
   leave;
 
   end;  
  end;
 end;
end;
run; 
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 296 views
  • 10 likes
  • 7 in conversation