BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jkim197
Obsidian | Level 7

 

  Hello SAS users!

  Though this may be an easy task to those who have expertise with programming, I am struggling with this.

 

 So I got data 'have' below.


data have;
input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6
1 6 5 . 9 6 3 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . .
;
run;

 

 

And I want to convert 'have' to 'want' like below.

 

data want;
input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 0 1 1 3 2
1 3 5 9 0 3 9 6
1 4 5 8 0 7 3 2
1 5 5 8 2 0 6
1 6 5 0 9 6 3 9
2 1 3 0 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 0 5 . .
2 4 3 13 7 9 . .
;
run;

 

 

 

So when you look at data set 'have',

id=1 have 5 not-all-missing numeric variables x1, x2, x3, x4, x5

and id=2 have only 3 not-all-missing numeric variables x1, x2, x3.

 

That means id=1 have 5 variables and id=2 have 3 variables.

For id=1, I want to convert missing values in x1, x2, x3, x4, x5 to zero 

and For id=2, I want to convert missing values in x1, x2, x3 to zero but remains x4, x5 all missing.

 

I got thousands of id, so the array of variable x could be more than 100 (x1~x105)

How do I perform this EFFICIENTLY?

 

Any help will be appreciated.

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

There is definitely a more efficient way, but here is a transpose/transpose back approach

 

data have;
input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6
1 6 5 . 9 6 3 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . .
;
run;

proc transpose data=have out=temp1;
   by id permno N;
run;

proc sql;
   create table temp2 as
   select *
         ,case when count(col1)=0 then 0 else 1 end as flag
   from temp1
   group by id, _NAME_;
quit;

data temp3;
   set temp2;
   if col1=. & flag=1 then col1=0;
   drop flag;
run;

proc sort data=temp3;
   by id permno;
run;

proc transpose data=temp3 out=want(drop=_NAME_);
   by id permno;
   id _NAME_;
   var col1;
run;

View solution in original post

14 REPLIES 14
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Something like:

data have;
input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6
1 6 5 . 9 6 3 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . .
;
run;

data want;
set have;
array val x:;
do over val;
if val=. then val=0;
end;
run;

Would actually change the value.  However you might not need to.  If you just want missings to be displayed as zero then you could use:

options missing="0";

Note this afffects the whole session.

You might want to be careful doing that anyways, 0 is not missing, therefore you are imputing data - this can affect calculations such as proc means.

jkim197
Obsidian | Level 7

 

Dear RW9

What I want to do is to remain missing values of id=2's x4, x5 

not to change all missing values in the dataset to zero

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah, ok then a couple of if statements are need:

data have;
 input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6
1 6 5 . 9 6 3 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . .
;
run;

data want (drop=flag i);
 set have;
 array x(5);
 flag=0;
 do i=1 to 5;
   if flag=0 and x{i}=. then do;
     x{i}=0;
     flag=1;
   end;
   else if flag=1 and x{i}=. then x{i}=0;
   else if flag=1 and x{i} ne . then leave;
 end;
run; 
jkim197
Obsidian | Level 7
Thanks for your help
But I still find zeros in x4 and x5 in id=2, permno=2, 4
Could you give me an advice regarding your code?
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ok, sorry, didn't see that.  Just work from the back to start:

data have;
 input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6
1 6 5 . 9 6 3 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . .
;
run;

data want (drop=flag i);
 set have;
 array x(5);
 flag=0;
 do i=5 to 1 by -1;
   if x{i} ne . then flag=1;
   if flag=1 and x{i}=. then x{i}=0;
 end;
run; 
PeterClemmensen
Tourmaline | Level 20

@RW9, I politely disagree that this would work in the general case. What if the data looks like this?

 

data have;
 input id permno N x1 x2 x3 x4 x5 x6;
cards;
1 1 5 4 3 2 2 1 1
1 2 5 . 1 1 3 2 2
1 3 5 9 . 3 9 6 3
1 4 5 8 . 7 3 2 4
1 5 5 8 2 . 6 5
1 6 5 . 9 6 3 9 6
2 1 3 . 13 8 . . 1
2 2 3 23 5 8 . . 2
2 3 3 11 . 5 . . 3
2 4 3 13 7 9 . . 4
;
run;

Then your code would convert the missing values for ID=2 in the variables x4 and x5 even though they are all missing in the by group and should remain so.

 

That is of course, if I understand your needs correctly @jkim197? 🙂

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I don't disagree that taking different data, or different logic then my code would not work.  TBH its so hard to tell what the OP is trying to do...

First step would be to ascertain why there is a transposed format in the first place, and why there are missings in a given set of data, which would most likely be simpler to fix earlier in the process.  However we have only what is posted here.

PeterClemmensen
Tourmaline | Level 20

There is definitely a more efficient way, but here is a transpose/transpose back approach

 

data have;
input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6
1 6 5 . 9 6 3 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . .
;
run;

proc transpose data=have out=temp1;
   by id permno N;
run;

proc sql;
   create table temp2 as
   select *
         ,case when count(col1)=0 then 0 else 1 end as flag
   from temp1
   group by id, _NAME_;
quit;

data temp3;
   set temp2;
   if col1=. & flag=1 then col1=0;
   drop flag;
run;

proc sort data=temp3;
   by id permno;
run;

proc transpose data=temp3 out=want(drop=_NAME_);
   by id permno;
   id _NAME_;
   var col1;
run;
Ksharp
Super User

If you have many variable to impute , Make a macro .

 

data have;
input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6 .
1 6 5 . 9 6 3 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . .
;
run;
proc sql;
create table temp as
 select *,n(x1)=0 as _x1,n(x2)=0 as _x2,n(x3)=0 as _x3,n(x4)=0 as _x4,n(x5)=0 as _x5
  from have
   group by id
    order by id,permno;
quit;
data want;
 set temp;
 array x{*} x1-x5;
 array _x{*} _x1-_x5;
 do i=1 to dim(x);
  if missing(x{i}) and _x{i}=0 then x{i}=0;
 end;
 drop i _:;
run;
novinosrin
Tourmaline | Level 20

HI @jkim197 just out of curiosity and if you have time, does the variable N in your HAVE denote the number of variables in starting/ascending sequence  in the array group X1-Xn series to check and impute? as N is 5 and 3 for ID1 and ID2 respectively?

 

If that is the case the following should do? nevertheless ,I am wary when somebody brilliant like Ksharp didn't offer this,that makes me think I am the one whose understanding is wrong. Can you clarify plz?

data want;
set have;
by id;
array t(*) x:;
do _n_=1 to n;
if missing(t(_n_)) then t(_n_)=0;
end;
run;
jkim197
Obsidian | Level 7

Yes you are right.

your code works great and fast.. Thank you

novinosrin
Tourmaline | Level 20

@jkim197  That made me chuckle.  Like @hashman pointed out, the description of the question explaining the contents of the dataset matters most in the first place for which a robust solution is required. I am glad, my peek into your dataset and the guesses or in other words intuition turned out to be correct. 

Last but not the least, irony in lack of clarity gave us a gift to receive the privilege of yet another beautiful APP class of PD that I copied to my notes 🙂

hashman
Ammonite | Level 13

@jkim197:

 

First, please do not include irrelevant variables (such as Permno and N, in this case) in your sample input/output: They muddy up the picture and distract from the task at hand. Your sample input had better look like this:

data have ;      
  input id x1-x5 
  cards ;        
1   4  3  2  2  1
1   .  1  1  3  2
1   9  .  3  9  6
1   8  .  7  3  2
1   8  2  .  6   
1   .  9  6  3  9
2   . 13  8  .  .
2  23  5  8  .  .
2  11  .  5  .  .
2  13  7  9  .  .
;                
run ;            

Second, you want an efficient solution. That means, to begin with, that you want to minimize the number of passes through the input data. In this case, you need at minimum 2 passes since you cannot do what you need to any BY group without pre-processing it beforehand. But any task involving the logic (1) read a BY group to discover something and (2) read it again to process it based on the discovery is a typical work for what is known as the double DoW-loop, whose extra advantage lies in processing the whole shebang in a single step.

 

Hence, below: (1) every BY group is read first to determine the maximum number of non-missing values in any row _NX, then (2) this very BY group is scanned again to set the nulls to zeros in each row only for the first _NX leftmost X-variables.

data want (drop = _:) ; 
do _ng = 1 by 1 until (last.id) ;
set have ;
by ID ;
_nx = _nx <> N (of x:) ;
end ;
do _ng = 1 to _ng ;
set have ;
array x [*] x: ;
do _i = 1 to _nx ;
if nmiss (x[_i]) then x[_i] = 0 ;
end ;
output ;
end ;
run ;

Even if you have 105 X-variables, scanning through the array is a pretty quick affair. However, if you want to avoid that in favor of using a potentially faster string search/replace function, you can do as below. Note that it is not to suggest that you code this way: APP tricks like this are reserved to extreme circumstances where performance gains are more than well worth the extra complexity; I just want to indicate that the possibility is there (plus perhaps it may pique someone's curiosity to learn more about the APP functions).

data want (drop = _:) ;                                             
  do _ng = 1 by 1 until (last.id) ;                                 
    set have ;                                                      
    by ID ;                                                         
    _nx = _nx <> N (of x:) ;                                        
  end ;                                                             
  retain _a _m _z ;                                                 
  if _n_ = 1 then do ;                                              
    _a  = addrlong (x1) ;                                           
    _m  = put (. , rb8.) ;                                          
    _z  = put (0 , rb8.) ;                                          
  end ;                                                             
  _nx = 8 * _nx ;                                                   
  do _ng = 1 to _ng ;                                               
    set have ;                                                      
    call pokelong (tranwrd (peekclong (_a, _nx), _m, _z), _a, _nx) ;
    output ;                                                        
  end ;                                                             
run ;                                                               

FWIW

Paul D.

  

mansour_ib_sas
Pyrite | Level 9

I propose this solution:

data have;
input id permno N x1 x2 x3 x4 x5;
cards;
1 1 5 4 3 2 2 1
1 2 5 . 1 1 3 2
1 3 5 9 . 3 9 6
1 4 5 8 . 7 3 2
1 5 5 8 2 . 6
1 6 5 . 9 6 . 9
2 1 3 . 13 8 . .
2 2 3 23 5 8 . .
2 3 3 11 . 5 . .
2 4 3 13 7 9 . 10
;
run;

data want;
set have;
array X(*) x:;
do i=1 to dim(x);
   if x(i)=. then do;
   rc=i;
   do j=rc to dim(x);
   if x(j) ne . then do;
   x(i)=0;
   leave;
 
   end;  
  end;
 end;
end;
run; 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 3113 views
  • 10 likes
  • 7 in conversation