BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sarahfung532
Calcite | Level 5

Hi,

 

I have a big data with hundreds variables, each variable has a missing value that across all the variables diagonally. The data looks like:

ID   X1    X2     X3    X4    X5 ...... Xn

1    .        2.1    3.1    4.1    5.1  ..... 

2   1.2     .        3.3    4.5    5.3

3   1.3     2.2    .        4.4    5.5  

4   1.4     2.3    3.6     .       5.1

5   1.8     2.9    3.0    4.7    .

...

;

How could I move one row up for each variable (one row less in the end), and the result I wanted is:

ID   X1    X2     X3    X4    X5 ...... Xn

1   1.2     2.1    3.1    4.1   5.1  ..... 

2   1.3     2.2    3.3    4.5   5.3

3   1.4     2.3    3.6    4.4   5.5  

4   1.8     2.9    3.0    4.7   5.1

...

;

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi again @sarahfung532  I am not sure how rolling up leading to exluding ID makes sense if you may clarify, nonetheless here is an approach

 



data have;
input ID   X1    X2     X3    X4    X5;
cards;
1    .        2.1    3.1    4.1    5.1 
2   1.2     .        3.3    4.5    5.3
3   1.3     2.2    .        4.4    5.5  
4   1.4     2.3    3.6     .       5.1
5   1.8     2.9    3.0    4.7    .
;

data want;
 do _n_=1 by 1 until(z);
  set have end=z;
  array t(5,5) _temporary_;
  array u x1-x5;
  do over u;
   t(_n_,_i_)=u;
  end;
 end;
 do _n_=1 to 5;
  do _i_=1 to dim(t,1)-1;
   if t(_i_,_n_)=. then do;
    t(_i_,_n_)=t(_i_+1,_n_);
    t(_i_+1,_n_)=.;
   end;
  end;
 end;
 do _n_=1 to dim(t,1)-1;
  do over u;
   u=t(_n_,_i_);
  end;
  output;
 end;
 drop id;
run;
/*And if you want the ID rolled up as it seems your output exludes ID5,
the following does it*/
data want_with_ID;
 do _n_=1 by 1 until(z);
  set have end=z;
  array t(5,5) _temporary_ ;
  array u x1-x5;
  array v(5) _temporary_;
  v(_n_)=id;
  do over u;
   t(_n_,_i_)=u;
  end;
 end;
 do _n_=1 to 5;
  do _i_=1 to dim(t,1)-1;
   if t(_i_,_n_)=. then do;
    t(_i_,_n_)=t(_i_+1,_n_);
    t(_i_+1,_n_)=.;
   end;
  end;
 end;
 do _n_=1 to dim(t,1)-1;
  id=v(_n_);
  do over u;
   u=t(_n_,_i_);
  end;
  output;
 end;
run;

And if your ID is char, please make the V array, a temp array with a $ length like

array v(5)$32 _temporary_;

View solution in original post

12 REPLIES 12
Astounding
PROC Star

This isn't very elegant, but it will work:

 

data want;
   set have (keep=x1 where=(x1 > .));
   set have (keep=x2 where=(x2 > .));
   ...
   set have (keep=xn where=(xn > .));
run;
sarahfung532
Calcite | Level 5

Thank you for your quick response! It would work for small data set. But my data have 10,000 observations and over 800 same type of variables (x1 to x800), so the "set" statement won't be so practical in here ... 

novinosrin
Tourmaline | Level 20

HI @sarahfung532  Is your ID numeric? If yes, i will take a shot a with multi dim array. If not, I have to think of something different

novinosrin
Tourmaline | Level 20

Hi again @sarahfung532  I am not sure how rolling up leading to exluding ID makes sense if you may clarify, nonetheless here is an approach

 



data have;
input ID   X1    X2     X3    X4    X5;
cards;
1    .        2.1    3.1    4.1    5.1 
2   1.2     .        3.3    4.5    5.3
3   1.3     2.2    .        4.4    5.5  
4   1.4     2.3    3.6     .       5.1
5   1.8     2.9    3.0    4.7    .
;

data want;
 do _n_=1 by 1 until(z);
  set have end=z;
  array t(5,5) _temporary_;
  array u x1-x5;
  do over u;
   t(_n_,_i_)=u;
  end;
 end;
 do _n_=1 to 5;
  do _i_=1 to dim(t,1)-1;
   if t(_i_,_n_)=. then do;
    t(_i_,_n_)=t(_i_+1,_n_);
    t(_i_+1,_n_)=.;
   end;
  end;
 end;
 do _n_=1 to dim(t,1)-1;
  do over u;
   u=t(_n_,_i_);
  end;
  output;
 end;
 drop id;
run;
/*And if you want the ID rolled up as it seems your output exludes ID5,
the following does it*/
data want_with_ID;
 do _n_=1 by 1 until(z);
  set have end=z;
  array t(5,5) _temporary_ ;
  array u x1-x5;
  array v(5) _temporary_;
  v(_n_)=id;
  do over u;
   t(_n_,_i_)=u;
  end;
 end;
 do _n_=1 to 5;
  do _i_=1 to dim(t,1)-1;
   if t(_i_,_n_)=. then do;
    t(_i_,_n_)=t(_i_+1,_n_);
    t(_i_+1,_n_)=.;
   end;
  end;
 end;
 do _n_=1 to dim(t,1)-1;
  id=v(_n_);
  do over u;
   u=t(_n_,_i_);
  end;
  output;
 end;
run;

And if your ID is char, please make the V array, a temp array with a $ length like

array v(5)$32 _temporary_;
sarahfung532
Calcite | Level 5

I will try your code tomorrow. Thank you very much! The diagonal missing data was from a Jackknife procedure for Deming regression analysis. We don't care the ID match up or not, we need to do a matrix calculations after, which not allow missing values in there. That's why I need help to remove only the missing values. 🙂

novinosrin
Tourmaline | Level 20

"We don't care the ID match up or not,"

 

That makes it a piece of cake. Feel free to come back to us should you need any clarification tomorrow. It's no biggie. Of course the code provided by @Astounding  is very convenient, however the multidim array isn't hard either once you know how to reference the elements. All the best!

Tom
Super User Tom
Super User

@sarahfung532 wrote:

I will try your code tomorrow. Thank you very much! The diagonal missing data was from a Jackknife procedure for Deming regression analysis. We don't care the ID match up or not, we need to do a matrix calculations after, which not allow missing values in there. That's why I need help to remove only the missing values. 🙂


Still makes no sense to me. The normal meaning of a dataset is that a single observation represents that value of the variables for a single event/sample/person.  Moving the values from the next observation for one of the variables makes no sense.

 

If you just need to eliminate the non-missing values for your matrix operations to work then replace the missing values with some constant. Like zero or 1  or the row mean or the column mean that will not impact the operation.

sarahfung532
Calcite | Level 5

You code works beautifully!! 🙂 I tried it on my data which is 10,000 observations and over  50,000 variables, and the who procedure only took less than 2 minutes! Thank you so much!!!

Astounding
PROC Star

It becomes more practical if you let macro language do the typing.  For example:

 

%macro nonmissing (n=);
   %local i;
   %do i=1 %to &n;
      set have (keep=x&i where=(x&i > .));
   %end;
%mend nonmissing;

data want;
   %nonmissing (n=800)
run;
sarahfung532
Calcite | Level 5

Thank you! I will try that later.

Tom
Super User Tom
Super User

Are you sure?  That looks like some type of array with missing values along the diagonal.  Are you sure that eliminating the missing values won't put the values in the wrong place? Are you sure that you shouldn't instead be moving the values one column to the left instead?

sarahfung532
Calcite | Level 5

The data actually came from a Deming Jackknife procedure, which the diagonal missing values are useless. We only need the non-missing values for further calculations. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 2885 views
  • 2 likes
  • 4 in conversation