Solved: Re: move one row up for all variables

sarahfung532 · Posted 02-13-2020 04:55 PM

Hi,

I have a big data with hundreds variables, each variable has a missing value that across all the variables diagonally. The data looks like:

ID X1 X2 X3 X4 X5 ...... Xn

1 . 2.1 3.1 4.1 5.1 .....

2 1.2 . 3.3 4.5 5.3

3 1.3 2.2 . 4.4 5.5

4 1.4 2.3 3.6 . 5.1

5 1.8 2.9 3.0 4.7 .

...

;

How could I move one row up for each variable (one row less in the end), and the result I wanted is:

ID X1 X2 X3 X4 X5 ...... Xn

1 1.2 2.1 3.1 4.1 5.1 .....

2 1.3 2.2 3.3 4.5 5.3

3 1.4 2.3 3.6 4.4 5.5

4 1.8 2.9 3.0 4.7 5.1

...

;

novinosrin · Posted 02-13-2020 06:04 PM

Hi again @sarahfung532 I am not sure how rolling up leading to exluding ID makes sense if you may clarify, nonetheless here is an approach



data have;
input ID   X1    X2     X3    X4    X5;
cards;
1    .        2.1    3.1    4.1    5.1 
2   1.2     .        3.3    4.5    5.3
3   1.3     2.2    .        4.4    5.5  
4   1.4     2.3    3.6     .       5.1
5   1.8     2.9    3.0    4.7    .
;

data want;
 do _n_=1 by 1 until(z);
  set have end=z;
  array t(5,5) _temporary_;
  array u x1-x5;
  do over u;
   t(_n_,_i_)=u;
  end;
 end;
 do _n_=1 to 5;
  do _i_=1 to dim(t,1)-1;
   if t(_i_,_n_)=. then do;
    t(_i_,_n_)=t(_i_+1,_n_);
    t(_i_+1,_n_)=.;
   end;
  end;
 end;
 do _n_=1 to dim(t,1)-1;
  do over u;
   u=t(_n_,_i_);
  end;
  output;
 end;
 drop id;
run;
/*And if you want the ID rolled up as it seems your output exludes ID5,
the following does it*/
data want_with_ID;
 do _n_=1 by 1 until(z);
  set have end=z;
  array t(5,5) _temporary_ ;
  array u x1-x5;
  array v(5) _temporary_;
  v(_n_)=id;
  do over u;
   t(_n_,_i_)=u;
  end;
 end;
 do _n_=1 to 5;
  do _i_=1 to dim(t,1)-1;
   if t(_i_,_n_)=. then do;
    t(_i_,_n_)=t(_i_+1,_n_);
    t(_i_+1,_n_)=.;
   end;
  end;
 end;
 do _n_=1 to dim(t,1)-1;
  id=v(_n_);
  do over u;
   u=t(_n_,_i_);
  end;
  output;
 end;
run;

And if your ID is char, please make the V array, a temp array with a $ length like

array v(5)$32 _temporary_;

View solution in original post

Astounding · Posted 02-13-2020 05:06 PM

This isn't very elegant, but it will work:

data want;
   set have (keep=x1 where=(x1 > .));
   set have (keep=x2 where=(x2 > .));
   ...
   set have (keep=xn where=(xn > .));
run;

sarahfung532 · Posted 02-13-2020 05:25 PM

Thank you for your quick response! It would work for small data set. But my data have 10,000 observations and over 800 same type of variables (x1 to x800), so the "set" statement won't be so practical in here ...

novinosrin · Posted 02-13-2020 05:35 PM

HI @sarahfung532 Is your ID numeric? If yes, i will take a shot a with multi dim array. If not, I have to think of something different

novinosrin · Posted 02-13-2020 06:04 PM

Hi again @sarahfung532 I am not sure how rolling up leading to exluding ID makes sense if you may clarify, nonetheless here is an approach



data have;
input ID   X1    X2     X3    X4    X5;
cards;
1    .        2.1    3.1    4.1    5.1 
2   1.2     .        3.3    4.5    5.3
3   1.3     2.2    .        4.4    5.5  
4   1.4     2.3    3.6     .       5.1
5   1.8     2.9    3.0    4.7    .
;

data want;
 do _n_=1 by 1 until(z);
  set have end=z;
  array t(5,5) _temporary_;
  array u x1-x5;
  do over u;
   t(_n_,_i_)=u;
  end;
 end;
 do _n_=1 to 5;
  do _i_=1 to dim(t,1)-1;
   if t(_i_,_n_)=. then do;
    t(_i_,_n_)=t(_i_+1,_n_);
    t(_i_+1,_n_)=.;
   end;
  end;
 end;
 do _n_=1 to dim(t,1)-1;
  do over u;
   u=t(_n_,_i_);
  end;
  output;
 end;
 drop id;
run;
/*And if you want the ID rolled up as it seems your output exludes ID5,
the following does it*/
data want_with_ID;
 do _n_=1 by 1 until(z);
  set have end=z;
  array t(5,5) _temporary_ ;
  array u x1-x5;
  array v(5) _temporary_;
  v(_n_)=id;
  do over u;
   t(_n_,_i_)=u;
  end;
 end;
 do _n_=1 to 5;
  do _i_=1 to dim(t,1)-1;
   if t(_i_,_n_)=. then do;
    t(_i_,_n_)=t(_i_+1,_n_);
    t(_i_+1,_n_)=.;
   end;
  end;
 end;
 do _n_=1 to dim(t,1)-1;
  id=v(_n_);
  do over u;
   u=t(_n_,_i_);
  end;
  output;
 end;
run;

And if your ID is char, please make the V array, a temp array with a $ length like

array v(5)$32 _temporary_;

sarahfung532 · Posted 02-13-2020 06:27 PM

I will try your code tomorrow. Thank you very much! The diagonal missing data was from a Jackknife procedure for Deming regression analysis. We don't care the ID match up or not, we need to do a matrix calculations after, which not allow missing values in there. That's why I need help to remove only the missing values. 🙂

novinosrin · Posted 02-13-2020 06:30 PM

"We don't care the ID match up or not,"

That makes it a piece of cake. Feel free to come back to us should you need any clarification tomorrow. It's no biggie. Of course the code provided by @Astounding is very convenient, however the multidim array isn't hard either once you know how to reference the elements. All the best!

Tom · Posted 02-13-2020 07:57 PM

@sarahfung532 wrote:

I will try your code tomorrow. Thank you very much! The diagonal missing data was from a Jackknife procedure for Deming regression analysis. We don't care the ID match up or not, we need to do a matrix calculations after, which not allow missing values in there. That's why I need help to remove only the missing values. 🙂

Still makes no sense to me. The normal meaning of a dataset is that a single observation represents that value of the variables for a single event/sample/person. Moving the values from the next observation for one of the variables makes no sense.

If you just need to eliminate the non-missing values for your matrix operations to work then replace the missing values with some constant. Like zero or 1 or the row mean or the column mean that will not impact the operation.

sarahfung532 · Posted 02-14-2020 12:19 PM

You code works beautifully!! 🙂 I tried it on my data which is 10,000 observations and over 50,000 variables, and the who procedure only took less than 2 minutes! Thank you so much!!!

Astounding · Posted 02-13-2020 05:37 PM

It becomes more practical if you let macro language do the typing. For example:

%macro nonmissing (n=);
   %local i;
   %do i=1 %to &n;
      set have (keep=x&i where=(x&i > .));
   %end;
%mend nonmissing;

data want;
   %nonmissing (n=800)
run;

sarahfung532 · Posted 02-13-2020 06:17 PM

Thank you! I will try that later.

Tom · Posted 02-13-2020 05:45 PM

Are you sure? That looks like some type of array with missing values along the diagonal. Are you sure that eliminating the missing values won't put the values in the wrong place? Are you sure that you shouldn't instead be moving the values one column to the left instead?

sarahfung532 · Posted 02-13-2020 06:15 PM

The data actually came from a Deming Jackknife procedure, which the diagonal missing values are useless. We only need the non-missing values for further calculations.