Hi,
I have a big data with hundreds variables, each variable has a missing value that across all the variables diagonally. The data looks like:
ID X1 X2 X3 X4 X5 ...... Xn
1 . 2.1 3.1 4.1 5.1 .....
2 1.2 . 3.3 4.5 5.3
3 1.3 2.2 . 4.4 5.5
4 1.4 2.3 3.6 . 5.1
5 1.8 2.9 3.0 4.7 .
...
;
How could I move one row up for each variable (one row less in the end), and the result I wanted is:
ID X1 X2 X3 X4 X5 ...... Xn
1 1.2 2.1 3.1 4.1 5.1 .....
2 1.3 2.2 3.3 4.5 5.3
3 1.4 2.3 3.6 4.4 5.5
4 1.8 2.9 3.0 4.7 5.1
...
;
Hi again @sarahfung532 I am not sure how rolling up leading to exluding ID makes sense if you may clarify, nonetheless here is an approach
data have;
input ID X1 X2 X3 X4 X5;
cards;
1 . 2.1 3.1 4.1 5.1
2 1.2 . 3.3 4.5 5.3
3 1.3 2.2 . 4.4 5.5
4 1.4 2.3 3.6 . 5.1
5 1.8 2.9 3.0 4.7 .
;
data want;
do _n_=1 by 1 until(z);
set have end=z;
array t(5,5) _temporary_;
array u x1-x5;
do over u;
t(_n_,_i_)=u;
end;
end;
do _n_=1 to 5;
do _i_=1 to dim(t,1)-1;
if t(_i_,_n_)=. then do;
t(_i_,_n_)=t(_i_+1,_n_);
t(_i_+1,_n_)=.;
end;
end;
end;
do _n_=1 to dim(t,1)-1;
do over u;
u=t(_n_,_i_);
end;
output;
end;
drop id;
run;
/*And if you want the ID rolled up as it seems your output exludes ID5,
the following does it*/
data want_with_ID;
do _n_=1 by 1 until(z);
set have end=z;
array t(5,5) _temporary_ ;
array u x1-x5;
array v(5) _temporary_;
v(_n_)=id;
do over u;
t(_n_,_i_)=u;
end;
end;
do _n_=1 to 5;
do _i_=1 to dim(t,1)-1;
if t(_i_,_n_)=. then do;
t(_i_,_n_)=t(_i_+1,_n_);
t(_i_+1,_n_)=.;
end;
end;
end;
do _n_=1 to dim(t,1)-1;
id=v(_n_);
do over u;
u=t(_n_,_i_);
end;
output;
end;
run;
And if your ID is char, please make the V array, a temp array with a $ length like
array v(5)$32 _temporary_;
This isn't very elegant, but it will work:
data want;
set have (keep=x1 where=(x1 > .));
set have (keep=x2 where=(x2 > .));
...
set have (keep=xn where=(xn > .));
run;
Thank you for your quick response! It would work for small data set. But my data have 10,000 observations and over 800 same type of variables (x1 to x800), so the "set" statement won't be so practical in here ...
HI @sarahfung532 Is your ID numeric? If yes, i will take a shot a with multi dim array. If not, I have to think of something different
Hi again @sarahfung532 I am not sure how rolling up leading to exluding ID makes sense if you may clarify, nonetheless here is an approach
data have;
input ID X1 X2 X3 X4 X5;
cards;
1 . 2.1 3.1 4.1 5.1
2 1.2 . 3.3 4.5 5.3
3 1.3 2.2 . 4.4 5.5
4 1.4 2.3 3.6 . 5.1
5 1.8 2.9 3.0 4.7 .
;
data want;
do _n_=1 by 1 until(z);
set have end=z;
array t(5,5) _temporary_;
array u x1-x5;
do over u;
t(_n_,_i_)=u;
end;
end;
do _n_=1 to 5;
do _i_=1 to dim(t,1)-1;
if t(_i_,_n_)=. then do;
t(_i_,_n_)=t(_i_+1,_n_);
t(_i_+1,_n_)=.;
end;
end;
end;
do _n_=1 to dim(t,1)-1;
do over u;
u=t(_n_,_i_);
end;
output;
end;
drop id;
run;
/*And if you want the ID rolled up as it seems your output exludes ID5,
the following does it*/
data want_with_ID;
do _n_=1 by 1 until(z);
set have end=z;
array t(5,5) _temporary_ ;
array u x1-x5;
array v(5) _temporary_;
v(_n_)=id;
do over u;
t(_n_,_i_)=u;
end;
end;
do _n_=1 to 5;
do _i_=1 to dim(t,1)-1;
if t(_i_,_n_)=. then do;
t(_i_,_n_)=t(_i_+1,_n_);
t(_i_+1,_n_)=.;
end;
end;
end;
do _n_=1 to dim(t,1)-1;
id=v(_n_);
do over u;
u=t(_n_,_i_);
end;
output;
end;
run;
And if your ID is char, please make the V array, a temp array with a $ length like
array v(5)$32 _temporary_;
I will try your code tomorrow. Thank you very much! The diagonal missing data was from a Jackknife procedure for Deming regression analysis. We don't care the ID match up or not, we need to do a matrix calculations after, which not allow missing values in there. That's why I need help to remove only the missing values. 🙂
"We don't care the ID match up or not,"
That makes it a piece of cake. Feel free to come back to us should you need any clarification tomorrow. It's no biggie. Of course the code provided by @Astounding is very convenient, however the multidim array isn't hard either once you know how to reference the elements. All the best!
@sarahfung532 wrote:
I will try your code tomorrow. Thank you very much! The diagonal missing data was from a Jackknife procedure for Deming regression analysis. We don't care the ID match up or not, we need to do a matrix calculations after, which not allow missing values in there. That's why I need help to remove only the missing values. 🙂
Still makes no sense to me. The normal meaning of a dataset is that a single observation represents that value of the variables for a single event/sample/person. Moving the values from the next observation for one of the variables makes no sense.
If you just need to eliminate the non-missing values for your matrix operations to work then replace the missing values with some constant. Like zero or 1 or the row mean or the column mean that will not impact the operation.
You code works beautifully!! 🙂 I tried it on my data which is 10,000 observations and over 50,000 variables, and the who procedure only took less than 2 minutes! Thank you so much!!!
It becomes more practical if you let macro language do the typing. For example:
%macro nonmissing (n=);
%local i;
%do i=1 %to &n;
set have (keep=x&i where=(x&i > .));
%end;
%mend nonmissing;
data want;
%nonmissing (n=800)
run;
Thank you! I will try that later.
Are you sure? That looks like some type of array with missing values along the diagonal. Are you sure that eliminating the missing values won't put the values in the wrong place? Are you sure that you shouldn't instead be moving the values one column to the left instead?
The data actually came from a Deming Jackknife procedure, which the diagonal missing values are useless. We only need the non-missing values for further calculations.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.