Solved: Re: shifts up not missing variables

mansour_ib_sas · Posted 11-21-2018 02:06 PM

Hello,

I want to shift the non-missing values of var2 up in this data.

For that, I tried the POINT= option with a variable named next.

I have two problems:

I cannot determine where the missing values begin or where they end.

Here is an example:

data test;
input var1 var2;
cards;
1 .
2 .
3 .
4 .
5 4
3 9
; 
run;


data test1;
set test end=eof;
next=_n_+4;
if not eof then set test (keep=var2 rename=(var2=next_var2 )) point=next;
run;

Thank you

r_behata · Posted 11-21-2018 02:13 PM

What is your expected output ?

novinosrin · Posted 11-21-2018 02:14 PM

+1 mate!

mansour_ib_sas · Posted 11-21-2018 02:15 PM

data test;
input var1 var2;
cards;
1 4
2 9
3 .
4 .
5 .
3 .
; 
run;

novinosrin · Posted 11-21-2018 02:22 PM


data test;
input var1 var2;
cards;
1 .
2 .
3 .
4 .
5 4
3 9
; 
run;


data test1;
merge test(drop=var2)  test(keep=var2 where=(var2 is not missing)) ;
run;

mkeintz · Posted 11-25-2018 05:44 PM

And what if the OP wanted:

data want;
input var1 var2;
cards;
1 4
2 9
3 9
4 9
5 9
3 9
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

novinosrin · Posted 11-25-2018 06:00 PM

data test1;
merge test(drop=var2)  test(keep=var2 where=(var2 is not missing)) ;
retain _iorc_;
if not missing(var2) then _iorc_=var2;
else var2=_iorc_;
run;

mkeintz · Posted 11-25-2018 06:10 PM

An easy way to retain a SET (or MERGE) variable across multiple observations is to do conditional SET statements:

data have;
input var1 var2;
cards;
1 .
2 .
3 .
4 .
5 4
3 9
run;

data want;
  set have (drop=var2);
  if eod2=0 then set have (keep=var2 where=(not missing(var2))) end=eod2;
run;

VAR2, like all variables read by SET or MERGE, is automatically retained until a subsequent data step iteration executes a SET or MERGE retrieving the same variable and overwrites the old value. Simply avoid any attempt to read var2 beyond end of data set HAVE.

While there's not much advantage in the case of a single variable like VAR2, consider the advantages when there are, say, a dozen variables that are all missing or all non-missing at the same time and all need to be "shifted up".
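
To illustrate the multi-variable case, here is a minimal sketch using the same conditional SET technique. The data set name HAVE2 and the variables ID, A and B are made up for this example, and it assumes A and B are always missing or non-missing together, so testing A alone is enough.

```sas
/* Hypothetical input: A and B are missing together on the first two rows */
data have2;
  input id a b;
  cards;
1 . .
2 . .
3 10 20
4 30 40
;
run;

data want2;
  /* Stream 1: every observation, keeping only the variable that stays in place */
  set have2 (keep=id);
  /* Stream 2: only the rows with values, read until EOD2 is set to 1.          */
  /* After that the SET is skipped, so A and B keep their last values.          */
  if eod2=0 then set have2 (keep=a b where=(not missing(a))) end=eod2;
run;

/* WANT2 has 4 observations:  1 10 20 / 2 30 40 / 3 30 40 / 4 30 40 */
proc print data=want2;
run;
```

Adding more variables to the KEEP= list of the second SET is all it takes to shift them up as well; no RETAIN statement or extra assignment per variable is needed.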

novinosrin · Posted 11-25-2018 06:15 PM

Nice

mansour_ib_sas · Posted 11-26-2018 06:39 AM

Thank you,

In this example, why do I get 12 observations in t3 and only 6 in want?

data want;
input var1 var2;
cards;
1 .
2 .
3 .
4 .
5 4
3 9
run;

data t1;
set want(keep=var1);
run;

data t2;
set want(keep=var2);
run;

data t3;
set t1 t2;
run;

data want;
  set have (drop=var2);
  set have (keep=var2);
run;

Another question, please.

Does eod2 belong to this statement only, or to the resulting data set (want)?

then set have (keep=var2 where=(not missing(var2))) end=eod2

If so, could you please explain how it manages to "shift up" the observations of var2?

mkeintz · Posted 11-26-2018 10:14 AM

When you have a statement like:

SET A B;

you are creating a single stream of data by concatenating the observations in A, followed by the observations in B. It doesn't matter that A and B don't have the same variables; you will still get Na+Nb observations (where Na is the number of obs in A, and Nb is defined similarly).

But

SET A;
SET B;

generates two synchronized streams, so the number of obs will be the minimum of Na and Nb. (The data step stops when either SET statement attempts to read beyond end of data.) Of course, any variable present in both A and B will get its value from B, overwriting the value from A.

Finally,

MERGE A B;

makes what we might call a single merged stream. It will produce the same results as "SET A; SET B;", EXCEPT that whenever Na^=Nb there will be additional observations, giving a total number of observations equal to the MAXIMUM of Na and Nb. Those extra observations will have missing values assigned to the variables that belong only to the smaller dataset.

These are very powerful capabilities of the SAS data step, which is often able to trivially generate results attainable in proc sql only by very tortured coding.

As to the question on the "end=" option of the SET statement, that is a question (unlike the above) which can easily be answered by the sas documentation, or a google search for sas+set+end.
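
As a small sketch of the three cases above, the following uses two made-up data sets, A with 3 observations and B with 2 (the names and the variables X and Y are for illustration only), so that the observation counts can be checked in the log.

```sas
/* Hypothetical data sets: A has 3 obs (Na=3), B has 2 obs (Nb=2) */
data a;
  input x;
  cards;
1
2
3
;
run;

data b;
  input y;
  cards;
10
20
;
run;

/* One concatenated stream: Na+Nb = 5 observations */
data concat;
  set a b;
run;

/* Two synchronized streams: the step stops when the second SET */
/* tries to read past the end of B, so min(Na,Nb) = 2 observations */
data two_sets;
  set a;
  set b;
run;

/* One-to-one merge (no BY statement): max(Na,Nb) = 3 observations, */
/* with Y missing on the third                                       */
data merged;
  merge a b;
run;
```

In the t1/t2 example from the question, both data sets have 6 observations, which is why "SET t1 t2;" gives 12 while two separate SET statements give 6.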

Ksharp · Posted 11-22-2018 07:24 AM


data test;
input var1 var2;
cards;
1 .
2 .
3 .
4 .
5 4
3 9
; 
run;


data test1;
merge test(drop=var2)  test(keep=var2 firstobs=5) ;
run;
