BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hello,

I want to shifts up the var2 this data.

For that, I use the option next.

I have two problems,

I can not define the beginning of the missing values ​​and their end.

here is an example

 

 

data test;
input var1 var2;
cards;
1 .
2 .
3 .
4 .
5 4
3 9
; 
run;


data test1;
set test end=eof;
next=_n_+4;
if not eof then set test (keep=var2 rename=(var2=next_var2 )) point=next;
run;

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

When you have a statement like:

     SET A B;

 

you are creating a single stream of data by concatenating observations in A , followed by observations in B.  It doesn't matter that A and B don't have the  same variables, you will still get Na+Nb observations (where Na is number of obs in A, and Nb is defined similarly).

 

But

    SET A;
    SET B;

 

generates two synchronized streams, so the number of obs will be the minimum of Na vs Nb.   (The data step stops when either SET statement attempts to read beyond end of data.  Of  course any  variable in both B and  A will get the value from B overwriting the value from A.

 

Finally,

   MERGE A B;

makes a (let's call it) a single merged stream.  It will produce the same results  as "SET A; SET B;",  EXCEPT anytime  Na^=Nb, there will be additional observations generating a  total number of observations equal to the  MAXIMUM of Na vs Nb.   And those extra observations will have missing values assigned to variables belonging only to the smaller dataset.

 

These are very powerful capabilities of the SAS data step, which is often able to trivially generate results attainable in proc sql only by very tortured coding.

 

 

As to the question on the "end=" option of the SET statement, that is a question (unlike the above) which can easily be answered by the sas documentation, or a google search for   sas+set+end.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

11 REPLIES 11
r_behata
Barite | Level 11

What is your expected output ?

novinosrin
Tourmaline | Level 20

+1 mate!

mansour_ib_sas
Pyrite | Level 9
data test;
input var1 var2;
cards;
1 4
2 9
3 .
4 .
5 .
3 .
; 
run;
novinosrin
Tourmaline | Level 20

data test;
input var1 var2;
cards;
1 .
2 .
3 .
4 .
5 4
3 9
; 
run;


data test1;
merge test(drop=var2)  test(keep=var2 where=(var2 is not missing)) ;
run;
mkeintz
PROC Star

And what if the OP wanted:

 

data want;
input var1 var2;
cards;
1 4
2 9
3 9
4 9
5 9
3 9
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20
data test1;
merge test(drop=var2)  test(keep=var2 where=(var2 is not missing)) ;
retain _iorc_;
if not missing(var2) then _iorc_=var2;
else var2=_iorc_;
run;
mkeintz
PROC Star

An easy way to retain a SET (or MERGE) variable across multiple observations is to do conditional SET statements:

 

data want;
input var1 var2;
cards;
1 .
2 .
3 .
4 .
5 4
3 9
run;

data want;
  set have (drop=var2);
  if eod2=0 then set have (keep=var2 where=(not missing(var2))) end=eod2;
run;

 

VAR2, like all variables read by SET or MERGE, is automatically retained until a subsequent data step iteration executes a SET or MERGE retrieving the same variable and overwrites the old value.  Simply avoid any attempt to read var2 beyond end of data set HAVE.

 

While there's not much advantage in the case of a single VAR2 variables, consider the advantages when there are, say, a dozen variables that are simultaneously all missing or all non-missing and all need to be "shifted up".

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20

Nice

mansour_ib_sas
Pyrite | Level 9

Thank you,

in this example, why i have a 12 observations in t3 and only 6 in want.

 

data want;
input var1 var2;
cards;
1 .
2 .
3 .
4 .
5 4
3 9
run;

data t1;
set want(keep=var1);
run;

data t2;
set want(keep=var2);
run;

data t3;
set t1 t2;
run;

data want;
  set have (drop=var2);
  set have (keep=var2);
run;

another question please.

is  eod2 own for this statement only or for data result (want)?

then set have (keep=var2 where=(not missing(var2))) end=eod2

 if so, could you, please, explain to me how he had to go "shift up" the observations of var2

mkeintz
PROC Star

When you have a statement like:

     SET A B;

 

you are creating a single stream of data by concatenating observations in A , followed by observations in B.  It doesn't matter that A and B don't have the  same variables, you will still get Na+Nb observations (where Na is number of obs in A, and Nb is defined similarly).

 

But

    SET A;
    SET B;

 

generates two synchronized streams, so the number of obs will be the minimum of Na vs Nb.   (The data step stops when either SET statement attempts to read beyond end of data.  Of  course any  variable in both B and  A will get the value from B overwriting the value from A.

 

Finally,

   MERGE A B;

makes a (let's call it) a single merged stream.  It will produce the same results  as "SET A; SET B;",  EXCEPT anytime  Na^=Nb, there will be additional observations generating a  total number of observations equal to the  MAXIMUM of Na vs Nb.   And those extra observations will have missing values assigned to variables belonging only to the smaller dataset.

 

These are very powerful capabilities of the SAS data step, which is often able to trivially generate results attainable in proc sql only by very tortured coding.

 

 

As to the question on the "end=" option of the SET statement, that is a question (unlike the above) which can easily be answered by the sas documentation, or a google search for   sas+set+end.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

data test;
input var1 var2;
cards;
1 .
2 .
3 .
4 .
5 4
3 9
; 
run;


data test1;
merge test(drop=var2)  test(keep=var2 firstobs=5) ;
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1838 views
  • 4 likes
  • 5 in conversation