Hello,
I need to control the order of the variables in a given data set and use RETAIN before the SET statement. Here is an example:
data start ; var1 = 1 ; var2 = 2 ; output ; var1 = . ; var2 = 2 ; output ; run ; data end ; retain var2 var1 dummy ; set start ; if var1 = 1 then dummy = 1 ; run ;
I have done this dozens of times in the past but now I encountered a problem. The resulting data set looks like that:
var2 | var1 | dummy | |
1 | 2 | 1 | 1 |
2 | 2 | . | 1 |
I expected dummy in row 2 to be a missing. To me it seems that dummy is retained although I only used RETAIN to order the columns which should have no effect on the values of the variable itself according to this source. Can anybody explain this to me?
You attempted to use a SIDE EFFECT of the RETAIN statement. You cannot get the side effect without the statement also doing what it was intended to do.
Use a different method to change the variable order.
data end ;
if 0 then set start(keep=var2) start(keep=var1);
length dummy 8;
set start ;
if var1 = 1 then dummy = 1 ;
run ;
You attempted to use a SIDE EFFECT of the RETAIN statement. You cannot get the side effect without the statement also doing what it was intended to do.
Use a different method to change the variable order.
data end ;
if 0 then set start(keep=var2) start(keep=var1);
length dummy 8;
set start ;
if var1 = 1 then dummy = 1 ;
run ;
Thank you for the answer, but the source says: "The RETAIN statement is most often used to reorder variables because no other variable attribute specifications are required. The RETAIN statement has no effect on retaining values of existing variables being read from the data set."
So it shouldn't behave like that, should it?
@sas_nutzer wrote:
Thank you for the answer, but the source says: "The RETAIN statement is most often used to reorder variables because no other variable attribute specifications are required. The RETAIN statement has no effect on retaining values of existing variables being read from the data set."
So it shouldn't behave like that, should it?
The important part is this:
of existing variables being read from the data set
Since dummy is not being read from a dataset, it is affected by the RETAIN statement.
Although existing variables (those read by a SET or MERGE) are retained by definition, they are always overwritten when a new observation is read. Dummy is not overwritten and therefore keeps its value.
According to my understanding of RETAIN, the code is doing what it is supposed to do. Since DUMMY is a retained variable, and for the second row where var1=. DUMMY does not get assigned a value in the code. So it RETAINs the value from the previous observation.
Over the years, I have seen a lot of people spend a lot of time trying to get columns in a specific order. And I think in 99.9% of the cases, there is no need to do this at all, and the time spent ordering the variables is essentially not productive in my mind. Maybe yours is the 0.1% of the time where a specific order is required — please explain why you think you must have a specific order.
@sas_nutzer wrote:
In this case I need the correct order.
Why? What bad things happen if your ordering is not correct? Please explain in detail.
@PaigeMiller wrote:
@sas_nutzer wrote:
In this case I need the correct order.Why? What bad things happen if your ordering is not correct? Please explain in detail.
The dataset is difficult for humans to work with. You cannot LOOK at the data without having to rearrange the columns.
Example I have seen way too many times. Someone lazily create a bunch of optional long character variables at the front of the program data vector.
data class;
length comment1-comment4 $200;
set sashelp.class;
run;
Then when you try to examine the data the whole first page of data is filled with empty columns.
In this particular case, you do not need to use Dummy on the RETAIN statement. Any new variables created in the DATA step will automatically be added last to the PDV.
data start ;
var1 = 1 ;
var2 = 2 ;
output ;
var1 = . ;
var2 = 2 ;
output ;
run ;
data end ;
retain var2 var1;
set start ;
if var1 = 1 then dummy = 1 ;
run ;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.