BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Solph
Pyrite | Level 9

Hello,

I’ve a question about using RETAIN to reorder the variable order in a dataset. I know I can use retain, such as the following

     DATA want;

     retain Var3 Var4 Var2 Var1;

     set old;

     run;

(I also just learned I could us attrib to do it, such as the following)

     DATA   want;

     attrib Var3 Var4 Var2 Var1;

     set old;

     run;

But I also know from my experience that I could mess up the output data if I’ve data processing work after the SET statement. I don't exactly remember the kind of data processing I used when I noticed the problem. Now that I'm using RETAIN a lot to reorder the variables, I thought I'd better know what kind of the data processing after SET will mess up or won’t mess up the data.


Say instead of the two steps below,

     data combined;

merge data1 data2; by id;

     data combined;

retain var3 var4 var1 var2;

set combined;

run;

Can I do in one step without causing any problem?

     data combined;

retain var3 var4 var1 var2;

merge data1 data2;

by id;

run;


How about it's more than just merging but involve creating derived variables, processing missing values, say the following?

     data new; retain var3 var4 var1 var2;

          set old;

          If var3=. then var3=var2;

          run;


Sorry I can't find an example when I noticed the problem of using RETAIN. Any general comments about potential problems when using retain are appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
kmw
SAS Employee kmw
SAS Employee

Editor's Note: This is a popular topic.  The following SAS Note shows how to reorder variables in a SAS data set:

Usage Note 8395

View solution in original post

13 REPLIES 13
Doc_Duke
Rhodochrosite | Level 12

Don't use RETAIN just to reorder the variables.  Use ATTRIB, LENGTH, or even FORMAT.

RETAIN has a significant side effect here in that the input data buffer is not cleared between observations.  It is valuable when used as designed, but can be nasty when doing SETs with different variables.

Doc Muhlbaier

Duke

Solph
Pyrite | Level 9

I learned the issues about RETAIN much later on.

But with ATTRIB, LENGTH, FORMAT, it's quite inconvenient to use them because I have to specify the format or length of each of the variables involved, such as the following. Is there an easy way out?

data attrib_method; attrib a b c d format=1.; set sample;

data attrib_method;attrib a b c d informat=1.;set sample;

data attrib_method;attrib a b c d label=’example’; set sample;

data attrib_method; attrib a b c d length=3.; set sample;

data_null__
Jade | Level 19

Retain is fine as long as you know what it does and does NOT do.

For variables coming from SET MERGE or UPDATE is does nothing except to change the order when the RETAIN statement is placed properly.  Variables coming from SET MERGE or UPDATE are NOT initialized to missing like variables that are created in a data step or read from INFILE.  That's why something like "if _N_ eq 1 then set ...;"  works to put the values read from this SET statement on every observation.

If you use RETAIN on variables that ARE created in a data step then you will need to understand the effect that will have on those variables and "program" accordinly.

Doc_Duke
Rhodochrosite | Level 12

You could also use a KEEP statement, but then you have to specify EVERY variable.

data_null__
Jade | Level 19

I don't believe KEEP will work to reorder variables.  Can you make example?

Solph
Pyrite | Level 9

I don't think KEEP works (unless there are tricks in specifying it). I guess it's a trade off for using RETAIN. Do I mind the data processing time for using RETAIN just for reordering variable? Do I want to spend time specifying the format/length of each of the variables involved in reordering when using attrib, length, format, etc.

data_null__
Jade | Level 19

If you don't want to "move" them around after the fact create the variables in the proper order to begin with.  I showed you how to get TRANSPOSE to do that in your TRANSPOSE thread.

Tom
Super User Tom
Super User

You can use the KEEP and/or DROP dataset option.

One way to get variables defined is to copy the definition from an existing dataset.  You can use a SET statement that is not executed.

data combined;

  if 0 then set data2 (keep=var3 var4) data1(keep= var1 var2);

  merge data1 data2;

  by id;

run;

ScottBass
Rhodochrosite | Level 12

Hi,

While I usually agree with Tom, and he's a heck of a lot smarter than me ;-), on this occasion I'll disagree.

IMO, the keep statement will set the variable attributes, but doesn't give you much, if any, control over the variable order in the PDV.

For example, if in data2, the (kept) variables are in this order:  var4 var3, and in data1 the variables are in this order:  var2 var1, then in the final dataset the variable order would be var4 var3 var2 var1 (then all the other variables contributed by data1 and data2 on the merge statement), NOT the order you'd think from the keep statement. 

Plus, as stated above, any variables that are defined via a set or merge statement have an implied retain, which has occasionally bit me in the past.  That's also why you'll see "if 0 then set somedataset (keep=whatever)" to set the variable attributes when using a data step hash object, but with call missing somewhere in the mix (often call missing (of _all_) to code around the implied retain issue.

But I digress...

My favourite approach when I want to reorder the variables in the PDV is a "naked format" statement.  For example:

data whatever;

  format height weight age sex name;

  set sashelp.class;

run;

During data step compilation, the compiler will set the variable PDV order based on the order of the statements in the data step.  In this example, the variable order is set by the format statement.  Then, the set statement kicks in, defining the rest of the variable attributes (length, label, format, informat).  If the variables have a permanent format associated with them in the dataset, "last one wins", and the format is associated with the variable at that point.

Occasionally this approach doesn't work, for example if I need to reference a variable before the set statement.  Since the variable attributes aren't set by the format statement, and numeric variables are the default, occasionally I get a character variable cast as numeric.  In this scenario, I need to revert to a length statement.  But this is rare.

As a first approach, try the "naked format" statement, and see if that works for you.

HTH...

Scott


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
Howles
Quartz | Level 8

Too bad that a "naked" ATTRIBUTE statement is not allowed.

Scott Bass wrote:

[snip]

My favourite approach when I want to reorder the variables in the PDV is a "naked format" statement.  For example:

data whatever;

  format height weight age sex name;

  set sashelp.class;

run;

During data step compilation, the compiler will set the variable PDV order based on the order of the statements in the data step.  In this example, the variable order is set by the format statement.  Then, the set statement kicks in, defining the rest of the variable attributes (length, label, format, informat).  If the variables have a permanent format associated with them in the dataset, "last one wins", and the format is associated with the variable at that point.

Occasionally this approach doesn't work, for example if I need to reference a variable before the set statement.  Since the variable attributes aren't set by the format statement, and numeric variables are the default, occasionally I get a character variable cast as numeric.  In this scenario, I need to revert to a length statement.  But this is rare.

As a first approach, try the "naked format" statement, and see if that works for you.

HTH...

Scott

data_null__
Jade | Level 19

That's no different from RETAIN (not as good actually) and you could end up needing a set statement for every variable.

kmw
SAS Employee kmw
SAS Employee

Editor's Note: This is a popular topic.  The following SAS Note shows how to reorder variables in a SAS data set:

Usage Note 8395

ScottBass
Rhodochrosite | Level 12

From the usage note:

 

For example, there are six variables in data set ONE in the positional order of A, B, C, D, E, and F. If the new order needs to be A, E, C, D, B, and F, then the following DATA step reorders the variables in that order:

data two;
   retain a e c d b;
   set one;
run;

As noted, the variable F is left off the RETAIN statement since the order of variable F should not change. After running the above DATA step, the variables in data set ONE are in the order of A, E, C, D, B and F.

 

So expanding on the Usage Note:

 

data one;
   set sashelp.class;
   a=name;
   b=age;
   c=sex;
   d=weight;
   e=height;
   f=height*weight;
   keep a--f;
run;

data two;
   retain g a e c d b;
   set one;
   if c='M' then g='FOO';
run;

data three;
   format g a e c d b;
   set one;
   if c='M' then g='FOO';
run;

Compare the output between two and three.  IMO the output from two is possibly undesired, and the output from three is as expected.

 

The usage note is good, but IMO should go into more detail of the repercussions of using retain to set the PDV order for a data step (as opposed to data set) variable.  Obviously an experienced SAS programmer would know this, but this could catch a newbie.

 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 82372 views
  • 10 likes
  • 7 in conversation