BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phil_NZ
Barite | Level 11

Hi SAS Users,

Normally, we use RETAIN to reorder the columns in a dataset.

For example, I have a list of column names as below

GVKEY IID DATADATE CIK CONML LOC SIC ISIN SEDOL EXCHG AJEXDI CSHOC CSHTRD CURCDD PRCCD PRCHD PRCLD PRCSTD TPCI TRFD ex_rate gviidkey prccd_abs prccd_abs_ lagprccd_abs_ lagajexdi lagtrfd raw_return log_raw_return raw_return_abs dollar_vol

/*I highlight DATADATE and gviidkey manually for noticing purpose*/

Normally if I want to put gviidkey and DATADATE in the first two columns, I did:

data have;
	retain gviidkey DATADATE;
	set have;
run;

However, if I want to put the two columns (GVKEY and IID) to the last columns of the dataset, what I should do without typing all column names in retain statement?

 

Warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

@Phil_NZ wrote:

Hi @mkeintz 

Thank you for your explanation

I saw a lot of discussions using the phrase "if 0 then set(merge)", but now I faced in my discussion.

 

I myself read the document but I still do not understand why you use it here. Could you please briefly explain it to me and what does it play in your code here?

And I do not understand why there are two datasets have in the code. I am not able to understand the mechanism behind, could you please explain it to me ?

if 0 then set have (keep=gvkey iid) have;

Thank you.


The purpose of this code is to obligate the compiler of the data step to order variables as you want .  It relies on the fact that the compiler will arrange variables in the order they are encountered in the code.

 

So the statement

if 0 then set have (drop=gvkey iid)  have;

delays exposure of GVKEY and IID to the compiler until all the other variables in the  dataset have already been seen by the compiler.  GVKEY and IID will be found in the second HAVE, but not the first.

 

Similarly

if 0 then set have (keep=gvkey iid)  have;

exposes GVKEY and IID to the compiler prior to all the other variables in HAVE.    This is effectively what you were doing when you issued a "RETAIN GVKEY IID" statement prior to your "set have" statement.

 

In both of my codes above, it doesn't matter what position those variables had in HAVE.   GVKEY and IID could be anywhere - left, right, or middle, but using the above will re-order as I describe.

 

And again, this is just to guide the compiler in the task of ordering variables. It has no influence on the actual data processing, because the "IF 0" condition is never true, so the "THEN SET ..." clauses are not actually executed.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
mkeintz
PROC Star

You don't need retain;

 

You could put GVKEY IID at the start by:

 

data want;
  if 0 then set have (keep=gvkey iid) have;
  set have;
  ... remaining code ...
run;

And similarly put them at the end by

data want;
  if 0 then set have (drop=gvkey iid) have;
  set have;
  ... remaining code ...
run;

 

 

 

This takes advantage of the fact that the compiler orders variables in the sequence in which they are encountered, as it builds the executable code.  But the "IF 0" condition, while being recognized by the compiler, nevertheless will not execute, because zero is never true.  

 

BTW, you could even drop the second "have" argument in the "if 0 then set" statements.

Also BTW, you could use "if 0 the MERGE"  as well as "if 0 then SET".  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Phil_NZ
Barite | Level 11

Hi @mkeintz 

Thank you for your explanation

I saw a lot of discussions using the phrase "if 0 then set(merge)", but now I faced in my discussion.

 

I myself read the document but I still do not understand why you use it here. Could you please briefly explain it to me and what does it play in your code here?

And I do not understand why there are two datasets have in the code. I am not able to understand the mechanism behind, could you please explain it to me ?

if 0 then set have (keep=gvkey iid) have;

Thank you.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
mkeintz
PROC Star

@Phil_NZ wrote:

Hi @mkeintz 

Thank you for your explanation

I saw a lot of discussions using the phrase "if 0 then set(merge)", but now I faced in my discussion.

 

I myself read the document but I still do not understand why you use it here. Could you please briefly explain it to me and what does it play in your code here?

And I do not understand why there are two datasets have in the code. I am not able to understand the mechanism behind, could you please explain it to me ?

if 0 then set have (keep=gvkey iid) have;

Thank you.


The purpose of this code is to obligate the compiler of the data step to order variables as you want .  It relies on the fact that the compiler will arrange variables in the order they are encountered in the code.

 

So the statement

if 0 then set have (drop=gvkey iid)  have;

delays exposure of GVKEY and IID to the compiler until all the other variables in the  dataset have already been seen by the compiler.  GVKEY and IID will be found in the second HAVE, but not the first.

 

Similarly

if 0 then set have (keep=gvkey iid)  have;

exposes GVKEY and IID to the compiler prior to all the other variables in HAVE.    This is effectively what you were doing when you issued a "RETAIN GVKEY IID" statement prior to your "set have" statement.

 

In both of my codes above, it doesn't matter what position those variables had in HAVE.   GVKEY and IID could be anywhere - left, right, or middle, but using the above will re-order as I describe.

 

And again, this is just to guide the compiler in the task of ordering variables. It has no influence on the actual data processing, because the "IF 0" condition is never true, so the "THEN SET ..." clauses are not actually executed.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Phil_NZ
Barite | Level 11

@mkeintz 

Thank you very much, I understand like that:

Regarding the code

if 0 then set have (drop=gvkey iid)  have;

So, SAS will encounter other variables first and assumingly dropping gvkey and iid, then because this logic is wrong, so it adds back these two columns, leading to these two columns will be at the last position.

I hope that I did not fall into any fallacy.

 

Many thanks and best regards.

 

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 462 views
  • 3 likes
  • 2 in conversation