Hi SAS Users,
Normally, we use RETAIN to reorder the columns in a dataset.
For example, I have a list of column names as below
GVKEY IID DATADATE CIK CONML LOC SIC ISIN SEDOL EXCHG AJEXDI CSHOC CSHTRD CURCDD PRCCD PRCHD PRCLD PRCSTD TPCI TRFD ex_rate gviidkey prccd_abs prccd_abs_ lagprccd_abs_ lagajexdi lagtrfd raw_return log_raw_return raw_return_abs dollar_vol
/*I highlight DATADATE and gviidkey manually for noticing purpose*/
Normally if I want to put gviidkey and DATADATE in the first two columns, I did:
data have;
retain gviidkey DATADATE;
set have;
run;
However, if I want to put the two columns (GVKEY and IID) to the last columns of the dataset, what I should do without typing all column names in retain statement?
Warm regards.
@Phil_NZ wrote:
Hi @mkeintz
Thank you for your explanation
I saw a lot of discussions using the phrase "if 0 then set(merge)", but now I faced in my discussion.
I myself read the document but I still do not understand why you use it here. Could you please briefly explain it to me and what does it play in your code here?
And I do not understand why there are two datasets have in the code. I am not able to understand the mechanism behind, could you please explain it to me ?
if 0 then set have (keep=gvkey iid) have;
Thank you.
The purpose of this code is to obligate the compiler of the data step to order variables as you want . It relies on the fact that the compiler will arrange variables in the order they are encountered in the code.
So the statement
if 0 then set have (drop=gvkey iid) have;
delays exposure of GVKEY and IID to the compiler until all the other variables in the dataset have already been seen by the compiler. GVKEY and IID will be found in the second HAVE, but not the first.
Similarly
if 0 then set have (keep=gvkey iid) have;
exposes GVKEY and IID to the compiler prior to all the other variables in HAVE. This is effectively what you were doing when you issued a "RETAIN GVKEY IID" statement prior to your "set have" statement.
In both of my codes above, it doesn't matter what position those variables had in HAVE. GVKEY and IID could be anywhere - left, right, or middle, but using the above will re-order as I describe.
And again, this is just to guide the compiler in the task of ordering variables. It has no influence on the actual data processing, because the "IF 0" condition is never true, so the "THEN SET ..." clauses are not actually executed.
You don't need retain;
You could put GVKEY IID at the start by:
data want;
if 0 then set have (keep=gvkey iid) have;
set have;
... remaining code ...
run;
And similarly put them at the end by
data want;
if 0 then set have (drop=gvkey iid) have;
set have;
... remaining code ...
run;
This takes advantage of the fact that the compiler orders variables in the sequence in which they are encountered, as it builds the executable code. But the "IF 0" condition, while being recognized by the compiler, nevertheless will not execute, because zero is never true.
BTW, you could even drop the second "have" argument in the "if 0 then set" statements.
Also BTW, you could use "if 0 the MERGE" as well as "if 0 then SET".
Hi @mkeintz
Thank you for your explanation
I saw a lot of discussions using the phrase "if 0 then set(merge)", but now I faced in my discussion.
I myself read the document but I still do not understand why you use it here. Could you please briefly explain it to me and what does it play in your code here?
And I do not understand why there are two datasets have in the code. I am not able to understand the mechanism behind, could you please explain it to me ?
if 0 then set have (keep=gvkey iid) have;
Thank you.
@Phil_NZ wrote:
Hi @mkeintz
Thank you for your explanation
I saw a lot of discussions using the phrase "if 0 then set(merge)", but now I faced in my discussion.
I myself read the document but I still do not understand why you use it here. Could you please briefly explain it to me and what does it play in your code here?
And I do not understand why there are two datasets have in the code. I am not able to understand the mechanism behind, could you please explain it to me ?
if 0 then set have (keep=gvkey iid) have;
Thank you.
The purpose of this code is to obligate the compiler of the data step to order variables as you want . It relies on the fact that the compiler will arrange variables in the order they are encountered in the code.
So the statement
if 0 then set have (drop=gvkey iid) have;
delays exposure of GVKEY and IID to the compiler until all the other variables in the dataset have already been seen by the compiler. GVKEY and IID will be found in the second HAVE, but not the first.
Similarly
if 0 then set have (keep=gvkey iid) have;
exposes GVKEY and IID to the compiler prior to all the other variables in HAVE. This is effectively what you were doing when you issued a "RETAIN GVKEY IID" statement prior to your "set have" statement.
In both of my codes above, it doesn't matter what position those variables had in HAVE. GVKEY and IID could be anywhere - left, right, or middle, but using the above will re-order as I describe.
And again, this is just to guide the compiler in the task of ordering variables. It has no influence on the actual data processing, because the "IF 0" condition is never true, so the "THEN SET ..." clauses are not actually executed.
Thank you very much, I understand like that:
Regarding the code
if 0 then set have (drop=gvkey iid) have;
So, SAS will encounter other variables first and assumingly dropping gvkey and iid, then because this logic is wrong, so it adds back these two columns, leading to these two columns will be at the last position.
I hope that I did not fall into any fallacy.
Many thanks and best regards.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.