Quick question...
I know you can order fields by using the following code:
data help;
retain
field1 field2 ...;
set help;
run;
But I have a very large data set and I would like a clean way of ordering new variables created along the way a particular way. Is there a way of doing this?
Another question - if you want a new field to appear beside an existing field that is in the middle of your dataset is this possible?
Many thanks
Matthew
Matthew,
Here is one way to do it:
data have (drop=i);
retain X1-X10 claim_type y1-y10 (21*1);
do i=1 to 3;
output;
end;
run;
data to_add;
informat claim_amt comma8.;
informat claim_no $3.;
input claim_amt claim_no;
cards;
86.42 001
5,680.00 002
257.00 003
;
data want;
set have;
set to_add;
run;
proc sql noprint;
select varnum
into :first_end
from dictionary.columns
where libname="WORK" and
memname="WANT" and
name="claim_type"
;
select name
into :reorder separated by " "
from
(select name
from dictionary.columns
where libname="WORK" and
memname="WANT" and
varnum le &first_end.
outer union corr
select name
from dictionary.columns
where libname="WORK" and
memname="TO_ADD" )
;
quit;
data want;
retain &reorder.;
set want;
run;
With coding you can do a lot.
Changing the order as renaming a variable using proc dataset... no.
https://communities.sas.com/thread/35111
Do you mind to create a view for that purpose ? it almost cost no time for you .
data help/view=help;
retain
field1 field2 ...;
set help;
run;
Your code generates the following ERROR: UPDATE views are not supported.
The view should have a different name from the original dataset:
data help1/view=help1;
retain
field1 field2 ...;
set help;
run;
Sorry . that code is not tested . try this tested one :
data have/view=have;
retain age weight height name;
set sashelp.class;
run;
Xia Keshan
Ordering fields, why should you bothering about that.
My statement(1): You are either a lazy coder or nor really knowing how the data is stored.
Let me do some argumentation:
1/ With SAS the language you have logical order defined in the syntax for SAS-datasets. Are you going to use a RDBMS SQL that that order definition (*) does not exist.
2/ With a SAS datastep you are using a PDV (Program-Data-Vector) it is document the a variable when it gets defined that will be the order.
My Statement(2): The order of the variables in a SAS dataset (stored) is not the order you are seeing/thinking while processing the dataset.
Ha Hypothesis The order of the variables is the one you are seeing Hb the order of the variables is NOT the one you are seeing.
Prove:
Xia's program is working the retain approach will show the variables is an different order. But is not changing the type char/numeric and length of the variables
As the variables only can get a place in the PDV with the complete length information being known the retain is only defining the order in some kind of table not being the real storage place in the PDV.
Can we find additional information for this prove?
See reference 1:
"In SAS 6 of the SAS System, the position of variables on a SAS data set is determined by the order in which they are encountered regardless of their type or length. In order to improve performance, SAS 7 and beyond of the SAS System positions 8 byte and 4 byte numeric variables at 8 byte boundaries at the front of a SAS data set."
See refrence 2:
"When SAS processes the program data vector, it typically moves the data in one large operation rather than by individual variables. When data is properly aligned (in 8-byte boundaries), data movement can occur in as little as two clock cycles (a single load followed by a single store). SAS moves unaligned data by more complex means, at worst, a single byte at a time. This would be at least eight times slower for an 8-byte variable.
Many high-performance RISC (Reduced Instruction Set Computer) processors pay a very large performance penalty for movement of unaligned data. When possible, leave numeric data at full width (eight bytes). Note that SAS must widen short numeric data for any arithmetic operation. On the other hand, short numeric data can save both memory and I/O. You must determine which method is most advantageous for your operating environment and situation. "
How can the data-view approach be effective reordering the data?
This question arises when going to use statistical procedures on this reordered dataset. The only way I see is that the data is fully processed by the datastep and then fed to the proc instead of feeding the data immediate. Additional research on performance aspects is needed.
Conclusion
Reliabililty of statement(2) is 100% an confidence interval is not applicable.
Statement(1) is left to the reader.
External references
1* 2676 - Determining the order of variables within a SAS data set
2* SAS(R) 9.4 Language Reference: Concepts, Third Edition Specifying Variable Lengths Techniques for Optimizing CPU Performance
3* Xia program as dd 06-0-2014
data have/view=have;
retain age weight height name;
set sashelp.class;
run;
: I don't disagree with your facts, but I don't agree with your premise. I don't think reordering variables is a definite sign of either laziness or not understanding how SAS works.
Reordering the way that variables appear (both to the user and the PDV) has its benefits, particularly in using variable lists and in reviewing data. Your non-lazy programmer is wasting a lot of time typing when they could have been using variable lists, more prone to unnecessary error and, if they're on a case sensitive operating system, more likely to make errors.
How SAS actually stores the variables in such cases is irrelevant to the discussion as far as I can see. The following is a way to move a variable to the middle, with no side effects that I'm aware of (unless, of course, one decides to accomplish more in the final data step than simply reordering the variables):
data have;
retain X1-X10 y1-y10 (20*1);
do i=1 to 3;
output;
end;
run;
proc sql noprint;
select name
into :reorder separated by " "
from
(select name
from dictionary.columns
where libname="WORK" and
memname="HAVE" and
varnum le 10
outer union corr
select name
from dictionary.columns
where libname="WORK" and
memname="HAVE" and
varnum eq 21
outer union corr
select name
from dictionary.columns
where libname="WORK" and
memname="HAVE" and
11 le varnum le 20
)
;
quit;
data want;
retain &reorder.;
set have;
run;
Hi Arthur
Thanks for your helpful code. Just a quick question. My dataset consists of about 100 columns and what I'm attempting to do is to create two new field (claim_amt and claim_no) and place it next to an existing field (claim_type) which is currently halfway through the dataset (column 50).
Is there a way to use your code but where you can dynamically find the position of the field claim_type rather than manually working out the position of the column where you want to place the new fields. Also, if there are two fields rather than 1(from your code) what amendments to your code are required?
Just to clairify why I need this.... I export to excel so the presentation of the dataset is important and needs to follow a logical order.
Many thanks
Matthew
Hi Matthew,
I would use a multiple datastep merge with itself, each time with a block of variables already in wanted order order:
data have;
merge have (keep = 1st variable in dataset -- claim_type) /* this keeps all varibles in between*/
have (keep = claim_amt claim_no)
have (keep = 1st variable after claim_type -- last variable before claim_amt)
have (keep = ... ); /*if there are variables after claim_no */
run;
Depending on the setting of the SAS option MERGENOBY you may get a warning that merge is done without a BY statement, you can ignore here
Hans
Matthew,
Here is one way to do it:
data have (drop=i);
retain X1-X10 claim_type y1-y10 (21*1);
do i=1 to 3;
output;
end;
run;
data to_add;
informat claim_amt comma8.;
informat claim_no $3.;
input claim_amt claim_no;
cards;
86.42 001
5,680.00 002
257.00 003
;
data want;
set have;
set to_add;
run;
proc sql noprint;
select varnum
into :first_end
from dictionary.columns
where libname="WORK" and
memname="WANT" and
name="claim_type"
;
select name
into :reorder separated by " "
from
(select name
from dictionary.columns
where libname="WORK" and
memname="WANT" and
varnum le &first_end.
outer union corr
select name
from dictionary.columns
where libname="WORK" and
memname="TO_ADD" )
;
quit;
data want;
retain &reorder.;
set want;
run;
Thanks Arthur
That works perfect however, when I try to move two fields form the middle to the end (rather than the other way around) it does't work. In the example above if you wanted to move claim_type to the right of the new variables at the end?
Thanks
data have (drop=i);
retain X1-X10 claim_type y1-y10 (21*1);
do i=1 to 3;
output;
end;
run;
data to_add;
informat claim_amt comma8.;
informat claim_no $3.;
input claim_amt claim_no;
cards;
86.42 001
5,680.00 002
257.00 003
;
data want;
set have;
set to_add;
run;
proc sql noprint;
select varnum
into :first_end
from dictionary.columns
where libname="WORK" and
memname="WANT" and
name="claim_type"
;
select name
into :reorder separated by " "
from
(select name
from dictionary.columns
where libname="WORK" and
memname="WANT" and
varnum lt &first_end.
outer union corr
select name
from dictionary.columns
where libname="WORK" and
memname="WANT" and
varnum gt &first_end.
outer union corr
select name
from dictionary.columns
where libname="WORK" and
memname="WANT" and
varnum eq &first_end.
)
;
quit;
data want;
retain &reorder.;
set want;
run;
You can only physically re-order columns in a table by fully re-creating the table. Question is: Why is it important to you in which order the variables are stored? It shouldn't matter.
Usually people care about the order of the fields in their tables because they are actively using the data. If you want to examine the data using generic interactive tools (like SAS display manager) then the order that the data is presented makes a big difference in how easy the data is to use.
@Tom your argument of examining data using interactive tools is a valid one.
The retain trick Xia's program and add Arthurs one to that, with a data/view will work but needing an additional step. It will work even with an external DBMS. The involved overhead is not a recommendation to be used that with real analytics. The trick Arthurs is proposing is having a last step replacing the data. This will cause making a full copy (writing) of the data. That could be a problem as of sizing (real big data) "side effect-1" or as of security the dataset is in a DWH copying the data is not being allowed side "effect-2".
As for SAS datasets you Tom and Arthur can agree that the physical order is different to the logical one.
SAS-institute could offer a simple logical reordering as part of "proc datasets". They choosed not to do that, please explain why.
While using the SAS procedures there is no PDV and often there is just a small subset of all variables involved.
The best way to handle that is where processing as dataset option and only naming the needed variables. "side effect-3"
They decided not to do that. Their effort on data processing is going at In-database processing and in-memory analytics.
That is seeming to moving away from a pc-based approach "side effect-4".
The analytics is moving to a more click-mick approach as seen wit Eminer and a lot of other solutions. Within those you are expected to a "do not code" habit. Typos are of history as do not type anymore and much longer used variable names are coming along "side effect-5".
Arthur where you do not see any side effects, I can see 5.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.