BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
brophymj
Quartz | Level 8

Quick question...

I know you can order fields by using the following code:

data help;

retain

field1 field2 ...;

set help;

run;

But I have a very large data set and I would like a clean way of ordering new variables created along the way a particular way. Is there a way of doing this?

Another question - if you want a new field to appear beside an existing field that is in the middle of your dataset is this possible?

Many thanks

Matthew

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Matthew,

Here is one way to do it:

data have (drop=i);

  retain X1-X10 claim_type y1-y10 (21*1);

  do i=1 to 3;

    output;

  end;

run;

data to_add;

  informat claim_amt comma8.;

  informat claim_no $3.;

  input claim_amt claim_no;

  cards;

86.42 001

5,680.00  002

257.00  003

;

data want;

  set have;

  set to_add;

run;

proc sql noprint;

  select varnum

    into :first_end

      from dictionary.columns

        where libname="WORK" and

              memname="WANT" and

              name="claim_type"

  ;

             

  select name

    into :reorder separated by " "

      from

        (select name

           from dictionary.columns

             where libname="WORK" and

                   memname="WANT" and

                   varnum le &first_end.

        outer union corr

        select name

          from dictionary.columns

            where libname="WORK" and

                  memname="TO_ADD"        )    

;

quit;

data want;

  retain &reorder.;

  set want;

run;

View solution in original post

17 REPLIES 17
jakarman
Barite | Level 11

With coding you can do a lot.

Changing the order as renaming a variable using proc dataset... no.

https://communities.sas.com/thread/35111

---->-- ja karman --<-----
Ksharp
Super User

Do you mind to create a view for that purpose ? it almost cost no time for you .

data help/view=help;

retain

field1 field2 ...;

set help;

run;

CTorres
Quartz | Level 8

Your code generates the following ERROR: UPDATE views are not supported.

The view should have a different name from the original dataset:

data help1/view=help1;

retain

field1 field2 ...;

set help;

run;

Ksharp
Super User

Sorry . that code is not tested . try this tested one :

data have/view=have;

retain age weight height name;

set sashelp.class;

run;

Xia Keshan

jakarman
Barite | Level 11

Ordering fields, why should you bothering about that.
My statement(1): You are either a lazy coder or nor really knowing how the data is stored.

Let me do some argumentation: 

1/ With SAS the language you have logical order defined in the syntax for SAS-datasets. Are you going to use a RDBMS SQL that that order definition (*) does not exist.
2/ With a SAS datastep you are using a PDV (Program-Data-Vector) it is document the a variable when it gets defined that will be the order.

My Statement(2): The order of the variables in a SAS dataset (stored) is not the order you are seeing/thinking while processing the dataset.

Ha Hypothesis The order of the variables is the one you are seeing         Hb the order of the variables is NOT the one you are seeing.
Prove:

Xia's program is working the retain approach will show the variables is an different order. But is not changing the type char/numeric and length of the variables

As the variables only can get a place in the PDV with the complete length information being known the retain is only defining the order in some kind of table not being the real storage place in the PDV.

Can we find additional information for this prove?

See reference 1:

"In SAS 6 of the SAS System, the position of variables on a SAS data set is determined by the order in which they are encountered regardless of their type or length. In order to improve performance, SAS 7 and beyond of the SAS System positions 8 byte and 4 byte numeric variables at 8 byte boundaries at the front of a SAS data set."

See refrence 2:

"When SAS processes the program data vector, it typically moves the data in one large operation rather than by individual variables. When data is properly aligned (in 8-byte boundaries), data movement can occur in as little as two clock cycles (a single load followed by a single store). SAS moves unaligned data by more complex means, at worst, a single byte at a time. This would be at least eight times slower for an 8-byte variable.                        

Many high-performance RISC (Reduced Instruction Set Computer) processors pay a very large performance penalty for movement of unaligned data. When possible, leave numeric data at full width (eight bytes). Note that SAS must widen short numeric data for any arithmetic operation. On the other hand, short numeric data can save both memory and I/O. You must determine which method is most advantageous for your operating environment and situation. "

How can the data-view approach be effective reordering the data?

This question arises when going to use statistical procedures on this reordered dataset. The only way I see is that the data is fully processed by the datastep and then fed to the proc instead of feeding the data immediate. Additional research on performance aspects is needed.

Conclusion

Reliabililty of statement(2) is 100% an confidence interval is not applicable.

Statement(1) is left to the reader.

      

External references

1* 2676 - Determining the order of variables within a SAS data set

2* SAS(R) 9.4 Language Reference: Concepts, Third Edition Specifying Variable Lengths    Techniques for Optimizing CPU Performance

3* Xia program as dd 06-0-2014

    data have/view=have;

       retain age weight height name;

       set sashelp.class;

    run;

---->-- ja karman --<-----
art297
Opal | Level 21

: I don't disagree with your facts, but I don't agree with your premise. I don't think reordering variables is a definite sign of either laziness or not understanding how SAS works.

Reordering the way that variables appear (both to the user and the PDV) has its benefits, particularly in using variable lists and in reviewing data. Your non-lazy programmer is wasting a lot of time typing when they could have been using variable lists, more prone to unnecessary error and, if they're on a case sensitive operating system, more likely to make errors.

How SAS actually stores the variables in such cases is irrelevant to the discussion as far as I can see. The following is a way to move a variable to the middle, with no side effects that I'm aware of (unless, of course, one decides to accomplish more in the final data step than simply reordering the variables):

data have;

  retain X1-X10 y1-y10 (20*1);

  do i=1 to 3;

    output;

  end;

run;

proc sql noprint;

  select name

    into :reorder separated by " "

      from

        (select name

           from dictionary.columns

             where libname="WORK" and

                   memname="HAVE" and

                   varnum le 10

        outer union corr

        select name

          from dictionary.columns

            where libname="WORK" and

                  memname="HAVE" and

                  varnum eq 21

        outer union corr

        select name

          from dictionary.columns

            where libname="WORK" and

                  memname="HAVE" and

                  11 le varnum le 20

        )    

;

quit;

data want;

  retain &reorder.;

  set have;

run;

brophymj
Quartz | Level 8

Hi Arthur

Thanks for your helpful code. Just a quick question. My dataset consists of about 100 columns and what I'm attempting to do is to create two new field (claim_amt and claim_no) and place it next to an existing field (claim_type) which is currently halfway through the dataset (column 50).

Is there a way to use your code but where you can dynamically find the position of the field claim_type rather than manually working out the position of the column where you want to place the new fields. Also, if there are two fields rather than 1(from your code) what amendments to your code are required?

Just to clairify why I need this.... I export to excel so the presentation of the dataset is important and needs to follow a logical order.

Many thanks

Matthew

HWSteinberg
Calcite | Level 5

Hi Matthew,

I would use a multiple datastep merge with itself, each time with a block of variables already in wanted order order:

data have;

  merge have (keep = 1st variable in dataset -- claim_type) /* this keeps all varibles in between*/

            have (keep = claim_amt claim_no)

            have (keep = 1st variable after claim_type -- last variable before claim_amt)

            have (keep = ... ); /*if there are variables after claim_no */

run;

Depending on the setting of the SAS option MERGENOBY you may get a warning that merge is done without a BY statement, you can ignore here

Hans

art297
Opal | Level 21

Matthew,

Here is one way to do it:

data have (drop=i);

  retain X1-X10 claim_type y1-y10 (21*1);

  do i=1 to 3;

    output;

  end;

run;

data to_add;

  informat claim_amt comma8.;

  informat claim_no $3.;

  input claim_amt claim_no;

  cards;

86.42 001

5,680.00  002

257.00  003

;

data want;

  set have;

  set to_add;

run;

proc sql noprint;

  select varnum

    into :first_end

      from dictionary.columns

        where libname="WORK" and

              memname="WANT" and

              name="claim_type"

  ;

             

  select name

    into :reorder separated by " "

      from

        (select name

           from dictionary.columns

             where libname="WORK" and

                   memname="WANT" and

                   varnum le &first_end.

        outer union corr

        select name

          from dictionary.columns

            where libname="WORK" and

                  memname="TO_ADD"        )    

;

quit;

data want;

  retain &reorder.;

  set want;

run;

brophymj
Quartz | Level 8

Thanks Arthur

That works perfect however, when I try to move two fields form the middle to the end (rather than the other way around) it does't work. In the example above if you wanted to move claim_type to the right of the new variables at the end?

Thanks

art297
Opal | Level 21

data have (drop=i);

  retain X1-X10 claim_type y1-y10 (21*1);

  do i=1 to 3;

    output;

  end;

run;

data to_add;

  informat claim_amt comma8.;

  informat claim_no $3.;

  input claim_amt claim_no;

  cards;

86.42 001

5,680.00  002

257.00  003

;

data want;

  set have;

  set to_add;

run;

proc sql noprint;

  select varnum

    into :first_end

      from dictionary.columns

        where libname="WORK" and

              memname="WANT" and

              name="claim_type"

  ;

             

  select name

    into :reorder separated by " "

      from

        (select name

           from dictionary.columns

             where libname="WORK" and

                   memname="WANT" and

                   varnum lt &first_end.

        outer union corr

        select name

           from dictionary.columns

             where libname="WORK" and

                   memname="WANT" and

                   varnum gt &first_end.

        outer union corr

        select name

           from dictionary.columns

             where libname="WORK" and

                   memname="WANT" and

                   varnum eq &first_end.

       )

;

quit;

data want;

  retain &reorder.;

  set want;

run;

Patrick
Opal | Level 21

You can only physically re-order columns in a table by fully re-creating the table. Question is: Why is it important to you in which order the variables are stored? It shouldn't matter.

Tom
Super User Tom
Super User

Usually people care about the order of the fields in their tables because they are actively using the data.  If you want to examine the data using generic interactive tools (like SAS display manager) then the order that the data is presented makes a big difference in how easy the data is to use.

jakarman
Barite | Level 11

@Tom your argument of examining data using interactive tools is a valid one.

The retain trick Xia's program and add Arthurs one to that, with a data/view will work but needing an additional step. It will work even with an external DBMS. The involved overhead is not a recommendation to be used that with real analytics.  The trick Arthurs is proposing is having a last step replacing the data. This will cause making a full copy (writing) of the data. That could be a problem as of sizing (real  big data) "side effect-1" or as of security the dataset is in a DWH copying the data is not being allowed side "effect-2".

As for SAS datasets you Tom and Arthur can agree that the physical order is different to the logical one. 
SAS-institute could offer a simple logical reordering as part of "proc datasets".  They choosed not to do that, please explain why. 

While using the SAS procedures there is no PDV and often there is just a small subset of all variables involved.

The best way to handle that is where processing as dataset option and only naming the needed variables.  "side effect-3"  

They decided not to do that. Their effort on data processing is going at In-database processing and in-memory analytics.

That is seeming to moving away from a pc-based approach "side effect-4".         

The analytics is  moving to a more click-mick approach as seen wit Eminer and a lot of other solutions.  Within those you are expected to a "do not code" habit. Typos are of history as do not type anymore and much longer used variable names are coming along  "side effect-5".

Arthur where you do not see any side effects, I can see 5.

---->-- ja karman --<-----

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 2004 views
  • 1 like
  • 8 in conversation