BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
jtscheirer
Calcite | Level 5

I have a large dataset, with about 60,000 rows and 35 columns. I generated a few calculations, which were appended as columns to the end of the table, but I want them to appear in the middle, next to the columns that were used to produce these calculations. I know I can use RETAIN or PROC SQL to list columns in the order I want them to appear, but that is not practical with this large of a dataset. Is there a shorthand way I can just move a couple of columns from the end to specific positions in the middle without having to list every single column name and generate a new table that takes a while to run?

 

My background is in R, which has the INDEX function that allows users to move columns to a specific position. I'm hoping there is something similar in SAS.

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@jtscheirer wrote:

It was only taking a long time to run because I wrote an error in the code. So you are correct, it does not take long to run. But typing out 35 column names is still impractical. I simply want to move columns 33-35 to occupy positions 19-21 instead. The structure would stay the same (60,000 rows and 35 columns)


The number of observations (I assume that is what you meant by "rows") does not matter to the code, just how long it might take to run.  The position number of the variables (I assume that is what you meant by "columns") is useless for this problem.  But if you know the NAMES of the first, 18th, 33rd, and 35th variables then the code is simple using positional variable lists.

 

Just so we can share some example code let's assume the variables are currently named V1 to V100.  So the names of interest are V1, V18, V33 and V35.  In which case you could run this program.  No RETAIN statement needed.

data want;
  set have(keep=v1 -- v18);
  set have(keep=v33 -- v35);
  set have;
run;

How does it work?  The data step compiler places the variables into the dataset in the order they are first "seen".  So the first SET statement let's it know about the first 18 variables. Then it sees the ones you want moved.  Then it adds the rest when it sees the final SET.  When it runs it will  have to do a little bit of extra working copying the same information over and over, but since SAS and the operating system will cache the dataset it will be in memory transfers so have a minimal impact on performance. 

If you are worried about that extra work then here is a "trick" to get the data step to "see" the variables without actually reading the observations multiple times.

data want;
  if 1=0 then set have(keep=v1 -- v18) have(keep=v33 -- v35);
  set have;
run;

The obviously false IF condition means the first SET statement will never execute, but the datasets are still examined during the compile step so their variables are placed in the desired order.

 

But why do you want to move the variables around?  If you want them to appear in that order in some report then just tell the reporting code that fact.  Again using my simple example names that code might look like this:

proc print data=have;
  var v1--v18 v33--v35 v19--v32 v36--v100;
run;

 

View solution in original post

11 REPLIES 11
Astounding
PROC Star

Nothing takes a long time to run.  Your data set is small.

 

There might be a way to write a shorter program that doesn't list all the names.  But you would have to explain the structure that you have and the structure that you want.  That's about the same amount of work as using a RETAIN statement.  But if you spell out the before and after structure, I can probably give you a short cut.

jtscheirer
Calcite | Level 5

It was only taking a long time to run because I wrote an error in the code. So you are correct, it does not take long to run. But typing out 35 column names is still impractical. I simply want to move columns 33-35 to occupy positions 19-21 instead. The structure would stay the same (60,000 rows and 35 columns)

Tom
Super User Tom
Super User

@jtscheirer wrote:

It was only taking a long time to run because I wrote an error in the code. So you are correct, it does not take long to run. But typing out 35 column names is still impractical. I simply want to move columns 33-35 to occupy positions 19-21 instead. The structure would stay the same (60,000 rows and 35 columns)


The number of observations (I assume that is what you meant by "rows") does not matter to the code, just how long it might take to run.  The position number of the variables (I assume that is what you meant by "columns") is useless for this problem.  But if you know the NAMES of the first, 18th, 33rd, and 35th variables then the code is simple using positional variable lists.

 

Just so we can share some example code let's assume the variables are currently named V1 to V100.  So the names of interest are V1, V18, V33 and V35.  In which case you could run this program.  No RETAIN statement needed.

data want;
  set have(keep=v1 -- v18);
  set have(keep=v33 -- v35);
  set have;
run;

How does it work?  The data step compiler places the variables into the dataset in the order they are first "seen".  So the first SET statement let's it know about the first 18 variables. Then it sees the ones you want moved.  Then it adds the rest when it sees the final SET.  When it runs it will  have to do a little bit of extra working copying the same information over and over, but since SAS and the operating system will cache the dataset it will be in memory transfers so have a minimal impact on performance. 

If you are worried about that extra work then here is a "trick" to get the data step to "see" the variables without actually reading the observations multiple times.

data want;
  if 1=0 then set have(keep=v1 -- v18) have(keep=v33 -- v35);
  set have;
run;

The obviously false IF condition means the first SET statement will never execute, but the datasets are still examined during the compile step so their variables are placed in the desired order.

 

But why do you want to move the variables around?  If you want them to appear in that order in some report then just tell the reporting code that fact.  Again using my simple example names that code might look like this:

proc print data=have;
  var v1--v18 v33--v35 v19--v32 v36--v100;
run;

 

jtscheirer
Calcite | Level 5
This is exactly what I was looking for. Thanks, Tom!
Kurt_Bremser
Super User

Read the variable list from SASHELP.VCOLUMN.

Add 3 to the variable number for all variables 19 and above, with the exception of variables 33 and above, where you subtract 14.

Sort the resulting dataset by variable number, and use SELECT INTO to create a macro variable containing the names in the desired order, which you can then use in a RETAIN.

PaigeMiller
Diamond | Level 26

What possible benefit is there to moving columns from their current position to another position in the data set?

 

Please tell us, in my mind, there is no benefit, please educate me.

 

 

--
Paige Miller
jtscheirer
Calcite | Level 5
This is just how the client requested the data be presented to them. The newly generated columns would sit next to other columns that were used to calculate them, as well as others that are useful for comparison. So they would be able to compare columns that are next to each other rather than columns that require scrolling across the dataset to compare.

Regardless of the actual utility of this, that's just how they asked me to present the data.
ballardw
Super User

"Presented to them" would generally mean a report to me, not a raw data set.

 

And as a value added I would offer a report that showed the comparison between columns.

 

Personally I find the idea of scrolling through 60,000 lines of data set and "comparing columns" to be a nightmare. That is reading 1,000 pages of text. My response as an analyst in general would be "what are you looking for in the comparison". Then extract and present in a nice form the records of most interest already identified.

 

Or maybe present a graph that would have the cases of interest pop out easier than reading 1,000s of pages of text.

jtscheirer
Calcite | Level 5
"Presented" was the wrong choice of word here. The output they want is the actual dataset so that they can use it for some other purpose. They just asked me for an excel sheet with new columns inserted into specific positions. So that is what I'm trying to do. And I think I've figured out how, based on a couple comments above.
Tom
Super User Tom
Super User

You can use ODS EXCEL to create an Excel sheet from a print out such as that generated by PROC PRINT or PROC REPORT.

ballardw
Super User

@jtscheirer wrote:
"Presented" was the wrong choice of word here. The output they want is the actual dataset so that they can use it for some other purpose. They just asked me for an excel sheet with new columns inserted into specific positions. So that is what I'm trying to do. And I think I've figured out how, based on a couple comments above.

NOT the actual data set as they requested an Excel sheet.

ods excel file="path\file.xlsx";

proc print data=someset noobs label;
   var <use the col1 -- colx list for the first columns>
       <use the -- list for the "inserted" calculated columns>
       <use the colx+1 -- coly list for the last columns>
   ;
run;

ods excel;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1882 views
  • 7 likes
  • 6 in conversation