Solved: Re: Move columns to a specific position

jtscheirer · Posted 09-13-2024 07:27 PM

I have a large dataset, with about 60,000 rows and 35 columns. I generated a few calculations, which were appended as columns to the end of the table, but I want them to appear in the middle, next to the columns that were used to produce these calculations. I know I can use RETAIN or PROC SQL to list columns in the order I want them to appear, but that is not practical with this large of a dataset. Is there a shorthand way I can just move a couple of columns from the end to specific positions in the middle without having to list every single column name and generate a new table that takes a while to run?

My background is in R, which has the INDEX function that allows users to move columns to a specific position. I'm hoping there is something similar in SAS.

Thanks!

Tom · Posted 09-13-2024 08:48 PM

@jtscheirer wrote:

It was only taking a long time to run because I wrote an error in the code. So you are correct, it does not take long to run. But typing out 35 column names is still impractical. I simply want to move columns 33-35 to occupy positions 19-21 instead. The structure would stay the same (60,000 rows and 35 columns)

The number of observations (I assume that is what you meant by "rows") does not matter to the code, just how long it might take to run. The position number of the variables (I assume that is what you meant by "columns") is useless for this problem. But if you know the NAMES of the first, 18th, 33rd, and 35th variables then the code is simple using positional variable lists.

Just so we can share some example code let's assume the variables are currently named V1 to V100. So the names of interest are V1, V18, V33 and V35. In which case you could run this program. No RETAIN statement needed.

data want;
  set have(keep=v1 -- v18);
  set have(keep=v33 -- v35);
  set have;
run;

How does it work? The data step compiler places the variables into the dataset in the order they are first "seen". So the first SET statement let's it know about the first 18 variables. Then it sees the ones you want moved. Then it adds the rest when it sees the final SET. When it runs it will have to do a little bit of extra working copying the same information over and over, but since SAS and the operating system will cache the dataset it will be in memory transfers so have a minimal impact on performance.

If you are worried about that extra work then here is a "trick" to get the data step to "see" the variables without actually reading the observations multiple times.

data want;
  if 1=0 then set have(keep=v1 -- v18) have(keep=v33 -- v35);
  set have;
run;

The obviously false IF condition means the first SET statement will never execute, but the datasets are still examined during the compile step so their variables are placed in the desired order.

But why do you want to move the variables around? If you want them to appear in that order in some report then just tell the reporting code that fact. Again using my simple example names that code might look like this:

proc print data=have;
  var v1--v18 v33--v35 v19--v32 v36--v100;
run;

View solution in original post

Astounding · Posted 09-13-2024 07:49 PM

Nothing takes a long time to run. Your data set is small.

There might be a way to write a shorter program that doesn't list all the names. But you would have to explain the structure that you have and the structure that you want. That's about the same amount of work as using a RETAIN statement. But if you spell out the before and after structure, I can probably give you a short cut.

jtscheirer · Posted 09-13-2024 07:55 PM

It was only taking a long time to run because I wrote an error in the code. So you are correct, it does not take long to run. But typing out 35 column names is still impractical. I simply want to move columns 33-35 to occupy positions 19-21 instead. The structure would stay the same (60,000 rows and 35 columns)

Tom · Posted 09-13-2024 08:48 PM

@jtscheirer wrote:

It was only taking a long time to run because I wrote an error in the code. So you are correct, it does not take long to run. But typing out 35 column names is still impractical. I simply want to move columns 33-35 to occupy positions 19-21 instead. The structure would stay the same (60,000 rows and 35 columns)

The number of observations (I assume that is what you meant by "rows") does not matter to the code, just how long it might take to run. The position number of the variables (I assume that is what you meant by "columns") is useless for this problem. But if you know the NAMES of the first, 18th, 33rd, and 35th variables then the code is simple using positional variable lists.

Just so we can share some example code let's assume the variables are currently named V1 to V100. So the names of interest are V1, V18, V33 and V35. In which case you could run this program. No RETAIN statement needed.

data want;
  set have(keep=v1 -- v18);
  set have(keep=v33 -- v35);
  set have;
run;

How does it work? The data step compiler places the variables into the dataset in the order they are first "seen". So the first SET statement let's it know about the first 18 variables. Then it sees the ones you want moved. Then it adds the rest when it sees the final SET. When it runs it will have to do a little bit of extra working copying the same information over and over, but since SAS and the operating system will cache the dataset it will be in memory transfers so have a minimal impact on performance.

If you are worried about that extra work then here is a "trick" to get the data step to "see" the variables without actually reading the observations multiple times.

data want;
  if 1=0 then set have(keep=v1 -- v18) have(keep=v33 -- v35);
  set have;
run;

The obviously false IF condition means the first SET statement will never execute, but the datasets are still examined during the compile step so their variables are placed in the desired order.

But why do you want to move the variables around? If you want them to appear in that order in some report then just tell the reporting code that fact. Again using my simple example names that code might look like this:

proc print data=have;
  var v1--v18 v33--v35 v19--v32 v36--v100;
run;

jtscheirer · Posted 09-16-2024 04:40 PM

This is exactly what I was looking for. Thanks, Tom!

Kurt_Bremser · Posted 09-14-2024 09:11 AM

Read the variable list from SASHELP.VCOLUMN.

Add 3 to the variable number for all variables 19 and above, with the exception of variables 33 and above, where you subtract 14.

Sort the resulting dataset by variable number, and use SELECT INTO to create a macro variable containing the names in the desired order, which you can then use in a RETAIN.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller · Posted 09-14-2024 07:14 AM

What possible benefit is there to moving columns from their current position to another position in the data set?

Please tell us, in my mind, there is no benefit, please educate me.

--
Paige Miller

jtscheirer · Posted 09-16-2024 01:26 PM

This is just how the client requested the data be presented to them. The newly generated columns would sit next to other columns that were used to calculate them, as well as others that are useful for comparison. So they would be able to compare columns that are next to each other rather than columns that require scrolling across the dataset to compare.

Regardless of the actual utility of this, that's just how they asked me to present the data.

ballardw · Posted 09-16-2024 03:14 PM

"Presented to them" would generally mean a report to me, not a raw data set.

And as a value added I would offer a report that showed the comparison between columns.

Personally I find the idea of scrolling through 60,000 lines of data set and "comparing columns" to be a nightmare. That is reading 1,000 pages of text. My response as an analyst in general would be "what are you looking for in the comparison". Then extract and present in a nice form the records of most interest already identified.

Or maybe present a graph that would have the cases of interest pop out easier than reading 1,000s of pages of text.

jtscheirer · Posted 09-16-2024 04:39 PM

"Presented" was the wrong choice of word here. The output they want is the actual dataset so that they can use it for some other purpose. They just asked me for an excel sheet with new columns inserted into specific positions. So that is what I'm trying to do. And I think I've figured out how, based on a couple comments above.

Tom · Posted 09-16-2024 04:41 PM

You can use ODS EXCEL to create an Excel sheet from a print out such as that generated by PROC PRINT or PROC REPORT.

ballardw · Posted 09-16-2024 04:57 PM

@jtscheirer wrote:
"Presented" was the wrong choice of word here. The output they want is the actual dataset so that they can use it for some other purpose. They just asked me for an excel sheet with new columns inserted into specific positions. So that is what I'm trying to do. And I think I've figured out how, based on a couple comments above.

NOT the actual data set as they requested an Excel sheet.

ods excel file="path\file.xlsx";

proc print data=someset noobs label;
   var <use the col1 -- colx list for the first columns>
       <use the -- list for the "inserted" calculated columns>
       <use the colx+1 -- coly list for the last columns>
   ;
run;

ods excel;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away