BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cmoore
Obsidian | Level 7

Hi,

 

I have a fairly big dataset that contains 1800 columns.For each record, I want to be able to remove blank columns with no data and move anything with data to the start of the data table (furthest left). Please see attached the want and have view of the data. Would you happen to know the best way to achieve this on a large dataset?

 

Thanks


Chris

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

That should be fairly easy to do, something like:

 

data want;
  set have;
  array columns col1-col1800;
  do i=1 to n(of columns(*));
    if missing(columns(i)) then do j=max(j,i)+1 to dim(columns);
      if not missing(columns(j)) then do;
        columns(i)=columns(j);
        call missing(columns(j));
        leave;
        end;
end; end;
drop i j; run;

Only you should make sure that all your variables have the same length, otherwise your variables may be truncated. If they are not the same length, you can put in a LENGTH statement before the SET statement, e.g. "length col1-col1800 $60;" or whatever is the length of the longest column.

 

If you want to drop the empty columns afterwards, you may want to store the maximum number of non-missing columns in a macro variable:

data temp;
  length col1-col1800 $60; /* be sure to have length of longest column */
  set have end=done;
  array columns col1-col1800;
  do i=1 to n(of columns(*));
    if missing(columns(i)) then do j=max(j,i)+1 to dim(columns);
      if not missing(columns(j)) then do;
        columns(i)=columns(j);
        call missing(columns(j));
        leave;
        end;
      end;
    end;
  retain maxcol 0;
  maxcol=max(maxcol,i); /* i is actually N(of columns(*))+1 now */
  drop i j maxcol;
  if done then call symputx('maxcol',maxcol);
run;

data want;
  set temp;
  drop col&maxcol-col1800;
run;

View solution in original post

4 REPLIES 4
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please post example data in the form of a datastep using the {i} code window.  I will not download Excel files, nor write code to try to process them.  

 

For your issue, you can remove columns by doing the following.  First identify columns with all missing observations, you can use the code found in my post here:

https://communities.sas.com/t5/SAS-Statistical-Procedures/How-do-i-get-the-summary-statistics-of-a-d...

To do this, then generate a macro variable with all the variables to be dropped from there.

 

As for moving variables, this doesn't make any sense.  Logical position in the dataset is irrelevant from a programming perspective.  If you need to re-arrange for a data transfer or report, then setup a template of what you want the output to look like then set that and set your data underneath that.

 

You will also find your programming life is much easier if you use normalised (goes down the page) rather than transposed (goes across the page) data.  For instance this issue you face here, if you have:

VARNAME   VARRESULT

abc               123

...

 

Then is is a simple where clause to drop all missing data, and at the end when you want to export or report the data, you can simply transpose the data then.

s_lassen
Meteorite | Level 14

That should be fairly easy to do, something like:

 

data want;
  set have;
  array columns col1-col1800;
  do i=1 to n(of columns(*));
    if missing(columns(i)) then do j=max(j,i)+1 to dim(columns);
      if not missing(columns(j)) then do;
        columns(i)=columns(j);
        call missing(columns(j));
        leave;
        end;
end; end;
drop i j; run;

Only you should make sure that all your variables have the same length, otherwise your variables may be truncated. If they are not the same length, you can put in a LENGTH statement before the SET statement, e.g. "length col1-col1800 $60;" or whatever is the length of the longest column.

 

If you want to drop the empty columns afterwards, you may want to store the maximum number of non-missing columns in a macro variable:

data temp;
  length col1-col1800 $60; /* be sure to have length of longest column */
  set have end=done;
  array columns col1-col1800;
  do i=1 to n(of columns(*));
    if missing(columns(i)) then do j=max(j,i)+1 to dim(columns);
      if not missing(columns(j)) then do;
        columns(i)=columns(j);
        call missing(columns(j));
        leave;
        end;
      end;
    end;
  retain maxcol 0;
  maxcol=max(maxcol,i); /* i is actually N(of columns(*))+1 now */
  drop i j maxcol;
  if done then call symputx('maxcol',maxcol);
run;

data want;
  set temp;
  drop col&maxcol-col1800;
run;
ballardw
Super User

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 4814 views
  • 4 likes
  • 5 in conversation