DATA Step, Macro, Functions and more

Cleaning a dataset to move column data to the left and ignore blank columns

Accepted Solution Solved
Reply
Contributor
Posts: 48
Accepted Solution

Cleaning a dataset to move column data to the left and ignore blank columns

Hi,

 

I have a fairly big dataset that contains 1800 columns.For each record, I want to be able to remove blank columns with no data and move anything with data to the start of the data table (furthest left). Please see attached the want and have view of the data. Would you happen to know the best way to achieve this on a large dataset?

 

Thanks


Chris

 

 


Accepted Solutions
Solution
‎12-12-2017 07:07 AM
PROC Star
Posts: 228

Re: Cleaning a dataset to move column data to the left and ignore blank columns

That should be fairly easy to do, something like:

 

data want;
  set have;
  array columns col1-col1800;
  do i=1 to n(of columns(*));
    if missing(columns(i)) then do j=max(j,i)+1 to dim(columns);
      if not missing(columns(j)) then do;
        columns(i)=columns(j);
        call missing(columns(j));
        leave;
        end;
end; end;
drop i j; run;

Only you should make sure that all your variables have the same length, otherwise your variables may be truncated. If they are not the same length, you can put in a LENGTH statement before the SET statement, e.g. "length col1-col1800 $60;" or whatever is the length of the longest column.

 

If you want to drop the empty columns afterwards, you may want to store the maximum number of non-missing columns in a macro variable:

data temp;
  length col1-col1800 $60; /* be sure to have length of longest column */
  set have end=done;
  array columns col1-col1800;
  do i=1 to n(of columns(*));
    if missing(columns(i)) then do j=max(j,i)+1 to dim(columns);
      if not missing(columns(j)) then do;
        columns(i)=columns(j);
        call missing(columns(j));
        leave;
        end;
      end;
    end;
  retain maxcol 0;
  maxcol=max(maxcol,i); /* i is actually N(of columns(*))+1 now */
  drop i j maxcol;
  if done then call symputx('maxcol',maxcol);
run;

data want;
  set temp;
  drop col&maxcol-col1800;
run;

View solution in original post


All Replies
Super User
Super User
Posts: 9,211

Re: Cleaning a dataset to move column data to the left and ignore blank columns

Please post example data in the form of a datastep using the {i} code window.  I will not download Excel files, nor write code to try to process them.  

 

For your issue, you can remove columns by doing the following.  First identify columns with all missing observations, you can use the code found in my post here:

https://communities.sas.com/t5/SAS-Statistical-Procedures/How-do-i-get-the-summary-statistics-of-a-d...

To do this, then generate a macro variable with all the variables to be dropped from there.

 

As for moving variables, this doesn't make any sense.  Logical position in the dataset is irrelevant from a programming perspective.  If you need to re-arrange for a data transfer or report, then setup a template of what you want the output to look like then set that and set your data underneath that.

 

You will also find your programming life is much easier if you use normalised (goes down the page) rather than transposed (goes across the page) data.  For instance this issue you face here, if you have:

VARNAME   VARRESULT

abc               123

...

 

Then is is a simple where clause to drop all missing data, and at the end when you want to export or report the data, you can simply transpose the data then.

Solution
‎12-12-2017 07:07 AM
PROC Star
Posts: 228

Re: Cleaning a dataset to move column data to the left and ignore blank columns

That should be fairly easy to do, something like:

 

data want;
  set have;
  array columns col1-col1800;
  do i=1 to n(of columns(*));
    if missing(columns(i)) then do j=max(j,i)+1 to dim(columns);
      if not missing(columns(j)) then do;
        columns(i)=columns(j);
        call missing(columns(j));
        leave;
        end;
end; end;
drop i j; run;

Only you should make sure that all your variables have the same length, otherwise your variables may be truncated. If they are not the same length, you can put in a LENGTH statement before the SET statement, e.g. "length col1-col1800 $60;" or whatever is the length of the longest column.

 

If you want to drop the empty columns afterwards, you may want to store the maximum number of non-missing columns in a macro variable:

data temp;
  length col1-col1800 $60; /* be sure to have length of longest column */
  set have end=done;
  array columns col1-col1800;
  do i=1 to n(of columns(*));
    if missing(columns(i)) then do j=max(j,i)+1 to dim(columns);
      if not missing(columns(j)) then do;
        columns(i)=columns(j);
        call missing(columns(j));
        leave;
        end;
      end;
    end;
  retain maxcol 0;
  maxcol=max(maxcol,i); /* i is actually N(of columns(*))+1 now */
  drop i j maxcol;
  if done then call symputx('maxcol',maxcol);
run;

data want;
  set temp;
  drop col&maxcol-col1800;
run;
PROC Star
Posts: 1,190

Re: Cleaning a dataset to move column data to the left and ignore blank columns

Sounds like this is excactly what you want to do: Shift non-missing values left in an observation Smiley Happy

Super User
Posts: 13,066

Re: Cleaning a dataset to move column data to the left and ignore blank columns

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 285 views
  • 4 likes
  • 5 in conversation