Hi,
My provider send me files very bad thinked, and I have no choice but to accept it.
One of the main problem is there can be in the Excel file they provide some rows or columns empty at any moment of the Excel file (usually at the beginning and end but not only). The number of empty rows/column change from one file to each other, and whatever the size of the file.
Do you know how I can remove efficiantly those "blank" rows/columns, without having to specify which row/column is impacted?
Second question, when I import the file I specify
getnames = "no"
When I do this, SAS names columns A, B, C...
When I remove the first column I will have column names B,C,...
If I succeed to remove "blank" columns automatically, I will have to go back to A, B, C... names to work efficiently on data.
Do you know I can do it?
Thanks for answers. 🙂
First of all, tell them to send the data in a reasonable format (File - Save As - Type csv in Excel). If they insist on Excel, triple your fee.
And charge again anytime you receive a file where the internal format has changed.
Because then you have to write an additonal step that corrects the crap delivered from Excel; proc import for Excel does not provide any good mechanisms to deal with Excel idiosyncrasies and stupid layouts. It just tries its best at guessing and delivers "something".
With text-based files, you can write a data step that names columns in a consistent manner, no matter where the name row is. Just adapt the firstobs= option in the infile statement.
If you have an empty column somewhere, just read it into a dummy variable that you drop immediately.
But this is all unnecessary bullsh*t. Insist on a proper, consistent file format and documentation. This should be possible even for people of lesser intellect.
What does your data transfer agreement state? Oh you don't have one...
If you are using Excel, then you could do it really simply in a VBA macro, just loop over columns in worksheet and if xlDown cell is last one, then that is empty and can be removed.
You can do it in SAS, bit of coding. If the columns which are blank are as numeric in SAS, then doa a proc means with var _numeric_ nmiss, then use the output dataset to create a list of variables to drop:
proc means data=fromexcel; var _numeric_; output out=todel nmiss=nmiss; run; proc sql; select _name_ into :listtodrop separated by " " from todel; quit; data want; set fromexcel (drop=&listtodrop.); run;
Remove empty rows and empty columns from excel
inspired by
https://goo.gl/ZlgFGn
https://communities.sas.com/t5/Base-SAS-Programming/how-to-check-if-a-macro-variable-exists/td-p/206121
Slightly more complex with a mixture of numeric and character columns in excel.
libname xel "d:/xls/colrow.xlsx";
* create some data;
data xel.have;
 array cols[*] $8 Mary Joe Mike Jane Ted Tim;
 do rep=1 to 5;
    do var= 1 to dim(cols);
       jane=' ';
       if rep=3 then cols[var]='';
       else cols[var]='A'!!put(int(100*uniform(-1)),z3.);
    end;
    drop rep var;
    output;
 end;
run;quit;
libname xel clear;
HAVE (Excel sheet with missing column and missing row)
Up to 40 obs WORK.HAVE total obs=5
Obs    MARY    JOE     MIKE    JANE    TED     TIM
 1     A092    A041    A069            A069    A074
 2     A010    A022    A069            A057    A068
 3
 4     A092    A006    A010            A009    A095
 5     A009    A002    A068            A086    A094
WANT
Up to 40 obs from want total obs=4
Obs    MARY    JOE     MIKE    TED     TIM
 1     A005    A082    A037    A067    A088
 2     A019    A071    A026    A060    A043
 3     A044    A077    A070    A041    A062
 4     A086    A062    A008    A051    A081
WORKING CODE
       proc freq data=xel.have nlevels;
       set xel.have(drop=&colmis);
       if coalescec(of _character_)="" then delete;
FULL SOLUTION
=============
libname xel "d:/xls/colrow.xlsx";
* create excel workbook with sheet have and table have;
data xel.have;
 array cols[*] $8 Mary Joe Mike Jane Ted Tim;
 do rep=1 to 5;
    do var= 1 to dim(cols);
       jane=' ';
       if rep=3 then cols[var]='';
       else cols[var]='A'!!put(int(100*uniform(-1)),z3.);
    end;
    drop rep var;
    output;
 end;
run;quit;
ods output nlevels=nlevels;
proc freq data=xel.have nlevels;
run;quit;
ods select all;
/*
Up to 40 obs from WORK.NLEVELS total obs=6
Obs    TABLEVAR    TABLEVARLABEL    NLEVELS    NMISSLEVELS    NNONMISSLEVELS
 1       MARY          MARY            5            1                4
 2       JOE           JOE             5            1                4
 3       MIKE          MIKE            5            1                4
 4       JANE          JANE            1            1                0
 5       TED           TED             5            1                4
 6       TIM           TIM             5            1                4
*/
%symdel colmis; * just incase it exists;
data _null_;
   length colmis $1024;
   do until (dne);
      set nlevels end=dne;
      if nnonmisslevels=0 then
         colmis=catx(' ',colmis,tablevar);
   end;
   call symputx('colmis',colmis);
   rc=dosubl('
     data want;
       set xel.have(drop=&colmis);
       if coalescec(of _character_)="" then delete;
     run;quit;
   ');
run;quit;
%put &=colmis;
/*
COLMIS=JANE
Up to 40 obs from want total obs=4
Obs    MARY    JOE     MIKE    TED     TIM
 1     A005    A082    A037    A067    A088
 2     A019    A071    A026    A060    A043
 3     A044    A077    A070    A041    A062
 4     A086    A062    A008    A051    A081
*/
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
