Hi,
My provider send me files very bad thinked, and I have no choice but to accept it.
One of the main problem is there can be in the Excel file they provide some rows or columns empty at any moment of the Excel file (usually at the beginning and end but not only). The number of empty rows/column change from one file to each other, and whatever the size of the file.
Do you know how I can remove efficiantly those "blank" rows/columns, without having to specify which row/column is impacted?
Second question, when I import the file I specify
getnames = "no"
When I do this, SAS names columns A, B, C...
When I remove the first column I will have column names B,C,...
If I succeed to remove "blank" columns automatically, I will have to go back to A, B, C... names to work efficiently on data.
Do you know I can do it?
Thanks for answers. 🙂
First of all, tell them to send the data in a reasonable format (File - Save As - Type csv in Excel). If they insist on Excel, triple your fee.
And charge again anytime you receive a file where the internal format has changed.
Because then you have to write an additonal step that corrects the crap delivered from Excel; proc import for Excel does not provide any good mechanisms to deal with Excel idiosyncrasies and stupid layouts. It just tries its best at guessing and delivers "something".
With text-based files, you can write a data step that names columns in a consistent manner, no matter where the name row is. Just adapt the firstobs= option in the infile statement.
If you have an empty column somewhere, just read it into a dummy variable that you drop immediately.
But this is all unnecessary bullsh*t. Insist on a proper, consistent file format and documentation. This should be possible even for people of lesser intellect.
What does your data transfer agreement state? Oh you don't have one...
If you are using Excel, then you could do it really simply in a VBA macro, just loop over columns in worksheet and if xlDown cell is last one, then that is empty and can be removed.
You can do it in SAS, bit of coding. If the columns which are blank are as numeric in SAS, then doa a proc means with var _numeric_ nmiss, then use the output dataset to create a list of variables to drop:
proc means data=fromexcel; var _numeric_; output out=todel nmiss=nmiss; run; proc sql; select _name_ into :listtodrop separated by " " from todel; quit; data want; set fromexcel (drop=&listtodrop.); run;
Remove empty rows and empty columns from excel
inspired by
https://goo.gl/ZlgFGn
https://communities.sas.com/t5/Base-SAS-Programming/how-to-check-if-a-macro-variable-exists/td-p/206121
Slightly more complex with a mixture of numeric and character columns in excel.
libname xel "d:/xls/colrow.xlsx";
* create some data;
data xel.have;
array cols[*] $8 Mary Joe Mike Jane Ted Tim;
do rep=1 to 5;
do var= 1 to dim(cols);
jane=' ';
if rep=3 then cols[var]='';
else cols[var]='A'!!put(int(100*uniform(-1)),z3.);
end;
drop rep var;
output;
end;
run;quit;
libname xel clear;
HAVE (Excel sheet with missing column and missing row)
Up to 40 obs WORK.HAVE total obs=5
Obs MARY JOE MIKE JANE TED TIM
1 A092 A041 A069 A069 A074
2 A010 A022 A069 A057 A068
3
4 A092 A006 A010 A009 A095
5 A009 A002 A068 A086 A094
WANT
Up to 40 obs from want total obs=4
Obs MARY JOE MIKE TED TIM
1 A005 A082 A037 A067 A088
2 A019 A071 A026 A060 A043
3 A044 A077 A070 A041 A062
4 A086 A062 A008 A051 A081
WORKING CODE
proc freq data=xel.have nlevels;
set xel.have(drop=&colmis);
if coalescec(of _character_)="" then delete;
FULL SOLUTION
=============
libname xel "d:/xls/colrow.xlsx";
* create excel workbook with sheet have and table have;
data xel.have;
array cols[*] $8 Mary Joe Mike Jane Ted Tim;
do rep=1 to 5;
do var= 1 to dim(cols);
jane=' ';
if rep=3 then cols[var]='';
else cols[var]='A'!!put(int(100*uniform(-1)),z3.);
end;
drop rep var;
output;
end;
run;quit;
ods output nlevels=nlevels;
proc freq data=xel.have nlevels;
run;quit;
ods select all;
/*
Up to 40 obs from WORK.NLEVELS total obs=6
Obs TABLEVAR TABLEVARLABEL NLEVELS NMISSLEVELS NNONMISSLEVELS
1 MARY MARY 5 1 4
2 JOE JOE 5 1 4
3 MIKE MIKE 5 1 4
4 JANE JANE 1 1 0
5 TED TED 5 1 4
6 TIM TIM 5 1 4
*/
%symdel colmis; * just incase it exists;
data _null_;
length colmis $1024;
do until (dne);
set nlevels end=dne;
if nnonmisslevels=0 then
colmis=catx(' ',colmis,tablevar);
end;
call symputx('colmis',colmis);
rc=dosubl('
data want;
set xel.have(drop=&colmis);
if coalescec(of _character_)="" then delete;
run;quit;
');
run;quit;
%put &=colmis;
/*
COLMIS=JANE
Up to 40 obs from want total obs=4
Obs MARY JOE MIKE TED TIM
1 A005 A082 A037 A067 A088
2 A019 A071 A026 A060 A043
3 A044 A077 A070 A041 A062
4 A086 A062 A008 A051 A081
*/
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.