DATA Step, Macro, Functions and more

Detecting blank variables

Reply
Contributor
Posts: 56

Detecting blank variables

Is there any easy way to detect any variables in a datset that has all blank or missing values ?

Thanks
N/A
Posts: 0

Re: Detecting blank variables

You could use proc freq on all variables and visually inspect. Alternatively, you could output the results of proc freq and inspect the resulting datset.
N/A
Posts: 0

Re: Detecting blank variables

[pre]
if n_value = . then missing="true";
if s_value = " " then blank = "true";

proc sql;
select * from indata where field is missing;
quit;
[/pre]
Contributor
Posts: 56

Re: Detecting blank variables

Hi Chuck,

Thanks for your interesting code. But maybe I did not present my issue clear enough. What I am looking for is to elminating all the varaibles(not variable value) from a dataset when ALL their values are blank or missing. For example, If I am reading from an excel spreadsheet with

data file3;
set inf.'Sheet1$b4:iv64000'n;
run;

I want to elminate all the excel blank columns in my dataset file3.

Thanks
N/A
Posts: 0

Re: Detecting blank variables

Another possible method would be to use PROC TRANSPOSE.

First read all the rows and eliminate all blank rows which can be easily achieved, as suggested by Chuck.

The you can use PROC TRASNPOSE which will convert the columns into rows, so again you have the same problem of eliminating blank rows. That being done you can transpose again to get back your original data.

A bit of attention should be paid while handling the header rows, that is all.
Contributor
Posts: 56

Re: Detecting blank variables

By the way, I need a way to eliminate the blank rows too. Thanks
N/A
Posts: 0

Re: Detecting blank variables

The solution would require multiple steps.

Eliminating blank rows is easiest and can be done in the initial read.
Test if each column is missing/blank, if any column is not missing/blank, then output that row.

You should know in advance which columns are of interest so that you don't have to test all 255 Excel columns.

SAS can easily control which columns are in a table through the use of keep and drop (both statements and dataset options). But, programmatically determining which to keep or drop is non-trivial. I expect this will require some MACRO programming to create a KEEP or DROP statement to use in a DATA step or a SELECT statement to use with PROC SQL; CREATE TABLE _________ as ...;
Contributor
Posts: 56

Re: Detecting blank variables

Hi Chuck,

Thanks for your reply. I just hoping for some easier way to get this done but I guess I have to settle to converting the dataset variables into an macor array and then examine them one by one.in a macro do loop.

Thanks
N/A
Posts: 0

Re: Detecting blank variables

No, you don't have to convert all the variables into a macro variables ...

You can determine if a column is blank by using a single select distinct max(...) statement for each column, then inspect the result within a macro using an open, fetchobs, varn and varc, etc.
Contributor
Posts: 56

Re: Detecting blank variables

Hi Chuck,

One more thing, even I define to read in 255 column and 64000 rows, SAS do eliminated a lot of blank columns and rows. In this case I was ended up with 11 columns and 300 rows in the dataset file3. However, out of the 11 columns there are about 5 blank colums (those columns SAS assign variables name such F2,F3..) and about 75 blank rows. The excel spread sheet actual has 6 column and 225 rows of data. I do not know how SAS can eliminae most rows and columns and not the others. It must have something to do with the way excel was set up that SAS think it is not an empty columns/row. Do you have any experience in readin excel in ?

Thanks
N/A
Posts: 0

Re: Detecting blank variables

Yes, we do read in Excel spreadsheets.
If your code is correct, it will drop all the columns and all the rows properly.
Contributor
Posts: 56

Re: Detecting blank variables

> Yes, we do read in Excel spreadsheets.
> If your code is correct, it will drop all the columns
> and all the rows properly.

I find that within a spreadsheet if all the cell values in a column/row was delete without deleting the column/row, then SAS will read in the coulmn as a valid SAS variable and the row as valid obs even they are both total empty. So if I was handed a spreadsheet to read into SAS without knowing some of the column/row were previous non blank. How can I instruct SAS not to read those empty but previously non-empty column/row ?

Thanks
N/A
Posts: 0

Re: Detecting blank variables

Can't, you'll have to filter those out after the fact.
That is, you will have to filter the empty rows while reading in the data, and then filter out empty columns later.

We use Excel spreadsheets as a simple mechanism to hold configuration information. It is easy to edit and print the information. The first row contains the field names, and all non-used rows and columns are "deleted" or "empty" so that SAS considers them "missing" and doesn't read them. Since the first row contains the field names, SAS automatically assigns variables (variable names and labels) to those columns, and all subsequent rows are read in as data. Our code then assumes specific column names. If you are being handed amorphous Excel spreadsheets, then your job is going to be a lot harder, but I would also question as to "why?" what is the purpose? The point of SAS is to do data analysis against either massive amounts of data, or in very sophisticated manners, perhaps applying highly sophisticated (and preprogrammed in "proc"s) analyses to the data. If the input are unknown Excel spreadsheets, then how is the system going to know what to do with the data? Message was edited by: Chuck
Contributor
Posts: 56

Re: Detecting blank variables

Thanks. Appricate for your comments
Frequent Contributor
Posts: 95

Re: Detecting blank variables

I have two suggestions, one manual and one hopefully automated.

Manual Solution:
When you get an Excel file and find these empty columns occurring in your SAS import, then open up the Excel file in MS Excel and delete the empty columns (don't just delete the cells again, actually delete the column). Resave the file and SAS should work better.

Automated Solution:
From one of the earlier posts, it was mentioned that SAS assigns it's own column names (e.g. F2, F3, . . . ) to the blank columns. Maybe you can just drop all the column names that fit this format. For example.
[pre]
data imported;
infile cards missover;
input FirstCol SecondCol F1 F2 F3;
cards;
1 2
3 4
5 6
;
run;


data cleaned;
set imported;
drop F1 - F256; *Drop any columns that start with 'F' followed by a number up to 256;
run;
[/pre]

Be careful not to drop valid column names that start with 'F' in this example. You might have to tweak your column names to handle this properly.
Ask a Question
Discussion stats
  • 22 replies
  • 7708 views
  • 1 like
  • 4 in conversation