DATA Step, Macro, Functions and more

Checking in valid values for all variables in data

Reply
Contributor
Posts: 38

Checking in valid values for all variables in data

Good Morning,

 

I am checking for invalid values for all variables in a dataset. My data has more than 250 numeric values and I know the range of values for all the variables. Some have a range of 0-6 and blank as values and some have 0-4, 99, ., and blank as valid values. I found several examples like using proc freq, proc format along with data step, data step with where and put statement. All the techniques I found seem to be little time consuming and I am on a time crunch.

 

I was wondering if there is much efficient way to get this done. Any recommendations would be really appreciated.

 

Thanks.

rsvs

 

PROC Star
Posts: 740

Re: Checking in valid values for all variables in data

Can you provide some sample data and what you want you output to look like? It makes it much easier to help.

Contributor
Posts: 38

Re: Checking in valid values for all variables in data

##- Please type your reply above this line. Simple formatting, no
attachments. -##
Contributor
Posts: 38

Re: Checking in valid values for all variables in data

My sample data. I have only inncluded few varaibles. Like I said I do have over 250 variables in my data.

 

Thanks in advance.

 

rsvs

Super User
Posts: 7,778

Re: Checking in valid values for all variables in data

Excel files are useless for presenting example data, as they do not keep the structural information of SAS datasets (variable types, lengths, formats,....). On top of that, many corporate firewalls (where most SAS installations are used) will routinely block MS Office files for security reasons.

Post your example data in a data step, as described here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 11,343

Re: Checking in valid values for all variables in data

If I know ahead of time the ranges of values then when I read the data I use custom informats such as

 

Proc format library=work;

invalue lowrange

. , 1,2,3,4,5,6= _same_

other = _error_;

;

 

Then use that informat when reading the variables. If the value is out of range when read the log shows an error message.

 

If you have existing values to check then use arrays to check related variables.

If you have a format like

value check

., 1,2,3,4 = 'In Range'

other = 'invalid';

 

Then in a data step you specify all of the variables that use that format such as:

 

array grp Q1 Q3 Q25-Q30;  /*< list of ALL variables with the same rule*/

do i= 1 to dim(grp);

   if put(grp[i],check.) = 'invalid' then put "appropriate message identifying varialbe and record information";

end;

Note that the above works will with VNAME function to return the specific variable involved.

 

 

Unless you have 100+ formats needed the above should be easily accomplished to identify problem values and records in less than one day and if you only have 5 or 6 common ranges to check then considerably less. I would expect a couple hours to be more likely.

 

The complicated time consuming issues are when Var1 is in a range that indicates a further restrinction on other variables such as if Var1 =1 then variables Var3, Var4 and Var5 should be missing or within another specified range.

Ask a Question
Discussion stats
  • 5 replies
  • 162 views
  • 0 likes
  • 4 in conversation