BookmarkSubscribeRSS Feed
rsva
Fluorite | Level 6

Good Morning,

 

I am checking for invalid values for all variables in a dataset. My data has more than 250 numeric values and I know the range of values for all the variables. Some have a range of 0-6 and blank as values and some have 0-4, 99, ., and blank as valid values. I found several examples like using proc freq, proc format along with data step, data step with where and put statement. All the techniques I found seem to be little time consuming and I am on a time crunch.

 

I was wondering if there is much efficient way to get this done. Any recommendations would be really appreciated.

 

Thanks.

rsvs

 

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Can you provide some sample data and what you want you output to look like? It makes it much easier to help.

rsva
Fluorite | Level 6
##- Please type your reply above this line. Simple formatting, no
attachments. -##
rsva
Fluorite | Level 6

My sample data. I have only inncluded few varaibles. Like I said I do have over 250 variables in my data.

 

Thanks in advance.

 

rsvs

Kurt_Bremser
Super User

Excel files are useless for presenting example data, as they do not keep the structural information of SAS datasets (variable types, lengths, formats,....). On top of that, many corporate firewalls (where most SAS installations are used) will routinely block MS Office files for security reasons.

Post your example data in a data step, as described here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

ballardw
Super User

If I know ahead of time the ranges of values then when I read the data I use custom informats such as

 

Proc format library=work;

invalue lowrange

. , 1,2,3,4,5,6= _same_

other = _error_;

;

 

Then use that informat when reading the variables. If the value is out of range when read the log shows an error message.

 

If you have existing values to check then use arrays to check related variables.

If you have a format like

value check

., 1,2,3,4 = 'In Range'

other = 'invalid';

 

Then in a data step you specify all of the variables that use that format such as:

 

array grp Q1 Q3 Q25-Q30;  /*< list of ALL variables with the same rule*/

do i= 1 to dim(grp);

   if put(grp[i],check.) = 'invalid' then put "appropriate message identifying varialbe and record information";

end;

Note that the above works will with VNAME function to return the specific variable involved.

 

 

Unless you have 100+ formats needed the above should be easily accomplished to identify problem values and records in less than one day and if you only have 5 or 6 common ranges to check then considerably less. I would expect a couple hours to be more likely.

 

The complicated time consuming issues are when Var1 is in a range that indicates a further restrinction on other variables such as if Var1 =1 then variables Var3, Var4 and Var5 should be missing or within another specified range.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2119 views
  • 0 likes
  • 4 in conversation