Hi:
In that practice, we provide you with the "rules" and show you what the valid REG and TYPE values are. They are listed for you at the top of the activity.
Your task was to run PROC FREQ and determine for yourself whether the data was clean or not. This task required you to visually check what was in PROC FREQ against the table of valid values so you could decide whether you had "clean" data. As shown in the solution the REG values were all OK; the TYPE values were not.
The table of valid values were meant to be like business rules that you encounter in a real job. Sometimes you know why valid values are valid; sometimes you don't. Your job is to learn how to run procedures to validate the data. Learning PROC FREQ is the first step.
There are different ways to validate data. I am not going to post a program here that uses the class data. But consider this alternate scenario. I have a data table of students. All the students ages should be between 11-16 any ages outside that range are incorrect. Also, every student must have a signed permission slip. Here's what the fake student data looks like:
And, according to my stated rules, 3 rows in the data are not OK. Assuming I know what the valid values are, I can use PROC FORMAT to help me determine whether I have clean data or not.
This is what I want to produce:
Note that my PROC FREQ output shows me the number of errors of each type. Then my PROC PRINT shows me the exact rows in error.
Later in the course, you will learn about PROC FORMAT. Here's the full program that created the above data:
data students;
length name $10 signed_by $20;
infile datalines dlm=',' dsd;
input name $ signed_by $ age;
return;
datalines;
Anna, "Grandmother", 17
Bob, "Mother", 11
Carol, "Father", 13
Doug, " ",15
Edith, "Step-mother",10
;
run;
proc print data=students;
title 'Student Data Before Validation';
run;
proc format;
value ageck 11-16 = 'Age OK'
other = 'Incorrect Age value';
value $signck 'Grandmother', 'Grandfather' = 'Permission OK'
'Mother', 'Father' = 'Permission OK'
'Step-mother', 'Step-father' = 'Permission OK'
' ', other = 'Error';
run;
proc freq data=students;
table age signed_by / missing;
format age ageck. signed_by $signck.;
title 'Age Errors and Permission Errors';
run;
proc print data=students;
where put(age,ageck.) = 'Incorrect Age value' or
put(signed_by, $signck.) = 'Error';
title 'Age Errors and Permission Errors';
run;
Hope this helps,
Cynthia
... View more