Aakarshan
Fluorite | Level 6

Hi All,

 

I need some clarification on valid and invalid values when using PROC steps.

For instance, in the Level 2 practice:

How was it determined that "NPRE, PRESERVE, and RIVERWAYS are invalid values for Type"?

Please explain.

Thanks

Aakarshan

r_behata
Barite | Level 11

Hello @Aakarshan

 

Please do not assume that all the community members have access to your practice questions. Post the question entirely and ask for your clarification.

Aakarshan
Fluorite | Level 6

Sorry for the assumption; I will post a new, detailed question.

Thanks for your response.

Cynthia_sas
SAS Super FREQ

Hi:
  It looks like you are in the Programming 1 class, in a Level 2 Practice. Each practice has a solution for you to look at. Please click the "Show Solution" button at the bottom of the Practice window.

  If seeing the solution does NOT answer your question, then please revisit the videos in the lesson that is immediately before the practice. Most of the tasks or code required in a practice are either shown in the lecture or discussed in one of the demos in the section that is immediately before your practice.

  Please remember that many people on these Forums have already taken Programming 1 in the past. They may or may not be familiar with our newest version of Programming 1. It always helps to state the following information if you are working inside an e-learning class:

1) the name of the class

2) the Lesson number you are in

3) whether you're working on an Activity or a Practice and the name of the Activity or Practice.

 

  Of these 3 pieces of information, you've only provided 1 piece and a snippet from the solution. If you're looking at the solution, then please look at the table for REG and TYPE at the TOP of that practice. We tell you exactly what values are VALID values. You need to then run the PROC FREQ step that we outline in #1 and compare what is in your actual data, with the valid values as stated at the top of the practice. This is the type of data validation that you have to do prior to any analysis or reporting with your data.

 

  If you compare the PROC FREQ table for REG with the table at the top of the practice are there any invalid values? If you compare the PROC FREQ table for TYPE with the table at the top of the practice are there any invalid values? The solution tells you that there are NOT any invalid values for REG, but that there ARE invalid values for TYPE. Did you run the PROC FREQ step shown in the solution?

 

  When I ran the program in the solution, here's what I got:

[Image: Lesson3_practice.png]
  As you can see, my PROC FREQ results show that the solution inside the class is correct and that in MY output, the invalid values for TYPE are NPRE, PRESERVE and RIVERWAYS. Again, you must manually compare the table from PROC FREQ to the stated valid values at the top of the practice. Later in the class, we will show you a way to write a program to determine valid and invalid values based on a list that you can provide in the code. But for now, we want to prove that sometimes your data is clean and sometimes it is not.

 

Cynthia

Aakarshan
Fluorite | Level 6

Hi Cynthia, 

 

I have provided further details for clarification. My question is not about the coding: I wrote the code correctly and got the same result as the solution.

My question is how we find the invalid values. The result shows three invalid values, so what makes them invalid?

Thanks 

Aakarshan

Reeza
Super User

They're invalid because they're not in the list of valid codes provided in the question. Only the values in that list count as valid under the criteria the question sets out.

 

Valid values and descriptions for the columns Reg and Type are as follows:...
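
As a rough sketch of that comparison in code: a WHERE statement can filter out every value that appears in the valid list, so that only the unexpected values remain in the frequency table. The five Type codes below are taken from the practice table; the comparison is case-sensitive, so this assumes the codes are stored in upper case.

```sas
/* Keep only rows whose Type is NOT one of the valid codes from the practice table */
proc freq data=pg1.np_summary;
    where Type not in ('NM', 'NP', 'NS', 'PRE', 'RVR');
    tables Type / missing;   /* MISSING also reports blank Type values, if any */
run;
```

Any value that shows up in this table is, by definition, not in the valid list.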

 

Community_Guide
SAS Moderator

Hello @Aakarshan,


Your question requires more details before experts can help. Can you revise your question to include more information? 

 

Review this checklist:

  • Specify a meaningful subject line for your topic.  Avoid generic subjects like "need help," "SAS query," or "urgent."
  • When appropriate, provide sample data in text or DATA step format.  See this article for one method you can use.
  • If you're encountering an error in SAS, include the SAS log or a screenshot of the error condition. Use the Photos button to include the image in your message.
  • It also helps to include an example (table or picture) of the result that you're trying to achieve.

To edit your original message, select the "blue gear" icon at the top of the message and select Edit Message.  From there you can adjust the title and add more details to the body of the message.  Or, simply reply to this message with any additional information you can supply.

 


SAS experts are eager to help -- help them by providing as much detail as you can.

 

This prewritten response was triggered for you by fellow SAS Support Communities member @Reeza.
Aakarshan
Fluorite | Level 6

Level 2 Practice: Using Procedures to Validate Data

The pg1.np_summary table contains information about US national parks, monuments, preserves, rivers, and seashores. Valid values and descriptions for the columns Reg and Type are as follows:

Reg  Description        Type  Description
A    Alaska             NM    National Monument
IM   Intermountain      NP    National Park
MW   Midwest            NS    National Seashore
NC   National Capital   PRE   National Preserve
NE   Northeast          RVR   National River
PW   Pacific West
SE   Southeast

Reminder: If you restarted your SAS session, you must recreate the PG1 library so you can access your practice files. In SAS Studio, open and submit the libname.sas program in the EPG194 folder. In Enterprise Guide, run the Autoexec process flow.

  1. Create a new program. 

    • Write a PROC FREQ step to produce frequency tables for the Reg and Type columns in the pg1.np_summary table.
    • Submit the step and look for invalid values.


  2. What invalid values exist for Reg?
  3. What invalid values exist for Type?


  4. Write a PROC UNIVARIATE step to generate statistics for the Acres column in the pg1.np_summary table. Submit the step. 


  5. What are the observation numbers for the smallest park and the largest park?


  6. View the pg1.np_summary table to identify the name and size of the smallest and largest parks.

 

I need an explanation of points 2 and 3 above.

I understand the coding, but I need an explanation of why and how "NPRE, PRESERVE, and RIVERWAYS are invalid values for Type."

I hope my query is now clear to the community.

Also attaching the codes used

/* NPRE, PRESERVE, and RIVERWAYS are invalid values for Type.
   No invalid values exist for Reg. */
proc freq data=pg1.np_summary;
    tables Reg Type;
run;

/* Smallest: Observation 78
   Largest: Observation 6 */
proc univariate data=pg1.np_summary;
    var Acres;
run;

/* Smallest: African Burial Ground Monument, .35 acres
   Largest: Noatak National Preserve, 6,587,071.39 acres */
proc print data=pg1.np_summary;
run;
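
Scanning the full PROC PRINT listing works, but the smallest and largest parks can also be pulled out directly. The following is only a sketch, not the course solution: it uses PROC SQL, which this practice does not require, and it assumes nothing about the table beyond the Acres column already used above.

```sas
/* List only the rows whose Acres equals the overall minimum or maximum */
proc sql;
    select *
        from pg1.np_summary
        where Acres = (select min(Acres) from pg1.np_summary)
           or Acres = (select max(Acres) from pg1.np_summary);
quit;
```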

 

Thanks for your detailed information with the steps.

Aakarshan

Cynthia_sas
SAS Super FREQ

Hi:
For every type of data you work with, there will always be rules: what values are OK in the data and what values are not OK. In the class, we tell you what values are OK and not OK. In real life, other people tell you what's OK and not OK. Sometimes, the data itself tells you that something is not OK. For example, you have data with ages for patients. You have someone with an age value of 300 -- is that OK or not OK? That depends on the unit of measure for the age: is it years, months, or days? 300 months of age would be OK. 300 years of age is highly unlikely.

Same thing for salary. You are given data that has salaries. Somebody makes an annual salary of .05 in US currency, which means they are working for next to nothing. Is that value OK or not OK? You would have to ask someone who knows the data whether that value is OK or not OK.

In that practice we told you what values were OK. Your job was only to run PROC FREQ and determine for yourself whether the data was clean or not. As shown in the solution, the REG values were all OK; the TYPE values were not.

The table of valid values was meant to be like the business rules that you encounter in a real job. Sometimes you know why valid values are valid; sometimes you don't. Your job is to learn how to run procedures to validate the data. Learning PROC FREQ is the first step.
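
For numeric columns like the age example, a quick look at the minimum and maximum is often enough to spot a suspicious value. Here is a minimal sketch with made-up data; the table work.patients and the column age_years are hypothetical and exist only for this illustration.

```sas
/* Hypothetical patient data; one age is clearly out of range for years */
data work.patients;
    input id age_years;
datalines;
1 34
2 300
3 58
;
run;

/* N, NMISS, MIN, and MAX give a first impression of whether the values are plausible */
proc means data=work.patients n nmiss min max;
    var age_years;
run;
```

Whether a maximum of 300 is an error still depends on the business rule, which is the point above: the procedure only shows you the value, and someone who knows the data decides whether it is OK.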

Cynthia

Aakarshan
Fluorite | Level 6

Hi,

Thank you Cynthia for your help and clearing my doubt.

Regards

Aakarshan

anranyu1996
Calcite | Level 5

The SAS output is as follows:

reg: A, IM, MW, ...

It is consistent with the question page, but the type output is:

type: NM, NP, NPRE, NS, PRE, PRESERVE, RIVERWAYS, RVR

whereas on the question page, the types are:

NM, NP, NS, RVR

Cynthia_sas
SAS Super FREQ

Hi:

  In that practice, we provide you with the "rules" and show you what the valid REG and TYPE values are. They are listed for you at the top of the activity.

 

  Your task was to run PROC FREQ and determine for yourself whether the data was clean or not. This task required you to visually check the PROC FREQ output against the table of valid values so you could decide whether you had "clean" data. As shown in the solution, the REG values were all OK; the TYPE values were not.

 

  The table of valid values was meant to be like the business rules that you encounter in a real job. Sometimes you know why valid values are valid; sometimes you don't. Your job is to learn how to run procedures to validate the data. Learning PROC FREQ is the first step.

 

  There are different ways to validate data. I am not going to post a program here that uses the class data. But consider this alternate scenario. I have a data table of students. All the students' ages should be between 11 and 16; any ages outside that range are incorrect. Also, every student must have a signed permission slip. Here's what the fake student data looks like:

[Image: fake_data_students.png]

And, according to my stated rules, 3 rows in the data are not OK. Assuming I know what the valid values are, I can use PROC FORMAT to help me determine whether I have clean data or not.

 

This is what I want to produce:

[Image: use_format_validate.png]

Note that my PROC FREQ output shows me the number of errors of each type. Then my PROC PRINT shows me the exact rows in error.

 

Later in the course, you will learn about PROC FORMAT. Here's the full program that created the above data:

data students;
  length name $10 signed_by $20;
  infile datalines dlm=',' dsd;
  input name $ signed_by $ age;
return;
datalines;
Anna,  "Grandmother", 17
Bob,   "Mother", 11
Carol, "Father", 13
Doug,  " ",15
Edith, "Step-mother",10
;
run;

proc print data=students;
title 'Student Data Before Validation';
run;

proc format;
  value ageck 11-16 = 'Age OK'
              other = 'Incorrect Age value';

  value $signck 'Grandmother', 'Grandfather' = 'Permission OK'
                'Mother', 'Father'           = 'Permission OK'
                'Step-mother', 'Step-father' = 'Permission OK'
                ' ', other                   = 'Error';
run;
  
proc freq data=students;
  table age signed_by / missing;
  format age ageck. signed_by $signck.;
  title 'Age Errors and Permission Errors';
run;
  
proc print data=students;
  where put(age,ageck.) = 'Incorrect Age value' or
        put(signed_by, $signck.) = 'Error';
  title 'Age Errors and Permission Errors';
run;
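
The same PROC FORMAT idea carries over to a column that holds a code list, like the Type column discussed earlier in this thread. The following is only a sketch under stated assumptions: work.type_demo and its rows are made up for illustration (it is not the class data), and the format name $typeck is hypothetical. The five valid codes come from the practice table quoted above.

```sas
/* Hypothetical demo rows; NOT the pg1.np_summary class data */
data work.type_demo;
    length park $20 type $10;
    infile datalines dlm=',' dsd;
    input park $ type $;
datalines;
Park A,NP
Park B,PRE
Park C,PRESERVE
Park D,RVR
Park E,RIVERWAYS
;
run;

proc format;
    /* Valid codes map to 'Valid'; anything else, including blank, maps to 'Invalid' */
    value $typeck 'NM', 'NP', 'NS', 'PRE', 'RVR' = 'Valid'
                  other                          = 'Invalid';
run;

/* Count how many rows fall into each category */
proc freq data=work.type_demo;
    tables type / missing;
    format type $typeck.;
run;

/* List the exact rows whose code is not in the valid list */
proc print data=work.type_demo;
    where put(type, $typeck.) = 'Invalid';
run;
```

In this made-up table, the PROC PRINT step lists the rows with PRESERVE and RIVERWAYS, because those are the only codes outside the valid list.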

 

Hope this helps,

Cynthia
