BookmarkSubscribeRSS Feed
shortyofhb
Calcite | Level 5

Hello all,

I have a question regarding Proc Format and the value statement.

I have an external dataset that has age grouped into age ranges. The dataset needs to be cleaned and organized. For values such as 20-29ax or 30s-39 I used the compress function to squeeze out the letters and used a Proc Format to change those to "20-29" and "30-39"

However, what do I do about the excel columns that just say "Under 18" how would I clean that to make it presentable in SAS. I tried to do 0-17 but it didn't work because on the excel dataset it just says "Under 18"

7 REPLIES 7
Kurt_Bremser
Super User

Create an informat that catches all the "funny" values entered in the spreadsheet, and alings them to your wanted values.

proc format;
invalue $funny_excel
  "Under 18" = "0-17"
  "20-29" = "20-29"
  "20-29ax" = "20-29"
  "30-39" = "30-39"
  "30s-39" = "30-39"
  other = "ERROR"
;
run;

Add all other allowed or expected values; after each import, run a quick check for "ERROR" values to find values you had not encountered yet.

ballardw
Super User

It never hurts to show the code you have attempted.

Such as show us  the code of your format.

And some actual values.

 

 

shortyofhb
Calcite | Level 5

Proc Format Library=A;
Value $Ageft

other = "0-17"
18-29 = "18-29"
30-39 = "30-39"
40-49 = "40-49"
50-59 = "50-59"
60-69 = "60-69"
70-99 = "70+";
run;

 

The issue I am having is the age variable in my dataset is very messy.

It has age groups in intervals of 10, but some of the values have "20-29gt" or "30s-49" or "Under 18" or "70 and above"

 

How would I go about organizing this data?

 

I used a compress function to get rid of any extra letters of symbols ($,&, etc.) out of the age intervals. But I do not know about to go about now formatting it so "Under 18" or "70 and above" will properly be coded. In addition, I need to take into account missing cells.

 

Any suggestions would be great!

shortyofhb
Calcite | Level 5

Hello all,

I need some help in cleaning up an external dataset if possible, I would greatly appreciate it!

 

I imported an external excel dataset into SAS. The variable age is in intervals of age groups (20-29, 30-39, etc.) I am supposed to format the age into age groups and clean the dataset to make it look nice.

The issue I am running into is that the dataset has a bunch of messy values like (20ag-29, 30-39exa, Under 18, 70 and above, and missing values).

First I used a compress function to rid of any letters or symbols in the dataset, but then I got stuck.

I normally use a Proc Format to create my age variables and so far have:

Proc Format Library=WorkData;

Value   Ageft.   18-29   =   "18-29"

etc. all the way to 60-69.

 

However, how do I take into account the values that say "Under 18" "70 and above" as well as the blank cells?

Would I need to use an array for the blank cells and code them as missing?

 

Thank you in advanced!

(Note: When I run a Proc Freq, it takes into account everything except the "Under 18" values. I used 70-99 = "70 and above" and I think that read fine.")

 

ChrisNZ
Tourmaline | Level 20

>  I used 70-99 = "70 and above" and I think that read fine.

0-18 could also be an interval

then .

then the rest:

0-18='0-18'

.='Missing'

other='Invalid'

This way you catch -999 or +999

 

Note that using a formatted number rather than a string means that the format must be available when opening the data set. So it's not always a better solution.

shortyofhb
Calcite | Level 5

I looked at the data.

There is a 18-29 interval and a 20-29 interval in the dataset. So I created an 18-29 Value in the proc format.

But 0-18 did not read the "Under 18" value in the datasheet.

 

The excel data sheet says "Under 18" so I am not sure how to get this to read.

andreas_lds
Jade | Level 19

@shortyofhb wrote:

I looked at the data.

There is a 18-29 interval and a 20-29 interval in the dataset. So I created an 18-29 Value in the proc format.

But 0-18 did not read the "Under 18" value in the datasheet.

 

The excel data sheet says "Under 18" so I am not sure how to get this to read.


Please post the data you have in usable form: a data step using datalines-statment. If you have "Under 18" as value in your data, then using a numeric format won't give you the expected results.

 

EDIT: This looks like https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706548, i will merge both posts.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 762 views
  • 0 likes
  • 5 in conversation