BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Solly7
Pyrite | Level 9

Hi,

I need help in replacing  the value 'I don't know how tall I am' from height column as well as 'I don't know how much I weigh' from the weight column with the 0 .

 

Data Have

policy_no     height                                weight

1                 I don't know how tall I am    60

2                 1.5                                        I don't know how tall I am

 

                      Data Want

policy_no     height     Weight

1                  0          60

2                  1.5       0

 

Spoiler

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

To answer your specific question, even though I think this is not a good thing to do:

 

data have2;
     set have;
      if height='I don''t know how tall I am' then height='0';
      if weight='I don''t know how tall I am' then weight='0';
run;

To do the calculation of means, you would have to convert these to numeric variables as stated by @ballardw , but it really doesn't make sense to set these to zero, when you should be setting these to missing.

 

Does it really say tall when weight is missing?

--
Paige Miller

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

In your data set, are the variables HEIGHT and WEIGHT character or numeric? Please run PROC CONTENTS on this data set and show us. Also, if either of these variables have had a format applied, please identify which formats are used.

--
Paige Miller
Solly7
Pyrite | Level 9
Hi, kindly see below proc contents below

Alphabetic List of Variables and Attributes

# Variable Type Len Format Informat Label

1 ExternalReferenceId Char 255 $255. $255. ExternalReferenceId
2 Height Char 255 $255.
3 Weight Char 255 $255.
ballardw
Super User

Are you intending to do any analysis or summaries with the "height" and "weight" as actual numeric values?

If so then you need create new variables

 

Perhaps something like:

data want;
   set have;
   numheight= input(height,?? 5.);
   if missing(numheight) then numheight=0;
   numweight= input(weight,?? 5.);
   if missing(numweight) then numweight=0;

run;

INPUT is the way to create numeric values from text. The ?? suppresses the invalid data messages you would normally see when attempting to read "I don't know my height" into a numeric value. The IF assumes the reason for missing is something like your text entries and sets 0.

 

Or change how you read the data into SAS to address such things earlier.

Solly7
Pyrite | Level 9

Hi, yes im intending to replace them with mean value for analysis

Tom
Super User Tom
Super User

Then set them as MISSING, not a valid value like zero.

 

Make new numeric variables to store the actual numbers.

Alphabetic List of Variables and Attributes

# Variable Type Len Format Informat Label

1 ExternalReferenceId Char 255 $255. $255. ExternalReferenceId
2 Height Char 255 $255.
3 Weight Char 255 $255.

Why are those fields length $255?  The longest string that the normal SAS informat can read is 32 characters.

Here is a step that will create numeric variables HT and WT from your existing strings.  Anything that is not a valid number is converted to missing.  

data want;
  set have;
  ht = input(left(height),32.);
  wt = input(left(weight),32.);
run;

If you want to suppress the warnings about invalid data you can use the ?? prefix on the informat.

AMSAS
SAS Super FREQ

It's important to understand how SAS handles missing values, not understanding could lead to results you are not expecting:

 

See Understanding How SAS Handles Missing Values

 

Here's a simple example that helps explain it:

 


data _null_ ;
	/* Notice how the mean of 0,1,2,3 = 1.5 that's because SAS calculates the mean as (0+1+2+3)/4 = 6/4 = 1.5 */
	avg=mean(0,1,2,3) ;
	put "Mean of 0,1,2,3 = " avg ;
	/* Notice how the mean of .,1,2,3 = 2 that's because SAS calculates the mean as (1+2+3)/3 = 6/3 = 2 */
	/* Basically SAS doesn't include missing values when using functions */
	avg=mean(.,1,2,3) ;
	put "Mean of .,1,2,3 = " avg ;
run ;

 

Solly7
Pyrite | Level 9
thanks a lot
PaigeMiller
Diamond | Level 26

To answer your specific question, even though I think this is not a good thing to do:

 

data have2;
     set have;
      if height='I don''t know how tall I am' then height='0';
      if weight='I don''t know how tall I am' then weight='0';
run;

To do the calculation of means, you would have to convert these to numeric variables as stated by @ballardw , but it really doesn't make sense to set these to zero, when you should be setting these to missing.

 

Does it really say tall when weight is missing?

--
Paige Miller
Brijesh4sas
Fluorite | Level 6

data have;
input policy_no height $&100. weight $&100.;
datalines;
1 I don't know how tall I am 60
2 1.5 I don't know how tall I am
;
run;

data have2;
set have;
if anyalpha(height) then height='0';
if anyalpha(weight) then weight='0';
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1031 views
  • 3 likes
  • 6 in conversation