DATA Step, Macro, Functions and more

uniform gender values (M, Male, male, etc)

Accepted Solution Solved
Reply
Super Contributor
Posts: 312
Accepted Solution

uniform gender values (M, Male, male, etc)

in my data, the gender has values like Male, M, m, male, Female, FEMALE, F, f.

When i use proc freq, they are all listed as different values.

I can recode them using if condition, but is there a simpler way to recode them unifomly?

 

Thanks


Accepted Solutions
Solution
‎03-02-2016 01:04 PM
Super User
Posts: 5,085

Re: uniform gender values (M, Male, male, etc)

Another possibility:  don't recode.  Create a format to translate:

 

proc format;

value $gender 'M', 'm', 'Male', 'male' = 'Male'

'F', 'f', 'Female', 'female' = 'Female';

run;

 

proc freq data=have;

tables gender;

format gender $gender.;

run;

 

This leaves open the possibility of not changing your data, but still grouping the values for reporting purposes. It's a decent approach for cleaning data, as the PROC FREQ output displays values that occur in the data but are not accounted for using the format. 

 

If you did want to change the values, you could still apply the format:

 

data want;

set have;

sex = put(gender, $gender.);

run;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,115

Re: uniform gender values (M, Male, male, etc)

Hi @fengyuwuzu,

 

You could use character functions instead. Example:

data want;
set have;
length sex $1;
sex=lowcase(char(gender,1));
drop gender;
rename sex=gender;
run;
Super Contributor
Posts: 312

Re: uniform gender values (M, Male, male, etc)

Thank you, This also works!
Trusted Advisor
Posts: 1,115

Re: uniform gender values (M, Male, male, etc)

@fengyuwuzu: When using the formatting approach you may want to use the order=formatted option in PROC FREQ statements, so as to ensure a consistent ordering of the two categories in the output. (Otherwise, the sort order would depend on the data because, for example, 'F'<'M', but 'M'<'f'.)

Super Contributor
Posts: 312

Re: uniform gender values (M, Male, male, etc)

Thank you for the tip, FreelanceReinhard.
Solution
‎03-02-2016 01:04 PM
Super User
Posts: 5,085

Re: uniform gender values (M, Male, male, etc)

Another possibility:  don't recode.  Create a format to translate:

 

proc format;

value $gender 'M', 'm', 'Male', 'male' = 'Male'

'F', 'f', 'Female', 'female' = 'Female';

run;

 

proc freq data=have;

tables gender;

format gender $gender.;

run;

 

This leaves open the possibility of not changing your data, but still grouping the values for reporting purposes. It's a decent approach for cleaning data, as the PROC FREQ output displays values that occur in the data but are not accounted for using the format. 

 

If you did want to change the values, you could still apply the format:

 

data want;

set have;

sex = put(gender, $gender.);

run;

Super User
Posts: 10,516

Re: uniform gender values (M, Male, male, etc)

If I'm building the data set I often take @Astounding's idea one step earlie with an INFORMAT to read all of the values into my desired coding so later on all of the values a uniform.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 297 views
  • 4 likes
  • 4 in conversation