jrleighty
Fluorite | Level 6
I have a CSV file where one of the variables is entered inconsistently in terms of upper- and lower-case letters (e.g. Married, married, MARRIED, etc.). I want to format the observations as a single-letter value (e.g. M) for all case variants without listing every variant in my code. Is there a way to tell SAS to ignore case and focus only on the spelling when formatting the data? I'm open to workarounds as well.
Reeza
Super User

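How about the $UPCASE1. format? It displays the first character of the value in upper case, so every spelling variant below prints as M.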

 

data have;
input string $20.;
cards;
married
Married
Marrie
M
;
run;

proc print data=have;
var string;
format string $upcase1.;
run;
ballardw
Super User

How about an INFORMAT to read the data with?

The INVALUE statement used to create an informat has an UPCASE option, which converts all text to upper case before comparing it to the list of values. The OTHER=_ERROR_ option means that any value you didn't expect generates an invalid data message in the log, so you find out about it sooner. I use that to update my informat, as needed, when my data sources are inconsistent, such as adding Spanish spellings or a source's own custom abbreviation like "MAR", which I would add to the range in my INVALUE.

proc format;
   invalue $status (upcase)
      'MARRIED' ='M'
      'SINGLE'  ='S'
      'WIDOWED' ='W'
      'DIVORCED'='D'
      ' '       =' '
      other=_error_;
run;

data example;
   /* NEWVALUE is not in the informat, so it triggers an invalid data note in the log */
   input status $status10.;
datalines;
Married
married
marrieD
singLE
wIDOWED
divorced
newvalue
;

Optionally, use this approach to assign a NUMERIC code value, so you get a consistent sort order, and use another custom format to display the meaning of the code.
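A minimal sketch of that numeric-code variant, assuming the same four categories. The names `statcode` and `statfmt` are hypothetical, chosen only for illustration: the informat stores 1 to 4, and the format displays the label.

```sas
proc format;
   /* Numeric informat: UPCASE converts the raw text before it is matched */
   invalue statcode (upcase)
      'MARRIED'  = 1
      'SINGLE'   = 2
      'WIDOWED'  = 3
      'DIVORCED' = 4
      ' '        = .
      other      = _error_;
   /* Hypothetical display format for the stored numeric codes */
   value statfmt
      1 = 'Married'
      2 = 'Single'
      3 = 'Widowed'
      4 = 'Divorced';
run;

data example_num;
   input status :statcode10.;
   format status statfmt.;
   datalines;
Married
married
marrieD
singLE
wIDOWED
divorced
;

/* Sorting uses the numeric code, so the order stays 1-4 regardless of labels */
proc print data=example_num;
run;
```

Because the stored value is a number, PROC SORT and BY-group processing follow the code order rather than the alphabetical order of the labels.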

 

This type of informat can also be used to recode values into either a new variable or (dangerous) the same variable:

codedstatus = input(status,$status.);
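A self-contained sketch of recoding into a new variable with the INPUT function, assuming the raw text is already stored in a character variable. The data set names `raw_status` and `recoded` are hypothetical, and an explicit width of 10 is given so the informat reads the whole stored value.

```sas
/* Same character informat as above: UPCASE is applied before matching */
proc format;
   invalue $status (upcase)
      'MARRIED'  = 'M'
      'SINGLE'   = 'S'
      'WIDOWED'  = 'W'
      'DIVORCED' = 'D'
      ' '        = ' '
      other      = _error_;
run;

/* Hypothetical data already stored with inconsistent case */
data raw_status;
   input status :$10.;
   datalines;
Married
SINGLE
widowed
Divorced
;

/* Recode into a NEW variable so the original text is kept for checking */
data recoded;
   set raw_status;
   length codedstatus $1;
   codedstatus = input(status, $status10.);
run;

proc print data=recoded;
run;
```

Keeping both STATUS and CODEDSTATUS makes it easy to spot any value that the informat flagged as invalid.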

 

Kathryn_SAS
SAS Employee

Here is another option if you don't mind creating a new variable:

data test;
input string $;
cards;
married
Married
MARRIED
;
run;

proc format;
 value $stat "MARRIED"='M';
run;

data test;
set test;
string1=put(upcase(string),$stat.);
run;

proc print;
run;
Tom
Super User

Why not just read the data using the $UPCASE informat? Then the stored value will be consistent, and it will be easier to create a format to display the values. If those are the only values, you could use the $1. format to display just the first letter.
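A minimal sketch of that approach, assuming list input and a maximum value length of 10 (the data set name `status_upper` is hypothetical). The informat fixes the case when the value is stored; the $1. format only changes how it is displayed.

```sas
/* $UPCASEw. informat: values are stored in upper case as they are read */
data status_upper;
   input status :$upcase10.;
   datalines;
Married
married
MARRIED
Single
;

/* $1. shows only the first character; the stored value is still the full word */
proc print data=status_upper;
   format status $1.;
run;
```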

 

If you want more control, make a custom informat and/or a custom format. A custom informat can use the UPCASE option to convert the source to upper case in addition to encoding the values.
