jrleighty
Calcite | Level 5
I have a CSV file where one of the variables is entered inconsistently in terms of capital and lower-case lettering (e.g. Married, married, MARRIED, etc.). I want to format the observations as a single-letter value (e.g. M) for all case variants without listing every variant in my code. Is there a way to tell SAS to ignore case and focus on just the spelling when formatting the data? Open to workarounds as well.
4 REPLIES
Reeza
Super User

$UPCASE1. format?

 

data have;
input string $20.;
cards;
married
Married
Marrie
M
;
run;

proc print data=have;
var string;
format string $upcase1.;
run;
ballardw
Super User

How about an INFORMAT to read the data with?

The INVALUE statement used to create an informat has an UPCASE option, which converts all text to upper case before comparing it to the list of values. The other=_error_ option means that any value you didn't expect generates an invalid data message in the log, so you find out about it a bit sooner. I use that to update my informat, as needed, when my data sources are inconsistent, for example by adding Spanish spellings or a source's own custom abbreviation like "MAR" to the range in my invalue.

proc format;
invalue $status (upcase)
'MARRIED'='M'
'SINGLE' ='S'
'WIDOWED'='W'
'DIVORCED'='D'
' '= ' '
other=_error_ ;
run;

data example;
input status $status10.;
datalines;
Married
married
marrieD
singLE
wIDOWED
divorced

newvalue
;

Optionally use this to assign a NUMERIC code value so you have a pretty constant order and use another custom format to display the meaning of the code.
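A minimal sketch of that numeric-code idea, assuming a numeric informat and a display format that are not part of the original post. The names statusn and statusf, and the code values 1 through 4, are hypothetical choices for illustration.

```sas
proc format;
  /* Hypothetical numeric informat: status text -> numeric code.   */
  /* UPCASE makes the comparison case-insensitive.                  */
  invalue statusn (upcase)
    'MARRIED'  = 1
    'SINGLE'   = 2
    'WIDOWED'  = 3
    'DIVORCED' = 4
    ' '        = .
    other      = _error_;

  /* Hypothetical display format for the numeric codes. */
  value statusf
    1 = 'Married'
    2 = 'Single'
    3 = 'Widowed'
    4 = 'Divorced';
run;

data coded;
  input statuscode statusn10.;   /* stored as a number            */
  format statuscode statusf.;    /* displayed with a fixed label  */
  datalines;
Married
SINGLE
wIDOWED
divorced
;
```

Because the stored value is numeric, sorting by statuscode follows the code order rather than the alphabetical order of the labels.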

 

This type of informat can also be used to recode values into either a new variable or (dangerous) the same variable.

codedstatus = input(status,$status.);
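As a rough sketch, the assignment above could sit in a DATA step like the following. The data set names raw and recoded are hypothetical, and the $status informat is repeated here only so the example runs on its own.

```sas
proc format;
  /* Same idea as the $status informat above, repeated for a standalone run. */
  invalue $status (upcase)
    'MARRIED'  = 'M'
    'SINGLE'   = 'S'
    'WIDOWED'  = 'W'
    'DIVORCED' = 'D'
    ' '        = ' '
    other      = _error_;
run;

/* Hypothetical input data holding the raw, inconsistently cased text. */
data raw;
  input status $10.;
  datalines;
Married
singLE
wIDOWED
;

data recoded;
  set raw;
  length codedstatus $1;
  /* Recode into a NEW variable; the original text stays available. */
  codedstatus = input(status, $status.);
run;
```

Writing the result to a new variable keeps the original text for checking. Assigning back to status itself would discard it, which is presumably why that route is called dangerous.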

 

Kathryn_SAS
SAS Employee

Here is another option if you don't mind creating a new variable:

data test;
input string $;
cards;
married
Married
MARRIED
;
run;

proc format;
 value $stat "MARRIED"='M';
run;

data test;
set test;
string1=put(upcase(string),$stat.);
run;

proc print;
run;
Tom
Super User

Why not just read the data using the $UPCASE informat? Then the stored value will be consistent and it will be easier to create a format to display the values. If those are the only values, you could use $1. as the format to display just the first letter.
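A small sketch of this approach, using inline data rather than the original CSV file. The data set name have and the width of 20 are assumptions for illustration.

```sas
data have;
  /* $UPCASE20. converts the text to upper case as it is read, */
  /* so the stored value is consistent.                         */
  input status $upcase20.;
  datalines;
married
Married
MARRIED
;

proc print data=have;
  /* $1. displays only the first character of the stored value. */
  format status $1.;
run;
```

For a delimited file, the same informat could be attached with an INFORMAT statement ahead of a list-style INPUT statement. Note that $1. only shortens the display, so the stored value is still the full upper-case word.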

 

If you want more control, make a custom informat and/or a custom format. A custom informat could use the UPCASE option to convert the source to upper case in addition to encoding the values.
