Hi,
I have a large health dataset with a character variable indicating the location of a health treatment. I am trying to create a dummy variable for certain cities.
Let's say:
data old;
set new;
nyc=0;
if location="New York" then nyc=1;
output;
run;
The problem is that it doesn't seem to recognize either "New York" or New York without quotation marks, and I get all zeros. (I know for a fact that at least 5% of the sample was in fact treated in NYC.) What could be the problem?
Thanks
Since you did not post any sample data, this is just a guess.
But perhaps the values in your character variable are cased differently than "New York". A common way to deal with this issue is to use the UPCASE function, so that your program looks like this:
data old;
set new;
nyc=0;
if UPCASE(location) = "NEW YORK" then nyc=1;
output;
run;
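If casing alone does not explain it, leading blanks are another possible cause. SAS pads the shorter string with trailing blanks when comparing, so trailing blanks are harmless, but a leading blank makes the comparison fail. A minimal sketch, reusing the dataset and variable names from the question and assuming the stored values may have leading blanks (this is a guess, not something confirmed by the data):

```sas
data old;
  set new;
  /* STRIP removes leading and trailing blanks; UPCASE normalizes the case. */
  /* The comparison evaluates to 1 when true and 0 when false.              */
  nyc = (upcase(strip(location)) = "NEW YORK");
run;
```

Assigning the result of the comparison directly replaces the separate nyc=0 and IF/THEN statements.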
Great! It worked! Thanks
btw you definitely need the quotation marks 🙂
Text strings must match EXACTLY.
1. Run a PROC FREQ on the location variable and look at the values. Note that string comparisons are case sensitive.
2. Use a HEX format to display the variable and look for non-printing characters or unexpected blanks.
3. Use FIND/INDEX to search a string for partial values.
And it goes without saying: check your log for Notes/Warnings/Errors.
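The three checks above could be sketched roughly as follows. The dataset name new and the variable location come from the original question; the $HEX40. width is an assumption (it shows the first 20 characters of each value), so widen it if your values are longer.

```sas
/* 1. List the distinct values actually stored in location */
proc freq data=new;
  tables location / missing;
run;

/* 2. Show each distinct value next to its hex representation.  */
/*    An ordinary blank appears as 20; other codes in unexpected */
/*    positions point to non-printing characters.                */
proc sql;
  select distinct location,
         put(location, $hex40.) as location_hex
  from new;
quit;

/* 3. Partial, case-insensitive match: FIND returns the position */
/*    of the substring, or 0 if it is not found; the "i"         */
/*    modifier ignores case.                                     */
data old;
  set new;
  nyc = (find(location, "new york", "i") > 0);
run;
```

Keep in mind that a partial match also flags longer values that merely contain the text, for example "New York Mills", so check the PROC FREQ output before relying on it.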
A quick and simple way to code a dataset, and to save you typing, if you just want a numeric code for each location:
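/* Build a list of distinct locations; the KEEP= option must follow the output dataset name */
proc sort data=your_dataset out=codelist(keep=location) nodupkey;
by location;
run;
/* Number the distinct locations 1, 2, 3, ... in sort order */
data codelist;
set codelist;
code=_n_;
run;
/* Merge the codes back onto the original data */
proc sql;
create table WANT as
select A.*, B.CODE
from YOUR_DATASET A
left join CODELIST B
on A.LOCATION=B.LOCATION;
quit;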
What this does is create a table of distinct locations, assign codes 1 to x based on sort order (you could change the sort if you wanted), and then merge that back onto your original data. This way you don't need to type each IF statement, and you don't need to worry about "your string" not matching the "data string", since both come from the same data.