GKati
Pyrite | Level 9

Hi,

 

I have a large health dataset with a character variable indicating the location of a health treatment. I am trying to create a dummy variable for certain cities.

Let's say:

 

data old;
    set new;
    nyc=0;
    if location="New York" then nyc=1;
    output;
run;

 

The problem is that it doesn't seem to recognize either "New York" or New York without quotation marks, and I get all zeros. (I know for a fact that at least 5% of the sample was in fact treated in NYC.) What could be the problem?

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Since you did not post any sample data, this is just a guess.

But perhaps the values in your character variable are cased differently than "New York". A common way to deal with this issue is to use the UPCASE function, so that your program looks like this:

 

data old;
    set new;
nyc=0;
if UPCASE(location) = "NEW YORK" then nyc=1;
output;
run;
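
If leading blanks are also a possibility, a slightly more defensive variant might look like the sketch below. The sample rows in `new` are made up for illustration, since the real values of location are unknown. Note that an equality comparison in SAS already ignores trailing blanks, but not leading ones, which is why STRIP is added here.

```sas
/* Hypothetical sample data; the real values of location are unknown */
data new;
    length location $20;
    input location $char20.;   /* $CHAR keeps leading blanks */
    datalines;
New York
NEW YORK
 new york
Boston
;
run;

data old;
    set new;
    /* STRIP removes leading/trailing blanks, UPCASE normalizes case. */
    /* The comparison itself evaluates to 1 (true) or 0 (false).      */
    nyc = (upcase(strip(location)) = "NEW YORK");
run;
```

With these four rows, nyc is 1 for the first three and 0 for Boston.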


5 REPLIES 5
GKati
Pyrite | Level 9

Great! It worked! Thanks

PeterClemmensen
Tourmaline | Level 20

btw, you definitely need the quotation marks 🙂

Reeza
Super User

Text strings must match EXACTLY.

1. Run PROC FREQ on the location variable to see its values. Note that string comparisons are case sensitive.

2. Use a HEX format to display the variable and look for non-printing characters such as unexpected blanks.

3. Use FIND/INDEX to search a string for partial values.

And it goes without saying: check your log for Notes/Warnings/Errors.
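
The three checks above could be sketched roughly as follows. This assumes the input dataset is called `new` (as in the original question) and that location is at most 20 bytes long, hence `$hex40.`; adjust the width to twice the actual length of your variable.

```sas
/* 1. List the distinct values actually stored in location */
proc freq data=new;
    tables location / missing;
run;

/* 2. Same table, but shown in hex to reveal non-printing characters. */
/*    An ordinary blank is 20; a tab would show up as 09.             */
proc freq data=new;
    tables location / missing;
    format location $hex40.;
run;

/* 3. Case-insensitive partial match: the "i" modifier tells FIND */
/*    to ignore case; FIND returns 0 when there is no match.      */
data check;
    set new;
    nyc = (find(location, "new york", "i") > 0);
run;
```

Keep in mind that a partial match also flags values such as "New York State", so it is worth looking at the PROC FREQ output first.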

 

RW9
Diamond | Level 26

A quick and simple way to code a dataset, and to save you typing, if you just want a numeric code for each location:

proc sort data=your_dataset out=codelist (keep=location) nodupkey;
  by location;
run;

data codelist;
  set codelist;
  code=_n_;
run;

proc sql;
  create table WANT as
  select  A.*,
          B.CODE
  from    YOUR_DATASET A
  left join CODELIST B
  on      A.LOCATION=B.LOCATION;
quit;

What this does is create a table of distinct locations and assign codes 1 to x based on sort order (obviously you could change the sort if you wanted), then merge that back onto your original data. This way you don't need to type each IF statement, and you don't need to worry about "your string" not being the same as "data string", as both come from the same source.
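
One thing to watch with this approach: the codes follow the exact stored values, so differently cased spellings of the same city get different codes. A small sketch with made-up data (the rows in `your_dataset` below are purely hypothetical) shows the effect:

```sas
/* Hypothetical input: the same city stored with two different casings */
data your_dataset;
    length location $20;
    input location $char20.;
    datalines;
New York
new york
Boston
New York
;
run;

/* One row per distinct stored value of location */
proc sort data=your_dataset out=codelist (keep=location) nodupkey;
    by location;
run;

/* Number the distinct values in sort order */
data codelist;
    set codelist;
    code = _n_;
run;

proc print data=codelist noobs;
run;
```

On an ASCII system this should list Boston as 1, "New York" as 2 and "new york" as 3, because uppercase letters sort before lowercase ones. If such variants should share a code, normalize location first (for example with UPCASE) before building the code list.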
