wlierman
Lapis Lazuli | Level 10

I am cleaning some extraneous strings from a text field:

Data SASCDC_2.Arias_County_RC_ETHNICITY_FILL;
  Set SASCDC_2.Arias_County_RC_ETHNIC_CLEANUP;
     If Identify_Race_or_Ethnicity in ("American_Indian", "American_Indian_(Cheeroke)", "American_Indian_or_Alaskan Native", "Apache", "Caucasian/_NAtive_American_Cow_Creek", "Caucasian/_Native_American",
          "Native_American", "Native_American_-_Cherokee", "Native_American_-_Turtle_Mountain_Reservation", "Native_American_German", "Native_American_Irish", 
          "Native_American_and_White", "Native_American,_Hispanic,_Caucasian", "Native_American/Hispanic", "Native_America/Mexican", "Native_American/White", "Native/Indigenous",
          "Rosebud_Sioux_Tribe", "native_American", "native_american", "American_Indian_+_Other_White", "American_Indian_and_Caucasion", "Alaska_Native", "Alaskan_Native",
           "American_Indian/Hispanic") then Identify_Race_or_Ethnicity = 'NATIVE_AMERICAN';
run;

In the survey, the developers allowed individuals to provide free-form descriptions of race/ethnicity (R/E). These free-form responses then have to be mapped into the standardized R/E buckets used by the Census and others. Here is the log:

Set SASCDC_2.Arias_County_RC_ETHNIC_CLEANUP;
577       If  Identify_Race_or_Ethnicity in
577! ("American_Indian","American_Indian_(Cheeroke)","American_Indian_or_Alaskan
                      ---                          ---                                 ---
---
---
                      49                           49                                  49
49
49
577! Native","Apache","Caucasian/_NAtive_American_Cow_Creek","Caucasian/_Native_American",
NOTE 49-169: The meaning of an identifier after a quoted string might change in a future SAS
             release.  Inserting white space between a quoted string and the succeeding
             identifier is recommended.
NOTE 49-169: The meaning of an identifier after a quoted string might change in a future SAS
             release.  Inserting white space between a quoted string and the succeeding
             identifier is recommended.

581             "American_Indian/Hispanic") then Identify_Race_or_Ethnicity = 'NATIVE_AMERICAN'
581! ;
582  run;
581             "American_Indian/Hispanic") then Identify_Race_or_Ethnicity = 'NATIVE_AMERICAN'
                                         ------------------------------------------------------
                                         49
581! ;
NOTE 49-169: The meaning of an identifier after a quoted string might change in a future SAS
             release.  Inserting white space between a quoted string and the succeeding
             identifier is recommended.

583  Data SASCDC_2.Arias_County_RC_ETHNIC_CLEANUP;
584     Set SASCDC_2.Arias_County_RC_ETHNICITY_A;
585
586     If Identify_Race_or_Ethnicity in ("Mexican-Indian (these were their own words)", "US
                                                                                      ----
----
----
----
                                                                                      49
49
49
49

I am not sure what the log is trying to tell me. Technically it is a note, not an error; however, when I check the field that is supposed to be changed, many of the free-form texts are still there. I am searching for answers or remedies (that is why there are underscores between letters and other text expressions in the above).

In other languages that are sensitive to whitespace, such as Python, there is at least an explanation of how to correct the problem, and then the code runs as expected. Here the SAS message seems ambiguous to me (maybe it is not), so I am asking how to correct this: even though it is just a note, the code doesn't produce the expected result.

 

Thank you for your help.

 

wlierman

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User

The note just marks the first place where the SAS compiler was able to see something strange. It has tagged this code as strange:

","A

So it is saying that you shouldn't put A right after the quoted comma, because SAS might decide to use that letter to indicate some type of special constant, the way it uses D, T and DT for date, time and datetime constants. SAS is reading "," as the string here, which means the quoting is inverted: your commas are inside the quotes and your values are outside them.

Look higher up in your code for unbalanced quotes.
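To illustrate, here is a minimal sketch of how one missing quote in an earlier step can trigger this note in a later, otherwise valid step. The variable names (x, re) and values are made up for the example and are not from the original program; it is meant for a throwaway session only.

```sas
/* Deliberately broken: the closing quote after Unknown is missing. */
data _null_;
  x = "Unknown;
run;

data _null_;
  length re $ 20;
  re = 'Apache';
  if re in ("Apache","Navajo") then put 'match';
run;
```

Because the quote opened in the first step is never closed, SAS treats everything up to the quote in front of Apache as one long string. Apache and Navajo then sit directly after a closing quote, which is the situation NOTE 49-169 describes, and the last quote on the IF line opens a new string that swallows the final run statement, so neither step finishes and no observation counts appear in the log.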

 


5 REPLIES
Reeza
Super User
Add a space between the closing quote and the comma. Or, within a data step, you can remove the commas entirely.

"American_Indian"_,
The compiler would like to see a space where that underscore is.
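As a small sketch of the comma-free form, assuming a hypothetical variable re that stands in for Identify_Race_or_Ethnicity, the IN list in a data step can separate its values with blanks only:

```sas
data _null_;
  length re $ 40;
  re = "Apache";
  /* IN-list values separated by blanks instead of commas */
  if re in ("American_Indian" "Apache" "Alaska_Native") then re = 'NATIVE_AMERICAN';
  put re=;
run;
```

With blanks between the values there is nothing directly after any closing quote, so the note has nothing to flag in this statement.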

The rationale is that SAS literals use a similar notation, where the character right after the closing quote does have meaning, e.g.

'01Jan2019'd: the d tells SAS to read the value as a date, not as a character string.
'My date'n: the n tells SAS this is a variable name, not just a string of characters.
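A short sketch of those two suffixes in use; the variable names are invented for the example, and VALIDVARNAME=ANY is set because a name literal containing a space needs it:

```sas
options validvarname=any;  /* allows variable names that contain a space */

data _null_;
  survey_date = '01Jan2019'd;   /* d suffix: a date constant, stored as a number */
  'My date'n  = survey_date;    /* n suffix: a variable whose name is My date */
  as_text     = '01Jan2019';    /* no suffix: an ordinary character string */
  put survey_date= date9. 'My date'n= date9. as_text=;
run;
```

The same quoted text means three different things depending on the character that follows the closing quote, which is why SAS reserves the right to give meaning to other letters in that position.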

The log is warning you that this could be an issue in the future, but it's not an error. I would fix it because otherwise you don't know if you have a new issue or if it's the same ol' issue.
ballardw
Super User

Of course it depends on exactly which race/ethnicity coding you are emulating. There is more than one "census" encoding scheme and you have several of those that would fall into "more than one race" in at least one of the schemes.

 

I've often used custom informats to address such issues, mapping relatively open text to specific values, with an other=_error_ option so that the log tells me when there are values I didn't expect.

One small advantage of this is you can use the UPCASE option with the invalue so that the case is converted to all uppercase when the comparison is made so that these are the same:

 "Native_American" "native_American" "native_american"

as well as any other mix of capitalization.

Yes it can lead to longish Proc Format code but sometimes having all the values in one place makes keeping track of such things easier.

And the PROC FORMAT invalue code currently doesn't generate such notes:

13   proc format;
14   invalue $re (upcase)
15   "AMERICAN_INDIAN", "AMERICAN_INDIAN_(CHEEROKE)",
15 ! "AMERICAN_INDIAN_OR_ALASKAN NATIVE", "APACHE",
15 ! "CAUCASIAN/_NATIVE_AMERICAN_COW_CREEK",
15 ! "CAUCASIAN/_NATIVE_AMERICAN",
16    "NATIVE_AMERICAN", "NATIVE_AMERICAN_-_CHEROKEE",
16 ! "NATIVE_AMERICAN_-_TURTLE_MOUNTAIN_RESERVATION",
16 ! "NATIVE_AMERICAN_GERMAN", "NATIVE_AMERICAN_IRISH",
17    "NATIVE_AMERICAN_AND_WHITE",
17 ! "NATIVE_AMERICAN,_HISPANIC,_CAUCASIAN",
17 ! "NATIVE_AMERICAN/HISPANIC", "NATIVE_AMERICA/MEXICAN",
17 ! "NATIVE_AMERICAN/WHITE", "NATIVE/INDIGENOUS",
18    "ROSEBUD_SIOUX_TRIBE", "AMERICAN_INDIAN_+_OTHER_WHITE",
18 ! "AMERICAN_INDIAN_AND_CAUCASION", "ALASKA_NATIVE",
18 ! "ALASKAN_NATIVE",
19     "AMERICAN_INDIAN/HISPANIC" = 'NATIVE_AMERICAN'
20   ;
NOTE: Informat $RE has been output.
21   run;

NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds
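A minimal sketch of how such an informat might be applied, including the other=_error_ option described above. The informat name $re_demo, the data set demo, and the three test values are assumptions for illustration; the value list is cut down to entries of 15 characters or fewer so that informat width does not come into play.

```sas
proc format;
  invalue $re_demo (upcase)
    "APACHE", "NATIVE_AMERICAN", "ALASKA_NATIVE" = 'NATIVE_AMERICAN'
    other = _error_;   /* unexpected text is reported in the log */
run;

data demo;
  length raw $ 15;
  input raw $;
  /* UPCASE makes the comparison case-insensitive */
  re_clean = input(raw, $re_demo.);
  datalines;
Apache
native_american
Martian
;
run;
```

The first two rows should map to NATIVE_AMERICAN, while the third should leave re_clean blank and raise an invalid-data message in the log. With longer raw values, such as the 45-character entries in the full list, it is worth checking the informat's width (the DEFAULT= option) so that input text is not truncated before the comparison.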

wlierman
Lapis Lazuli | Level 10
Thank you. Both your guidance and Reeza's insight helped clear my problem up.

Thanks again.

wlierman
Amir
PROC Star

Hi @wlierman ,

 

Further evidence of what @Tom has said can be seen in your log: one data step has ended with a run statement and another has begun with a data statement, but no notes have appeared in the log about how many observations and variables are in your data set.

 

If all of your quotes do appear to be balanced then try closing and restarting your session.

 

If you still have problems after that, then after another session restart, try running one data step at a time and make sure each step creates a data set. As soon as a data step does not create a data set, your problem is likely in that step.
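As a possible shortcut before a full restart, many SAS users submit a short line that tries to close any open quote or comment and end the current step. This is a commonly used community idiom rather than a guaranteed fix, and the check step after it is only a sketch:

```sas
/* Tries to close an open single quote, double quote or block comment,
   then ends any step that is still waiting. */
;*';*";*/;quit;run;

/* Resubmit one small step and confirm that the log responds. */
data _null_;
  put 'Session is responding again.';
run;
```

If the message from the second step does not show up in the log, the session is probably still stuck inside a quote, and restarting it is the more reliable route.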

 

HTH.

 

 

Kind regards,

Amir.
