DATA Step, Macro, Functions and more

check field for valid state name

Accepted Solution Solved
Reply
Valued Guide
Posts: 858
Accepted Solution

check field for valid state name

I am checking a field for a valid postal code. 

 

'Property State' IN ( "AL", "AK", "AZ", "AR", "CA", etc...

 

Is setting this up in a macro faster than having it in an if/then statement?  I'm checking a file with a couple million rows and want to make sure I'm efficient as possible.

 

Thanks in advance, have a great weekend.


Accepted Solutions
Solution
‎04-05-2016 08:50 AM
Super User
Posts: 10,514

Re: check field for valid state name

If I were using something like this very often I would set up a custom format. Possibly  with a result of 'Valid' for valid codes and 'Invalid' for anything else.

 

proc format;

value &customformat

 "AL", "AK", "AZ", "AR", "CA", etc = "Valid"

other = "Invalid"

;

run;

 

Then the code could be

 

If put(upcase(variable),$customformat.) ='Valid' then do;

end;

Else do;

end;

 

I use the approach to catch site codes that get misentered OR get added to data without forewarning.

View solution in original post


All Replies
Respected Advisor
Posts: 4,651

Re: check field for valid state name

I'm curious, What would be the macro version of this?

PG
Valued Guide
Posts: 858

Re: check field for valid state name

Other than using a macro variable so I don't have a long in() clause I have no idea.  Looking for anything to be more efficient.  There's a large amount of records that I need to check.

Super Contributor
Posts: 408

Re: check field for valid state name

I did some testing and the use of INDEX() proved quite fast. I cannot imagine that looping through an array of 52 states for every observation would be faster. But ultimately the proof of the pudidng is in the eating.

 

states="AL|AK|AR|AZ|CA|...|WV|WY";
if (index(states, statecode)) > 0 then put 'State OK';

 

Retain the variable states and assign only when _N_=1 for even better performance.

 

In general this class of problems (table lookup) can be addressed multiple ways:

1) Format

2) Hash table

3) SQL

4) prxmatch (I tried; wasslower)

5) if/then/else

6) Character functions

7) ...

 

This time I choose 6) but 1) and 2) do in-memory searches and can be quite fast as well. They do require more programming. It is worth trying though. They usually work best when the list of values is in a dataset. But unless the USA plans to acquire new states or spin off existing ones any time soon the list should be quite stable.

 

Hope this helps,

- Jan.

Solution
‎04-05-2016 08:50 AM
Super User
Posts: 10,514

Re: check field for valid state name

If I were using something like this very often I would set up a custom format. Possibly  with a result of 'Valid' for valid codes and 'Invalid' for anything else.

 

proc format;

value &customformat

 "AL", "AK", "AZ", "AR", "CA", etc = "Valid"

other = "Invalid"

;

run;

 

Then the code could be

 

If put(upcase(variable),$customformat.) ='Valid' then do;

end;

Else do;

end;

 

I use the approach to catch site codes that get misentered OR get added to data without forewarning.

Super User
Posts: 17,854

Re: check field for valid state name

What version of SAS are you on?

Before 9.4 there appears to be some SAS out of the box formats for State Names/Codes that may be available to you, but they're gone in SAS 9.4.

 

I would echo the format solution. 

 

Maybe one of these:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000211303.htm

Respected Advisor
Posts: 4,651

Re: check field for valid state name

Why not use an informat instead?

 

proc format;
invalue $stateFmt (default=2 just upcase)
"AL", "AK", "AZ", "AR", "CA" = [$upcase2.]
other = "??";
run;

data test;
length code stateCode $2;
input code;
stateCode = input(code, $stateFmt.);
codeError = stateCode = "??";
datalines;
al
Ak
  AZ
AB
;

proc print; run;
PG
Super User
Posts: 9,682

Re: check field for valid state name

Check functions:
If it return missing value, then it should be invalid State Name.


data _NULL_;
fips=stfips ('NC');
put fips=; 
state=stname('NC');
put state=; 
state=stname('AA');
put state=; 
run;


Super User
Posts: 10,514

Re: check field for valid state name


PGStats wrote:

Why not use an informat instead?

 


 

I used an informat for a while with one project but when I went back to the data supplier I then had to re-read the data to get the original unexpected value(s) and associated record identifiers to show them what the unexpected values were. By using the format with a report step I could then use a where statement in proc print to list the same information easier.

 

Volume of records (relatively low) and project requirements (very clean data) made the format approach a better fit in my case.

Respected Advisor
Posts: 4,651

Re: check field for valid state name

You are right @ballardw, if you use the informat to read the data, you lose the original information and that can lead to some problems later. That's why I created a new variable with the input function instead of using my informat within the input statement. 

The informat approach nevertheless allows you to hide the UPCASE and LEFT operations in the format, and to exploit the _ERROR_ mechanism, if you wish. 

PG
Super User
Posts: 10,514

Re: check field for valid state name

@PGStats and I should have spent a bit of time realizing that not everyone is looking at dozens of variables with custom formats/informats. I mostly worry about informat when reading the data but could use this for a report approach as well.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 498 views
  • 3 likes
  • 6 in conversation