BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Steelers_In_DC
Barite | Level 11

I am checking a field for a valid postal code. 

 

'Property State' IN ( "AL", "AK", "AZ", "AR", "CA", etc...

 

Is setting this up in a macro faster than having it in an if/then statement?  I'm checking a file with a couple million rows and want to make sure I'm efficient as possible.

 

Thanks in advance, have a great weekend.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

If I were using something like this very often I would set up a custom format. Possibly  with a result of 'Valid' for valid codes and 'Invalid' for anything else.

 

proc format;

value &customformat

 "AL", "AK", "AZ", "AR", "CA", etc = "Valid"

other = "Invalid"

;

run;

 

Then the code could be

 

If put(upcase(variable),$customformat.) ='Valid' then do;

end;

Else do;

end;

 

I use the approach to catch site codes that get misentered OR get added to data without forewarning.

View solution in original post

10 REPLIES 10
PGStats
Opal | Level 21

I'm curious, What would be the macro version of this?

PG
Steelers_In_DC
Barite | Level 11

Other than using a macro variable so I don't have a long in() clause I have no idea.  Looking for anything to be more efficient.  There's a large amount of records that I need to check.

jklaverstijn
Rhodochrosite | Level 12

I did some testing and the use of INDEX() proved quite fast. I cannot imagine that looping through an array of 52 states for every observation would be faster. But ultimately the proof of the pudidng is in the eating.

 

states="AL|AK|AR|AZ|CA|...|WV|WY";
if (index(states, statecode)) > 0 then put 'State OK';

 

Retain the variable states and assign only when _N_=1 for even better performance.

 

In general this class of problems (table lookup) can be addressed multiple ways:

1) Format

2) Hash table

3) SQL

4) prxmatch (I tried; wasslower)

5) if/then/else

6) Character functions

7) ...

 

This time I choose 6) but 1) and 2) do in-memory searches and can be quite fast as well. They do require more programming. It is worth trying though. They usually work best when the list of values is in a dataset. But unless the USA plans to acquire new states or spin off existing ones any time soon the list should be quite stable.

 

Hope this helps,

- Jan.

ballardw
Super User

If I were using something like this very often I would set up a custom format. Possibly  with a result of 'Valid' for valid codes and 'Invalid' for anything else.

 

proc format;

value &customformat

 "AL", "AK", "AZ", "AR", "CA", etc = "Valid"

other = "Invalid"

;

run;

 

Then the code could be

 

If put(upcase(variable),$customformat.) ='Valid' then do;

end;

Else do;

end;

 

I use the approach to catch site codes that get misentered OR get added to data without forewarning.

Reeza
Super User

What version of SAS are you on?

Before 9.4 there appears to be some SAS out of the box formats for State Names/Codes that may be available to you, but they're gone in SAS 9.4.

 

I would echo the format solution. 

 

Maybe one of these:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000211303.htm

PGStats
Opal | Level 21

Why not use an informat instead?

 

proc format;
invalue $stateFmt (default=2 just upcase)
"AL", "AK", "AZ", "AR", "CA" = [$upcase2.]
other = "??";
run;

data test;
length code stateCode $2;
input code;
stateCode = input(code, $stateFmt.);
codeError = stateCode = "??";
datalines;
al
Ak
  AZ
AB
;

proc print; run;
PG
Ksharp
Super User
Check functions:
If it return missing value, then it should be invalid State Name.


data _NULL_;
fips=stfips ('NC');
put fips=; 
state=stname('NC');
put state=; 
state=stname('AA');
put state=; 
run;


ballardw
Super User

@PGStats wrote:

Why not use an informat instead?

 


 

I used an informat for a while with one project but when I went back to the data supplier I then had to re-read the data to get the original unexpected value(s) and associated record identifiers to show them what the unexpected values were. By using the format with a report step I could then use a where statement in proc print to list the same information easier.

 

Volume of records (relatively low) and project requirements (very clean data) made the format approach a better fit in my case.

PGStats
Opal | Level 21

You are right @ballardw, if you use the informat to read the data, you lose the original information and that can lead to some problems later. That's why I created a new variable with the input function instead of using my informat within the input statement. 

The informat approach nevertheless allows you to hide the UPCASE and LEFT operations in the format, and to exploit the _ERROR_ mechanism, if you wish. 

PG
ballardw
Super User

@PGStats and I should have spent a bit of time realizing that not everyone is looking at dozens of variables with custom formats/informats. I mostly worry about informat when reading the data but could use this for a report approach as well.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2819 views
  • 3 likes
  • 6 in conversation