Hi, I'm not sure whether I'm answering the question below correctly. I tried, but my code looks odd. I'd appreciate your advice. Thank you very much!
Variable zipcode is read with format $10. It reads zip codes in the form 07417 or 07417-1280. Create variable_9digit. It equals 1 if zipcode has a hyphen separating the fifth and sixth digits. Otherwise it equals 0. Write the statements three ways, using the length, index, and substr functions.
/* using length function*/
data zipcode;
  length zipcode $10;
  input zipcode;
  cards;
07417
07417-1280
;
run;

DATA zipcode1;
  Set zipcode;
  if zipcode= '07417-1280' then zipcode2= '1';
  else zipcode2= '0';
proc print noobs;
  title 'Using length function';
run;

/* using substr function*/
DATA zipcode2;
  format zipcode $10.;
  zipcode= '07417-1280';
  form1 = Substr( zipcode, 1, 5);
  if form1= '07417' then newform1='0';
  form2 = Substr( zipcode, 1);
  if form2= '07417-1280' then newform2= '1';
proc print noobs;
  title 'Using substr function';
RUN;

/* using index function*/
Data zipcode3;
  format zipcode $10.;
  zipcode= '07417-1280';
  form1 = Substr( zipcode, 1, index(zipcode, '-')-1);
  if form1= '07417' then newform1= '0';
  form2 = Substr( zipcode, 1);
  if form2= '07417-1280' then newform2= '1';
proc print noobs;
  title 'Using index function';
RUN;
Hi @Amy0223
I don't understand why you need to use the different functions separately because, in my opinion, the zipcode pattern is defined by the conjunction of 4 conditions:
- length of the zipcode = 10
- characters 1 to 5 = a number
- characters 7 to 10 = a number
- 6th character = a hyphen
The prxmatch function is a more efficient way to do that, but you can also use the traditional length(), index() and substr() functions to create the flag variable. It doesn't make sense to use them separately.
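data zipcode_check;
  set zipcode;
  /* The substr() results are character values; comparing them with numbers */
  /* makes SAS convert them implicitly and write a conversion note to the log. */
  if length(zipcode) = 10 and
     0 < substr(zipcode, 1, 5) < 99999 and
     0 < substr(zipcode, 7, 4) < 9999 and
     index(zipcode, "-") = 6
  then variable_9digit = 1;
  else variable_9digit = 0;
run;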
Hi @Amy0223
This is a typical use case for regular expressions.
The prxmatch() function as written below checks whether the zipcode variable matches the following pattern: 5 digits (\d), 1 hyphen, 4 digits.
data zipcode_flag;
set zipcode;
if prxmatch('/\d{5}\-\d{4}/',zipcode) then variable_9digit = 1;
else variable_9digit = 0;
run;
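As a possible refinement, the pattern can be anchored so that the whole value has to be a ZIP+4 code rather than merely contain one. This is only a sketch: it assumes the zipcode data set from the question, and the output data set name zipcode_flag_anchored is made up for illustration.

```sas
data zipcode_flag_anchored;
  set zipcode;
  /* ^ anchors the match at the first character.                     */
  /* \s*$ allows only trailing blanks (SAS pads character values)    */
  /* between the last digit and the end of the value.                */
  /* prxmatch() returns the match position, or 0 when nothing matches. */
  variable_9digit = (prxmatch('/^\d{5}-\d{4}\s*$/', zipcode) > 0);
run;
```

The comparison in parentheses evaluates to 1 or 0, so no IF/ELSE is needed.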
The issue with your 3 tests is that your code is hard-coded to one particular zip code instead of handling zip codes in general.
Best,
Below is my updated code. Do you think this answers the problem?
/* using length function*/
data zipcode;
  length zipcode $10;
  input zipcode;
  cards;
07417
07417-1280
;
run;

DATA zipcode1;
  Set zipcode;
  if prxmatch('/\d{5}\-\d{4}/',zipcode) then variable_9digit = 1;
  else variable_9digit = 0;
proc print noobs;
  title 'Using length function';
run;

/* using index function*/
data zipcode2;
  set zipcode;
  if index(zipcode,'-') then variable_9digit = 1;
  else variable_9digit=0;
proc print noobs;
  title 'Using index function';
RUN;

/* using substr function*/
DATA zipcode3;
  set zipcode;
  if Substr( zipcode, 1, 5) then variable_9digit = 1;
  else variable_9digit = 0;
proc print noobs;
  title 'Using substr function';
RUN;
You didn't use the LENGTH() function as the problem requested.
You cannot use a character expression as if it was a boolean expression. But you can use a numeric expression since SAS will treat 0 (or missing) as FALSE and any other number as TRUE.
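To illustrate the numeric case, here is a small sketch. The variable pos and the literal test value are assumptions chosen for the demonstration and are not part of the assignment.

```sas
data _null_;
  zipcode = '07417-1280';
  /* index() returns a number: the position of the hyphen, or 0 if absent */
  pos = index(zipcode, '-');
  /* A non-zero, non-missing number is treated as TRUE */
  if pos then put 'Hyphen found at position ' pos;
  else put 'No hyphen found';
run;
```

By contrast, substr() returns a character value, so putting it directly in an IF condition forces SAS to try to convert the text to a number first.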
Also why are you making a character variable instead of a numeric one?
Numeric is a lot easier since SAS will evaluate boolean expressions to 1 for TRUE and 0 for FALSE.
variable_9digit = ( '-' = substr( zipcode, 6, 1) ) ;
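Following the same idea, the three versions the problem asks for might be sketched like this. It assumes the zipcode data set from the question, and the names zipcode_three_ways, by_length, by_index and by_substr are hypothetical, used only so the three results can be compared side by side.

```sas
data zipcode_three_ways;
  set zipcode;
  /* LENGTH ignores trailing blanks, so a ZIP+4 value has length 10 */
  by_length = (length(zipcode) = 10);
  /* INDEX returns the position of the first hyphen, 0 if there is none */
  by_index  = (index(zipcode, '-') = 6);
  /* SUBSTR extracts the single character in position 6 */
  by_substr = (substr(zipcode, 6, 1) = '-');
run;

proc print data=zipcode_three_ways noobs;
run;
```

Each flag is numeric. For the two sample values, 07417 should give 0 in every column and 07417-1280 should give 1. Note that each test on its own only checks for the hyphen position or the overall length, not that the other characters are digits.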
Do you have access to SAS to test your programs? Actually trying is the best way to learn, especially when your programs don't work, as you will learn more from the mistakes than from the code you get right the first time.
You can download a copy for free from SAS for use in learning.
Did your instructor really write:
Variable zipcode is read with format$10.
If so you can explain to them that in SAS you use an INFORMAT to read text into values. FORMATS are used to convert values into text for display.
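A minimal sketch of the difference, assuming a made-up data set name zip_demo and the two sample values from the question:

```sas
data zip_demo;
  informat zipcode $10.;   /* informat: how raw text is read into the variable */
  format   zipcode $10.;   /* format: how the stored value is displayed */
  input zipcode;
  datalines;
07417
07417-1280
;
run;

proc print data=zip_demo noobs;
run;
```

Because the INFORMAT statement comes before INPUT, it also defines zipcode as a character variable of length 10.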