HI all,
I am trying to create new variables in my dataset indicating whether or not a person lives in in a rural or urban area. The way that urban/rural status is determined is if the persons postal code has a '0' in the middle of it then it is a rural (eg. B0A), otherwise it is urban. I could do it if it was the first character, however it's using the middle character that has got me stuck. Any thoughts would be much appreciated!
Thanks so much
Mike
Existing Dataset
data have;
input studyID postal_code $;
cards;
1 B0A
2 B2Y
3 F0N
4 B4T
5 M4T
;
Sample output:
Patient ID Postal Code Urban Rural
1 B0A 0 1
2 B2Y 1 0
3 F0N 0 1
4 B4T 1 0
5 M4T 1 0
data have;
input studyID postal_code $;
cards;
1 B0A
2 B2Y
3 F0N
4 B4T
5 M4T
;
data want;
set have;
if index(postal_code,'0')=0 then do;urban=1;rural=0;end;
else if index(postal_code,'0')>0 then do;urban=0;rural=1;end;
run;
If the postalcode is always 3 characters you can get the value with
middle = substr(postal_code,2,1);
the 2 means start reading at the second position and the 1 is read one character.
Thanks so much for this, I was not familiar with the 'substr' command. I'm sure it will come in handy in the future!
Mike
SUBSTR() or CHAR() functions will allow you to access the individual characters.
@righcoastmike wrote:
HI all,
I am trying to create new variables in my dataset indicating whether or not a person lives in in a rural or urban area. The way that urban/rural status is determined is if the persons postal code has a '0' in the middle of it then it is a rural (eg. B0A), otherwise it is urban. I could do it if it was the first character, however it's using the middle character that has got me stuck. Any thoughts would be much appreciated!
Thanks so much
Mike
Existing Dataset
data have; input studyID postal_code $; cards; 1 B0A 2 B2Y 3 F0N 4 B4T 5 M4T ;
Sample output:
Patient ID Postal Code Urban Rural
1 B0A 0 1
2 B2Y 1 0
3 F0N 0 1
4 B4T 1 0
5 M4T 1 0
data have;
input studyID postal_code $;
cards;
1 B0A
2 B2Y
3 F0N
4 B4T
5 M4T
;
data want;
set have;
if index(postal_code,'0')=0 then do;urban=1;rural=0;end;
else if index(postal_code,'0')>0 then do;urban=0;rural=1;end;
run;
Worked like a charm!
Thank you.
Mike
I had a quick question about this code. It works great, but it looks like it's doing the opposite of what I would expect.
To me lit looks like:
If the middle character of the postal code =0 then Urban=1 Rural=0
Else
If the middle character of the postal code > 0 then Urban=0 Rural=1
which would have the 1's and zero's reversed, but for some reason the code get's everything in the right place. Can you explain what I'm missing?
Thanks again.
Mike
The code doesn't even check for middle char =0 i'm afraid. if your first and second both happen to be 0, then the code will fail. If that's the case I would have to tweak a bit, which is no big deal. I assumed your sample is a good representative of your real and went with a simple check of 0 in the string following with a true(1) and false(0) assignment for rural and urban.
For example, please review the test below:
data have;
input studyID postal_code $;
cards;
1 B0A
2 02Y /*notice the change here I made to the sample*/
3 F0N
4 B4T
5 M4T
;
/*now ,please notice the result for ID 2*/
data want;
set have;
if index(postal_code,'0')=0 then do;urban=1;rural=0;end;
else if index(postal_code,'0')>0 then do;urban=0;rural=1;end;
run;
Understood, and the assumption holds.
The only postal codes with a '0' in them are rural postal codes so that works.
so in essence, if the program finds a 0 in the string then it sets rural to true (1) and urban to false (0) and vice versa?
Thanks again.
Mike
Indeed Sir. 🙂
But please feel free to revert back should you need any changes. It's very simple program to make changes. No biggie.
The only request to everybody is that please provide a good representative sample to avoid going back and forth ding dong lol
For added fun consider:
data want; set have; urban= (index(postal_code,'0')=0) ; rural = not(urban); run;
or
data want; set have; rural= (substr(postal_code,2,1)='0') ; urban = not(rural); run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.