I have a data set that is listed out by zip code as the mapping coordinate but I would like to group those zip codes into a small number of regions (say 5 or 10). Is there a way to do this using the zipcodes or do I need to develop a shape file to define the regions?
For example, my data is similar to this:
Zip | Region | Count |
---|---|---|
90001 | South | 50 |
90002 | South | 75 |
90003 | South | 60 |
90004 | West | 75 |
90005 | East | 100 |
Can I use the zip codes to define the Regions and then sum the counts to display on the map? Or do I need to create something similar to the coordinates in the maps.us table to define regions? Thank you in advance for any help provided.
You can use the sashelp.zip file to merge the zips into county or states. If you want different regions you need to figure out a way to group them individually, thought I'd think county would be a good way to start (knowing very little about this b/c I'm Canadian).
If you have zip code and regions defined in a file like sashelp.zipcode you can create a format and use it to summarize your counts.
Zafer
Example:
data zipregion;
input Zip Region $;
datalines;
90001 South
90002 South
90003 South
90004 West
90005 East
;
run;
data zipcount;
input Zip Count;
datalines;
90001 50
90002 75
90003 60
90004 75
90005 100
;
run;
/* If you have zip codes and regions defined in a data set */
data zipfmt;
set zipregion;
retain fmtname 'zipfmt';
Start = Zip;
Label = Region;
keep fmtname Start Label;
run;
proc format cntlin=zipfmt;
run;
proc summary data=zipcount nway missing;
class Zip;
var Count;
format Zip zipfmt.;
output out=zipcount_summary
sum=;
run;
proc print; run;
Sorry, I should have been more clear. Once you have the data summarized by region, how can I link the zip codes with a map file, say sashelp.zipcode and only display regions rather than zip codes. Basically I want to define my map parameter by zip code, but display region rather than zip code, using proc gmap.
What I am trying to do is similar to what is done in this example:
http://support.sas.com/kb/24/902.html
But instead of writing out all 40,000 zipcodes, could I just group them somehow so the regions are assigned by a region field that is related to a zip code.
Can't you just use the zipstate function (see http://support.sas.com/kb/24/803.html ) and then do the region assignments as was done in the example that you showed?
Some of my regions are not defined by states unfortunately. California is split between two regions for example. That is why I thought of defining my regions by zip code.
Doesn't Art have the right idea here? What if you were to use the code in the sample program, but change from ID STATE to ID ZIP? Isn't that exactly what you need? (You might have to sum up your COUNT variable separately.)
If I unserstand it right, you want to drill down from region to zip codes and color it on a map and remove the inerior boundaries for regions. Do these regions have discontiguous areas?
You will probably need shape files for zip codes (www.census.gov).
For example:
Download the following file from www.census.gov web site (2010 TIGER/Line® Shapefiles: )
2012 TIGER/Line Shapefiles: ZIP Code Tabulation Areas for Michigan.
You can download the shape files for other states as well and define your zip code to region format.
proc format;
value $zipfmt
'48001'-'48999' = 'Region 1'
'49000'-'49999' = 'Region 2'
other = 'Region 3';
run;
proc mapimport datafile="D:\Temp\Zip\tl_2010_26_zcta510.shp"
out=mi;
run;
proc sql;
create table mi_1 as
select -1*x as x,
y,
ZCTA5CE10 as Zip,
Segment,
put(ZCTA5CE10,$zipfmt.) as Region
from mi;
run;
proc sort data=mi_1;
by Region Zip Segment;
run;
proc gremove data=mi_1 out=mi_2;
by Region;
id Zip Segment;
run;
proc gproject data=mi_2 out=mi_prj degrees;
id Region;
run;
goptions device=png htext=3 gunit=pct cback=verylightmoderateyellow;
title;
proc gmap data=mi_prj map=mi_prj;
id Region;
choro Region / discrete
coutline=Black
nolegend;
run;
And here's a similar example to Alpay's, using a county map (which is shipped with SAS/Graph), rather than zip code maps - it plots the map at a couple of points along the way, so you can see how we're building up the final map:
data my_map; set maps.uscounty (where=(fipstate(state) in ('NC' 'VA')));
original_order=_n_;
statecode=fipstate(state);
run;
proc gmap data=my_map map=my_map;
id state county;
choro state / discrete;
run;
data new_map; set my_map;
if statecode='VA' then region='A';
if statecode='NC' then region='B';
/* let's say these 4 counties in NC are really in region 'A' (with Virginia) */
if statecode='NC' and county in (77 69 181 185) then region='A';
run;
proc gmap data=new_map map=maps.uscounty;
id state county;
choro region;
run;
proc sort data=new_map out=new_map;
by region original_order;
run;
proc gremove data=new_map out=new_map;
by region;
id state county;
run;
proc gmap data=new_map map=new_map;
id region;
choro region;
run;
The methods that alpay and Robert listed work, I guess I'm just struggling with how to convert the 40k plus zip codes into an array. Alpay's example assumes region include sequential zip codes, which mine do not. Here is an output of a proc means to show the range:
Analysis Variable : ZIP The 5-digit ZIP Code | |||
Region | N | Minimum | Maximum |
Region1 | 5802 | 7002 | 61378 |
Region2 | 9786 | 544 | 88595 |
Region3 | 154 | 99501 | 99950 |
Region4 | 5191 | 19930 | 74966 |
Region5 | 7386 | 1005 | 78962 |
Region6 | 4709 | 59001 | 99403 |
Region7 | 1649 | 24314 | 49289 |
Region8 | 1159 | 501 | 93599 |
Region9 | 5498 | 39426 | 96158 |
Robert's example assumes that you only want to change a handful amount of zip or counties, rather then setting the region by a range of zip codes. I ran a proc transpose to convert my data to be in columns with each row representing a region. I'm thinking maybe I can create an array of each row, though some regions have nearly 10k records and I'm not sure what the limit would be. That way I could use Robert's method to set my regions by the zipcodes in my data.
If you want to use Alpay's code and need to recreate this portion there's way to do that without using ranges:
proc format;
value $zipfmt
'48001'-'48999' = 'Region 1'
'49000'-'49999' = 'Region 2'
other = 'Region 3';
run;
My guess is you currently have data in the form something like the following:
Region ZipCode
1 48001
2 28003
3 34564
1 48002
Does that sound correct? If so you can create a format then use Alpay's code a few days ago, June 7 rather than mess with an array.
Correct, but my regions don't contain continuous zip codes.
I can't do this:
proc format;
value $zipfmt
'07002'-'61378' = 'Region 1'
'00544'-'88595' = 'Region 2'
run;
because the regions would overlap. That is why I was thinking of an array.
You can create a format from your data set using proc format and specifying your format data set (cntlin option).
For further explanation please see the documentation for proc format in help files.
I have not tested the following code but it gives an idea as a starting point.
data zipfile;
input Region ZipCode $;
datalines;
1 48001
2 28003
3 34564
1 48002
;
run;
data zipfmt;
set zipfile;
retain zipfmt '$zipfmt';
Start =Zipcode;
Label = Region;
keep fmtname Start Label;
run;
proc format cntlin=zipfmt; run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.