Data visualization with SAS programming

Transform Data by Zip Code into US Regions

Reply
Frequent Contributor
Posts: 84

Transform Data by Zip Code into US Regions

I have a data set that is listed out by zip code as the mapping coordinate but I would like to group those zip codes into a small number of regions (say 5 or 10).  Is there a way to do this using the zipcodes or do I need to develop a shape file to define the regions?

For example, my data is similar to this:

ZipRegion
Count
90001South50
90002South75
90003South60
90004West75
90005East100

Can I use the zip codes to define the Regions and then sum the counts to display on the map?  Or do I need to create something similar to the coordinates in the maps.us table to define regions?  Thank you in advance for any help provided.

Grand Advisor
Posts: 17,332

Re: Transform Data by Zip Code into US Regions

You can use the sashelp.zip file to merge the zips into county or states. If you want different regions you need to figure out a way to group them individually, thought I'd think county would be a good way to start (knowing very little about this b/c I'm Canadian).

Frequent Contributor
Posts: 95

Re: Transform Data by Zip Code into US Regions

If you have zip code and regions defined in a file like sashelp.zipcode you can create a format and use it to summarize your counts.

Zafer

Example:

data zipregion;

    input Zip    Region $;

datalines;

90001 South

90002 South

90003 South

90004 West

90005 East

;

run;

data zipcount;

    input Zip    Count;

datalines;

90001 50

90002 75

90003 60

90004 75

90005 100

;

run;

/* If you have zip codes and regions defined in a data set */

data zipfmt;

    set zipregion;

    retain fmtname 'zipfmt';

    Start = Zip;

    Label = Region;

    keep fmtname Start Label;

run;

proc format cntlin=zipfmt;

run;

proc summary data=zipcount nway missing;

    class Zip;

    var Count;

    format Zip zipfmt.;

    output out=zipcount_summary

    sum=;

run;

proc print; run;

Frequent Contributor
Posts: 84

Re: Transform Data by Zip Code into US Regions

Sorry, I should have been more clear.  Once you have the data summarized by region, how can I link the zip codes with a map file, say sashelp.zipcode and only display regions rather than zip codes.  Basically I want to define my map parameter by zip code, but display region rather than zip code, using proc gmap.

Frequent Contributor
Posts: 84

Re: Transform Data by Zip Code into US Regions

What I am trying to do is similar to what is done in this example:

http://support.sas.com/kb/24/902.html

But instead of writing out all 40,000 zipcodes, could I just group them somehow so the regions are assigned by a region field that is related to a zip code.

Esteemed Advisor
Posts: 7,293

Re: Transform Data by Zip Code into US Regions

Can't you just use the zipstate function (see http://support.sas.com/kb/24/803.html ) and then do the region assignments as was done in the example that you showed?

Frequent Contributor
Posts: 84

Re: Transform Data by Zip Code into US Regions

Some of my regions are not defined by states unfortunately.  California is split between two regions for example.  That is why I thought of defining my regions by zip code.

Respected Advisor
Posts: 4,972

Re: Transform Data by Zip Code into US Regions

Doesn't Art have the right idea here?  What if you were to use the code in the sample program, but change from ID STATE to ID ZIP?  Isn't that exactly what you need?  (You might have to sum up your COUNT variable separately.)

Frequent Contributor
Posts: 95

Re: Transform Data by Zip Code into US Regions

If I unserstand it right, you want to drill down from region to zip codes and color it on a map and remove the inerior boundaries for regions. Do these regions have discontiguous areas?

You will probably need shape files for zip codes (www.census.gov).

Frequent Contributor
Posts: 95

Re: Transform Data by Zip Code into US Regions

For example:

Download the following file from www.census.gov web site (2010 TIGER/Line® Shapefiles: )

2012 TIGER/Line Shapefiles: ZIP Code Tabulation Areas for Michigan.

You can download the shape files for other states as well and define your zip code to region format.

proc format;

    value $zipfmt

    '48001'-'48999' = 'Region 1'

    '49000'-'49999' = 'Region 2'

    other = 'Region 3';

run;

proc mapimport datafile="D:\Temp\Zip\tl_2010_26_zcta510.shp"

    out=mi;

run;

proc sql;

    create table mi_1 as

    select -1*x as x,

    y,

    ZCTA5CE10 as Zip,

    Segment,

    put(ZCTA5CE10,$zipfmt.) as Region

    from mi;

run;

proc sort data=mi_1;

by Region Zip Segment;

run;

proc gremove data=mi_1 out=mi_2;

  by Region;

  id Zip Segment;

run;

proc gproject data=mi_2 out=mi_prj degrees;

id Region;

run;

goptions device=png htext=3 gunit=pct cback=verylightmoderateyellow;

title;

proc gmap data=mi_prj map=mi_prj;

  id Region;

  choro Region / discrete

                  coutline=Black

                  nolegend;

run;

SAS Employee
Posts: 967

Re: Transform Data by Zip Code into US Regions

And here's a similar example to Alpay's, using a county map (which is shipped with SAS/Graph), rather than zip code maps - it plots the map at a couple of points along the way, so you can see how we're building up the final map:

data my_map; set maps.uscounty (where=(fipstate(state) in ('NC' 'VA')));
original_order=_n_;
statecode=fipstate(state);
run;

proc gmap data=my_map map=my_map;
id state county;
choro state / discrete;
run;

data new_map; set my_map;
if statecode='VA' then region='A';
if statecode='NC' then region='B';

/* let's say these 4 counties in NC are really in region 'A' (with Virginia) */
if statecode='NC' and county in (77 69 181 185) then region='A';
run;

proc gmap data=new_map map=maps.uscounty;
id state county;
choro region;
run;

proc sort data=new_map out=new_map;
by region original_order;
run;


proc gremove data=new_map out=new_map;
by region;
id state county;
run;

proc gmap data=new_map map=new_map;
id region;
choro region;
run;

Frequent Contributor
Posts: 84

Re: Transform Data by Zip Code into US Regions

The methods that alpay and Robert listed work, I guess I'm just struggling with how to convert the 40k plus zip codes into an array.  Alpay's example assumes region include sequential zip codes, which mine do not.  Here is an output of a proc means to show the range:

Analysis Variable : ZIP The 5-digit ZIP Code
RegionNMinimumMaximum
Region15802700261378
Region2978654488595
Region31549950199950
Region451911993074966
Region57386100578962
Region647095900199403
Region716492431449289
Region8115950193599
Region954983942696158

Robert's example assumes that you only want to change a handful amount of zip or counties, rather then setting the region by a range of zip codes.  I ran a proc transpose to convert my data to be in columns with each row representing a region.  I'm thinking maybe I can create an array of each row, though some regions have nearly 10k records and I'm not sure what the limit would be.  That way I could use Robert's method to set my regions by the zipcodes in my data.

Grand Advisor
Posts: 17,332

Re: Transform Data by Zip Code into US Regions

If you want to use Alpay's code and need to recreate this portion there's way to do that without using ranges:

proc format;

    value $zipfmt

    '48001'-'48999' = 'Region 1'

    '49000'-'49999' = 'Region 2'

    other = 'Region 3';

run;

My guess is you currently have data in the form something like the following:

Region ZipCode

1    48001

2    28003

3    34564

1    48002

Does that sound correct? If so you can create a format then use Alpay's code a few days ago, June 7 rather than mess with an array.

Frequent Contributor
Posts: 84

Re: Transform Data by Zip Code into US Regions

Correct, but my regions don't contain continuous zip codes.

I can't do this:

proc format;

    value $zipfmt

    '07002'-'61378' = 'Region 1'

    '00544'-'88595' = 'Region 2'

run;

because the regions would overlap.  That is why I was thinking of an array.

Frequent Contributor
Posts: 95

Re: Transform Data by Zip Code into US Regions

You can create a format from your data set using proc format and specifying your format data set (cntlin option).

For further explanation please see the documentation for proc format in help files.

I have not tested the following code but it gives an idea as a starting point.

data zipfile;

input Region ZipCode $;

datalines;

1    48001

2    28003

3    34564

1    48002

;

run;

data zipfmt;

  set zipfile;

  retain zipfmt '$zipfmt';

  Start =Zipcode;

  Label = Region;

  keep fmtname Start Label;

run;

proc format cntlin=zipfmt; run;

Ask a Question
Discussion stats
  • 24 replies
  • 2393 views
  • 2 likes
  • 7 in conversation