BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
UMAnalyst
Obsidian | Level 7

Working with a large dataset of geocode results and I would like clean the zip codes by applying the zip of the more (or most) accurate values (indicated by SCORE) within each group*. Thanks as always.
*by id, street, city, and st.

 

HAVE:
ID      SCORE  STREET               CITY            ST  ZIP     YEAR
1001  75         123 MAIN ST        MYTOWN    MI  48009  2002
1001  50         123 MAIN ST        MYTOWN    MI  48999  2004
1001  75         123 MAIN ST        MYTOWN    MI  48009  2006
1001  50         123 MAIN ST        MYTOWN    MI  48999  2008
1001  75         123 MAIN ST        MYTOWN    MI  48009  2010
1001  60         999 WE MOVED   XTOWN      CA  94903  2012
1001  60         999 WE MOVED   XTOWN      CA  94903  2014

 

WANT:

ID      SCORE  STREET               CITY            ST  ZIP     YEAR
1001  75         123 MAIN ST        MYTOWN    MI  48009  2002
1001  50         123 MAIN ST        MYTOWN    MI  48009  2004
1001  75         123 MAIN ST        MYTOWN    MI  48009  2006
1001  50         123 MAIN ST        MYTOWN    MI  48009  2008
1001  75         123 MAIN ST        MYTOWN    MI  48009  2010
1001  60         999 WE MOVED   XTOWN      CA  94903  2012
1001  60         999 WE MOVED   XTOWN      CA  94903  2014

 

data have;
input id $ score street $9-20 city $ state $ zip $ year;
datalines;
1001 75 123 MAIN ST MYTOWN MI 48009 2002
1001 50 123 MAIN ST MYTOWN MI 48999 2004
1001 75 123 MAIN ST MYTOWN MI 48009 2006
1001 50 123 MAIN ST MYTOWN MI 48999 2008
1001 75 123 MAIN ST MYTOWN MI 48009 2010
1001 60 999 WE MOVED XTOWN CA 94903 2012
1001 60 999 WE MOVED XTOWN CA 94903 2014
;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User
* step 1: assign number to keep original order;

data int;
set have;
n = _N_;
run;

* step 2: sort;

proc sort data=int;
by id state city street descending score;
run;

* step 3 replace zips;

data int2;
set int;
by id state city street;
retain int_zip;
if first.street
then int_zip = zip;
else zip = int_zip;
run;

* step 4: recreate original order in final step;

proc sort
  data=int2
  out=want (drop=int_zip n)
;
by n;
run;

View solution in original post

3 REPLIES 3
Kurt_Bremser
Super User
* step 1: assign number to keep original order;

data int;
set have;
n = _N_;
run;

* step 2: sort;

proc sort data=int;
by id state city street descending score;
run;

* step 3 replace zips;

data int2;
set int;
by id state city street;
retain int_zip;
if first.street
then int_zip = zip;
else zip = int_zip;
run;

* step 4: recreate original order in final step;

proc sort
  data=int2
  out=want (drop=int_zip n)
;
by n;
run;
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Its not clear to me why the records as number 50 in the below should be changed or this:

"the more (or most) accurate values (indicated by SCORE)"

Do you meant the max number within that group is the one to replace, if so and there is only one combination of ID+max(SCORE)+ZIP then you could just merge is back on with:

proc sql;
  create table WANT as
  select  A.ID,
          A.SCORE,
          A.STREET,
          A.CITY,
          A.STATE,
          B.ZIP,
          A.YEAR
  from    HAVE A
  left join (select distinct ID,ZIP from HAVE group by ID having SCORE=max(SCORE)) B
  on      A.ID=B.ID;
quit;

However, what if you have:

1  75  A8001

1  75  A8002

So 75 is the max, but has two differing postcode, which one to use?  I would probably do a sort, then take first record, something like:

proc sort data=have out=inter;
  by id descending score;
run;

data inter;
  set inter;
  by id;
  if first.id then output;
run;

And merge that on, you can add the other variables into the sort, say zipcode for first one etc.

 

Darrell_sas
SAS Employee

If you are running at least 9.4M1, then Proc GEOCODE will return the ZIP Code used in the results in M_ZIP.  So, while ZIP is 48999, M_ZIP will be 48009.  You can look at the variable _NOTES_.  If it contains NOZCM (No Zip Code Match) then the ZIP Code given by "you" does not match the ZIP Code of the address (city and state).

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1653 views
  • 2 likes
  • 4 in conversation