Adding a variable to determine whether a postal code changes

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

Adding a variable to determine whether a postal code changes

Hi All, 

 

I feel like this is a pretty straightforward one, but I am still pretty new at data steps, so I thought I would call for backup. 

 

Here's the situation. 

 

I have a rather large dataset that has multiple records for about 80,000 individuals. Each idividual has anywhere from 1-15 different records associated with their ID. What I'm trying to determine is how often postal code changes among people with multiple records

 

Ideally the output would be a table with one record per ID with an indicator variable telling me if all the postal codes for a person are the same, or if there is one that is different. 

 

for example, I would like this: 

 

PATIENT ID      POSTAL CODE

        1                      M4R

        1                      M4R

        1                      M4R

        2                      M4T

        2                      M4X

        2                      M4T

 

to turn into this:

 

PATIENT ID     POSTAL CODE CHANGE

        1                            'NO' or 0      

        2                            'YES' or 1 

 

I assume the first step would be to sort by patient ID and by Postal Code, followed by so sort of "if"/ "then" statement, but I'm having trouble sorting out exactly how that statement should look. 

 

Any thougths would be much appreciated! 

 

Thanks so much. 

 

Mike


Accepted Solutions
Solution
‎08-23-2017 11:47 AM
Super User
Posts: 8,053

Re: Adding a variable to determine whether a postal code changes

Posted in reply to righcoastmike

I'd go about it slightly different:

(make sure that dataset is sorted by patient_id)

data have;
input patient_id post_code $;
cards;
1 M4R
1 M4R
1 M4R
2 M4T
2 M4X
2 M4T
;
run;

data want;
set have;
by patient_id;
retain code_change;
if post_code ne lag(post_code) then code_change = 1;
if first.patient_id then code_change = 0;
if last.patient_id then output;
keep patient_id code_change;
run;

proc print data=want noobs;
run;

Result:

patient_     code_
   id       change

    1          0  
    2          1  

Note how I presented your example data in a simple data step. Please do so in the future.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Trusted Advisor
Posts: 1,977

Re: Adding a variable to determine whether a postal code changes

[ Edited ]
Posted in reply to righcoastmike

Here's a data step version of such an analysis, I could also imagine that you could write this in PROC SQL

 

/* UNTESTED CODE */

data want;
    set have;
    by patient_id; /* Assumes data set have is sorted by patient_id */
    if first.patient_id then count=0;
    prev_postal_code=lag(postal_code);
    if prev_postal_code^=postal_code and not first.patient_id then count+1;
run;
--
Paige Miller
Solution
‎08-23-2017 11:47 AM
Super User
Posts: 8,053

Re: Adding a variable to determine whether a postal code changes

Posted in reply to righcoastmike

I'd go about it slightly different:

(make sure that dataset is sorted by patient_id)

data have;
input patient_id post_code $;
cards;
1 M4R
1 M4R
1 M4R
2 M4T
2 M4X
2 M4T
;
run;

data want;
set have;
by patient_id;
retain code_change;
if post_code ne lag(post_code) then code_change = 1;
if first.patient_id then code_change = 0;
if last.patient_id then output;
keep patient_id code_change;
run;

proc print data=want noobs;
run;

Result:

patient_     code_
   id       change

    1          0  
    2          1  

Note how I presented your example data in a simple data step. Please do so in the future.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 5,605

Re: Adding a variable to determine whether a postal code changes

Posted in reply to righcoastmike

You could squeeze just a bit more information out of the data in this way:

 

proc sort data=have;

  by patient_id postal_code;

run;

 

data want;

set have;

by patient_id postal_code;

if first.patient_id then count=0;

if last.postal_code then count + 1;

if last.patient_id;

keep patient_id count;

run;

 

That way, you get more than a yes/no, you get a count of how many postal codes exist for each patient.

Super User
Posts: 10,123

Re: Adding a variable to determine whether a postal code changes

Posted in reply to righcoastmike
data have;
input patient_id post_code $;
cards;
1 M4R
1 M4R
1 M4R
2 M4T
2 M4X
2 M4T
;
run;
proc sql;
select  patient_id,case when count(distinct post_code)=1 then 0 else 1 end as flag
 from have 
  group by  patient_id;
quit;


Occasional Contributor
Posts: 15

Re: Adding a variable to determine whether a postal code changes

Posted in reply to righcoastmike

Thanks so much everyone. It's crazy to see how many different ways there are to get to the same output. 

 

Also, thanks for the heads up about how to present data in the future. I can see how that would make life much easier for people offering solutions. 

 

The help of this community is always much appreciated! 

 

Mike 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 144 views
  • 4 likes
  • 5 in conversation