Solved: Adding a variable to determine whether a postal code changes

righcoastmike · Posted 08-23-2017 08:40 AM

Hi All,

I feel like this is a pretty straightforward one, but I am still pretty new at data steps, so I thought I would call for backup.

Here's the situation.

I have a rather large dataset that has multiple records for about 80,000 individuals. Each idividual has anywhere from 1-15 different records associated with their ID. What I'm trying to determine is how often postal code changes among people with multiple records

Ideally the output would be a table with one record per ID with an indicator variable telling me if all the postal codes for a person are the same, or if there is one that is different.

for example, I would like this:

PATIENT ID POSTAL CODE

1 M4R

2 M4T

2 M4X

2 M4T

to turn into this:

PATIENT ID POSTAL CODE CHANGE

1 'NO' or 0

2 'YES' or 1

I assume the first step would be to sort by patient ID and by Postal Code, followed by so sort of "if"/ "then" statement, but I'm having trouble sorting out exactly how that statement should look.

Any thougths would be much appreciated!

Thanks so much.

Mike

Kurt_Bremser · Posted 08-23-2017 08:56 AM

I'd go about it slightly different:

(make sure that dataset is sorted by patient_id)

data have;
input patient_id post_code $;
cards;
1 M4R
1 M4R
1 M4R
2 M4T
2 M4X
2 M4T
;
run;

data want;
set have;
by patient_id;
retain code_change;
if post_code ne lag(post_code) then code_change = 1;
if first.patient_id then code_change = 0;
if last.patient_id then output;
keep patient_id code_change;
run;

proc print data=want noobs;
run;

Result:

patient_     code_
   id       change

    1          0  
    2          1

Note how I presented your example data in a simple data step. Please do so in the future.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

PaigeMiller · Posted 08-23-2017 08:48 AM

Here's a data step version of such an analysis, I could also imagine that you could write this in PROC SQL

/* UNTESTED CODE */

data want;
    set have;
    by patient_id; /* Assumes data set have is sorted by patient_id */
    if first.patient_id then count=0;
    prev_postal_code=lag(postal_code);
    if prev_postal_code^=postal_code and not first.patient_id then count+1;
run;

--
Paige Miller

Kurt_Bremser · Posted 08-23-2017 08:56 AM

I'd go about it slightly different:

(make sure that dataset is sorted by patient_id)

data have;
input patient_id post_code $;
cards;
1 M4R
1 M4R
1 M4R
2 M4T
2 M4X
2 M4T
;
run;

data want;
set have;
by patient_id;
retain code_change;
if post_code ne lag(post_code) then code_change = 1;
if first.patient_id then code_change = 0;
if last.patient_id then output;
keep patient_id code_change;
run;

proc print data=want noobs;
run;

Result:

patient_     code_
   id       change

    1          0  
    2          1

Note how I presented your example data in a simple data step. Please do so in the future.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Astounding · Posted 08-23-2017 08:57 AM

You could squeeze just a bit more information out of the data in this way:

proc sort data=have;

by patient_id postal_code;

run;

data want;

set have;

by patient_id postal_code;

if first.patient_id then count=0;

if last.postal_code then count + 1;

if last.patient_id;

keep patient_id count;

run;

That way, you get more than a yes/no, you get a count of how many postal codes exist for each patient.

Ksharp · Posted 08-23-2017 09:39 AM

data have;
input patient_id post_code $;
cards;
1 M4R
1 M4R
1 M4R
2 M4T
2 M4X
2 M4T
;
run;
proc sql;
select  patient_id,case when count(distinct post_code)=1 then 0 else 1 end as flag
 from have 
  group by  patient_id;
quit;

righcoastmike · Posted 08-23-2017 11:53 AM

Thanks so much everyone. It's crazy to see how many different ways there are to get to the same output.

Also, thanks for the heads up about how to present data in the future. I can see how that would make life much easier for people offering solutions.

The help of this community is always much appreciated!

Mike

Adding a variable to determine whether a postal code changes

Re: Adding a variable to determine whether a postal code changes

Re: Adding a variable to determine whether a postal code changes

Re: Adding a variable to determine whether a postal code changes

Re: Adding a variable to determine whether a postal code changes

Re: Adding a variable to determine whether a postal code changes

Re: Adding a variable to determine whether a postal code changes

Adding a variable to determine whether a postal code changes

Re: Adding a variable to determine whether a postal code changes

Re: Adding a variable to determine whether a postal code changes

Re: Adding a variable to determine whether a postal code changes

Re: Adding a variable to determine whether a postal code changes

Re: Adding a variable to determine whether a postal code changes

Re: Adding a variable to determine whether a postal code changes

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away