BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
righcoastmike
Quartz | Level 8

Hi All, 

 

I feel like this is a pretty straightforward one, but I am still pretty new at data steps, so I thought I would call for backup. 

 

Here's the situation. 

 

I have a rather large dataset that has multiple records for about 80,000 individuals. Each idividual has anywhere from 1-15 different records associated with their ID. What I'm trying to determine is how often postal code changes among people with multiple records

 

Ideally the output would be a table with one record per ID with an indicator variable telling me if all the postal codes for a person are the same, or if there is one that is different. 

 

for example, I would like this: 

 

PATIENT ID      POSTAL CODE

        1                      M4R

        1                      M4R

        1                      M4R

        2                      M4T

        2                      M4X

        2                      M4T

 

to turn into this:

 

PATIENT ID     POSTAL CODE CHANGE

        1                            'NO' or 0      

        2                            'YES' or 1 

 

I assume the first step would be to sort by patient ID and by Postal Code, followed by so sort of "if"/ "then" statement, but I'm having trouble sorting out exactly how that statement should look. 

 

Any thougths would be much appreciated! 

 

Thanks so much. 

 

Mike

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

I'd go about it slightly different:

(make sure that dataset is sorted by patient_id)

data have;
input patient_id post_code $;
cards;
1 M4R
1 M4R
1 M4R
2 M4T
2 M4X
2 M4T
;
run;

data want;
set have;
by patient_id;
retain code_change;
if post_code ne lag(post_code) then code_change = 1;
if first.patient_id then code_change = 0;
if last.patient_id then output;
keep patient_id code_change;
run;

proc print data=want noobs;
run;

Result:

patient_     code_
   id       change

    1          0  
    2          1  

Note how I presented your example data in a simple data step. Please do so in the future.

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

Here's a data step version of such an analysis, I could also imagine that you could write this in PROC SQL

 

/* UNTESTED CODE */

data want;
    set have;
    by patient_id; /* Assumes data set have is sorted by patient_id */
    if first.patient_id then count=0;
    prev_postal_code=lag(postal_code);
    if prev_postal_code^=postal_code and not first.patient_id then count+1;
run;
--
Paige Miller
Kurt_Bremser
Super User

I'd go about it slightly different:

(make sure that dataset is sorted by patient_id)

data have;
input patient_id post_code $;
cards;
1 M4R
1 M4R
1 M4R
2 M4T
2 M4X
2 M4T
;
run;

data want;
set have;
by patient_id;
retain code_change;
if post_code ne lag(post_code) then code_change = 1;
if first.patient_id then code_change = 0;
if last.patient_id then output;
keep patient_id code_change;
run;

proc print data=want noobs;
run;

Result:

patient_     code_
   id       change

    1          0  
    2          1  

Note how I presented your example data in a simple data step. Please do so in the future.

Astounding
PROC Star

You could squeeze just a bit more information out of the data in this way:

 

proc sort data=have;

  by patient_id postal_code;

run;

 

data want;

set have;

by patient_id postal_code;

if first.patient_id then count=0;

if last.postal_code then count + 1;

if last.patient_id;

keep patient_id count;

run;

 

That way, you get more than a yes/no, you get a count of how many postal codes exist for each patient.

Ksharp
Super User
data have;
input patient_id post_code $;
cards;
1 M4R
1 M4R
1 M4R
2 M4T
2 M4X
2 M4T
;
run;
proc sql;
select  patient_id,case when count(distinct post_code)=1 then 0 else 1 end as flag
 from have 
  group by  patient_id;
quit;


righcoastmike
Quartz | Level 8

Thanks so much everyone. It's crazy to see how many different ways there are to get to the same output. 

 

Also, thanks for the heads up about how to present data in the future. I can see how that would make life much easier for people offering solutions. 

 

The help of this community is always much appreciated! 

 

Mike 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1046 views
  • 4 likes
  • 5 in conversation