Re: How to remove overlapping observations when the variable is coded ...

lizwarr · Posted 03-03-2020 04:30 AM

Hi, I have a dataset that looks something like below:

subject year disease

1 2008 123

1 2009 1234

1 2010 A123

2 2009 2234

2 2009 3334

2 2010 A234

2 2010 A334

2 2010 A444

So basically, 123, 1234 and A123 are same disease but was coded differently every year.

I want to keep only the last observation of each disease per subject.

Like this:

subject year disease

1 2010 A123

2 2010 A234

2 2010 A334

2 2010 A444

Could anyone help me how to solve this??

PeterClemmensen · Posted 03-03-2020 04:41 AM

Here is one way

data have;
input subject year disease $;
datalines;
1 2008 123
1 2009 1234
1 2010 A123
2 2009 2234
2 2009 3334
2 2010 A234
2 2010 A334
2 2010 A444
;

data want (drop=y);
    do _N_ = 1 by 1 until (last.subject);
        set have;
        by subject;
        y = year;
    end;
    do _N_ = 1 to _N_;
        set have;
        if year=y then output;
    end;
run;

Result:

subject year  disease 
1       2010  A123 
2       2010  A234 
2       2010  A334 
2       2010  A444

The DATA to DATA Step Macro
Blog: SASnrd

ed_sas_member · Posted 03-03-2020 05:06 AM

Hi @lizwarr

You can also use a PROC SQL to retrieve those observations:

proc sql;
	create table want as
	select *
	from have
	group by subject
	having year=max(year);
quit;

JackHamilton · Posted 03-03-2020 01:35 PM

Did you simplify the problem? Are there going to be more entries in the input data sets, with additional diseases having multiple codes?

If so, I think you'll need to have a mapping table (or format) mapping all of the codes to uniform disease names. The code will be more complicated.

At first glance I thought this might be an opportunity to take advantage of the rarely-used GROUPFORMAT option, but on reflection I don't think there would be any benefit.

lizwarr · Posted 03-10-2020 01:59 AM

Indeed, I think I may have over-simplified the problem. It's data of hundreds of patients followed for 9 years with all kinds of chronic disease. I do have an excel file with codes of each year, but it's just that there are so many disease, I could't imagine unifying the codes manually...

Patrick · Posted 03-11-2020 06:32 AM

@lizwarr wrote:

Indeed, I think I may have over-simplified the problem. It's data of hundreds of patients followed for 9 years with all kinds of chronic disease. I do have an excel file with codes of each year, but it's just that there are so many disease, I could't imagine unifying the codes manually...

In this Excel file is there any logical way to determine if two codes from two different years represent the same disease? If so then it wouldn't be too hard to create such a mapping table. We could actually simply generate a SAS Format which maps all code variations to the latest version.

Above approach is of course only possible if there is a 1:1 relationship between the code variations over the years. Things will get much harder if codes change because of changing categorisations/higher differentiation of diseases like in year 1 a decease has code A but in year two it gets either code A1 or code A2 because coding becomes more differentiated. So to what would you now map code A from year 1? Code A1 or code A2? ...not a decease but just thinking how coding for a person within the autism spectrum might have changed over the years.

How to remove overlapping observations when the variable is coded differently by input year

Re: How to remove overlapping observations when the variable is coded differently by input year

Re: How to remove overlapping observations when the variable is coded differently by input year

Re: How to remove overlapping observations when the variable is coded differently by input year

Re: How to remove overlapping observations when the variable is coded differently by input year

Re: How to remove overlapping observations when the variable is coded differently by input year

How to remove overlapping observations when the variable is coded differently by input year

Re: How to remove overlapping observations when the variable is coded differently by input year

Re: How to remove overlapping observations when the variable is coded differently by input year

Re: How to remove overlapping observations when the variable is coded differently by input year

Re: How to remove overlapping observations when the variable is coded differently by input year

Re: How to remove overlapping observations when the variable is coded differently by input year

Registration is open