Solved: Re: Remove Duplicates- Times Series Data

kaythth · Posted 09-11-2017 05:50 PM

Hello!

I am running my head into a wall trying to figure out how to remove select duplicates from my dataset. I want to track locations by a unique identifier, but I don't want to include repeat data from consecutive entries from the same location. Here is an example of the data I have:

ID  coll_Dt  Fac
1   1/12/17  A
1   1/20/17  A
1   5/6/17   B
1   6/5/17   A
1   7/8/17   C
2   1/26/17  B
2   2/5/17   B
2   4/15/17  C
2   5/2/17   C
2   5/29/17  B
3   2/20/17  A
3   4/19/17  B
3   5/16/17  B
3   6/8/17   C
3   8/1/17   A

And this is what I would like the result to look like (the consecutive repeats of the same location within an ID removed):

ID  coll_dt  Fac
1   1/12/17  A
1   5/6/17   B
1   6/5/17   A
1   7/8/17   C
2   1/26/17  B
2   4/15/17  C
2   5/29/17  B
3   2/20/17  A
3   4/19/17  B
3   6/8/17   C
3   8/1/17   A

NODUPKEY doesn't work because I want to keep duplicate locations for the same ID, just not when they appear consecutively. I have tried using:

data a; set a;
by id coll_dt fac;
if first.fac;
run;

but that doesn't seem to work either.

Please help!

Thank you 🙂

Reeza · Posted 09-11-2017 06:03 PM

I think you need the NOTSORTED option.

data a; set a; 
by id coll_dt fac NOTSORTED; 
if first.fac; 
run;

kaythth · Posted 09-11-2017 06:12 PM

That did not work either, sadly 😞

ballardw · Posted 09-11-2017 06:18 PM

data want;
set a;
by id fac NOTSORTED coll_dt;
if first.fac;
run;
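
For anyone who wants to try this end to end, here is a minimal sketch built from the sample rows in the question. The dataset name HAVE, the MMDDYY informat for coll_dt, and the PROC PRINT step are assumptions added for illustration and are not part of the original posts. The likely reason the earlier attempts kept every row is that coll_dt came before fac in the BY statement, so first.fac was set each time the date changed. With fac placed directly after id, and NOTSORTED telling SAS that the facility values are grouped but not in sorted order, first.fac should be set only when the facility changes within an ID.

```sas
/* Sample data copied from the question. The name HAVE is an assumption. */
data have;
  input id coll_dt :mmddyy8. fac $;
  format coll_dt mmddyy8.;
  datalines;
1 1/12/17 A
1 1/20/17 A
1 5/6/17 B
1 6/5/17 A
1 7/8/17 C
2 1/26/17 B
2 2/5/17 B
2 4/15/17 C
2 5/2/17 C
2 5/29/17 B
3 2/20/17 A
3 4/19/17 B
3 5/16/17 B
3 6/8/17 C
3 8/1/17 A
;
run;

/* Keep only the first row of each consecutive run of the same facility
   within an ID. Assumes the rows are already in id, coll_dt order. */
data want;
  set have;
  by id fac notsorted;
  if first.fac;
run;

proc print data=want noobs;
run;
```

If the rows are not already ordered by id and coll_dt, a PROC SORT by those two variables before the second step would probably be needed, since NOTSORTED means SAS will not check the order for you.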

kaythth · Posted 09-11-2017 06:40 PM

Thank you!!
