- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
HI
i have a table that has duplicate entries here is and example.
ID. Lat. Long
001. 0 0
001. 33.4. -112.221
i would want to remove the duplicate that has 0 for lat and long and keep the entry where there is a lat and long.
here is a snippet of what I'm using
proc sort data= table1
out=table1
nodupkey;
where lat <> 0;
by ID;
which works except it also removes some id that's are not dups but have a 0 in the lat and long columns ....thanks again for assistance
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sort first descending by lat then use nodupkey. This will push the observations with values before the ones with 0.
proc sort data=have; by id descending lat;
run;
proc sort data=have nodupkey; by id;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sort first descending by lat then use nodupkey. This will push the observations with values before the ones with 0.
proc sort data=have; by id descending lat;
run;
proc sort data=have nodupkey; by id;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The possibility of negative values can make this tricky. If you can guarantee that each ID has at most one record with all 0s, here's a way to do it:
proc sort data=have;
by id;
run;
data want;
merge have (where=(lat=0 and long=0))
have (where=(lat ne 0 or long ne 0));
by id;
run;
Good luck.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
True, but latitude and longitude have very well defined transition points to negative/positive. If I'm dealing with Canadian data for example, I know that all my latitudes will be positive.
For countries that cross boundaries another solution would be required.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
How about this:
data have; input ID $ Lat Long ; cards; 001. 0 0 001. 33.4 -112.221 002. 0 0 ; run; proc sort data=have; by id; run; data want; set have; by id; if lat = 0 and not(first.id and last.id) then delete; run;
Ksharp