Solved: Re: SAS programming - Remove duplicates

Pranshu · Posted 05-08-2018 05:23 AM

Hi All,

I have the following data on the distances between locations.

Source	Destination	Distance
USA	UK	1000
USA	Spain	200
UK	USA	1000
Germany	Spain	500
Spain	USA	200

I want to remove duplicates where the source and destination form the same pair in either order. For example, USA to UK is the same as UK to USA, so the duplicate row needs to be removed.

The following is the desired output.

Source	Destination	Distance
USA	UK	1000
USA	Spain	200
Germany	Spain	500

tomrvincent · Posted 05-08-2018 08:03 AM

Concatenate Source and Destination in alphabetical order, then pick the max Distance for each pair.
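A minimal sketch of how this suggestion might look in SAS. The names `keyed`, `pair`, and `want_max` are hypothetical, and the `|` delimiter and `$17` length are assumptions that fit the default `$8` length of the two character variables. It is one possible reading of the idea, not necessarily the poster's exact intent.

```sas
/* Sample data from the question (space-delimited here). */
data have;
  input Source $ Destination $ Distance;
  datalines;
USA UK 1000
USA Spain 200
UK USA 1000
Germany Spain 500
Spain USA 200
;
run;

/* Build an order-independent key: the smaller name always comes first. */
data keyed;
  set have;
  length pair $17;   /* assumption: two $8 values plus one delimiter */
  if Source <= Destination then pair = catx('|', Source, Destination);
  else pair = catx('|', Destination, Source);
run;

/* Keep one row per key, taking the largest Distance. */
proc sql;
  create table want_max as
  select pair, max(Distance) as Distance
  from keyed
  group by pair;
quit;
```

With the sample data this should leave three rows, one per distinct pair. Because the key is a single concatenated variable, the original Source and Destination columns would need to be carried along or split back out if they are still wanted.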

Ksharp · Posted 05-08-2018 08:39 AM

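/* Read the sample data; EXPANDTABS turns the tab separators into spaces. */
data have;
  infile cards expandtabs;
  input Source $ Destination $ Distance;
  cards;
USA	UK	1000
USA	Spain	200
UK	USA	1000
Germany	Spain	500
Spain	USA	200
;
run;

/* Put each pair in alphabetical order so A-B and B-A become identical. */
data temp;
  set have;
  call sortc(Source, Destination);
run;

/* Keep one row per (Source, Destination) pair. */
proc sort data=temp out=want nodupkey;
  by Source Destination;
run;

proc print data=want;
run;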

Pranshu · Posted 05-09-2018 01:23 AM

Thanks for your response. This really solved my query.

Tom · Posted 05-08-2018 08:53 AM

Why not just take those where Source < Destination?

Ksharp · Posted 05-08-2018 09:13 AM

What if a pair has only one obs, and Source > Destination?
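To make the objection concrete, here is a small sketch with hypothetical data (`have2` and `filtered` are made-up names) in which one pair appears only once, and in descending order.

```sas
/* Hypothetical data: Spain-Germany appears once, with Source > Destination. */
data have2;
  input Source $ Destination $ Distance;
  datalines;
USA UK 1000
UK USA 1000
Spain Germany 500
;
run;

/* Filter-only approach: keep rows that are already in ascending order. */
data filtered;
  set have2;
  if Source < Destination;
run;

/* filtered keeps only the UK -> USA row.                          */
/* USA -> UK is dropped as intended, but the lone Spain -> Germany */
/* row is dropped too, so that pair disappears from the result.    */
proc print data=filtered;
run;
```

The filter alone works on the sample data in the question only because every pair there happens to include a row with Source < Destination.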

Tom · Posted 05-08-2018 09:23 AM

Add
call sortc(source,destination);
before the IF statement.
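A sketch of how that combination might look, with `want_if` as a hypothetical dataset name. One thing to keep in mind under this reading: once CALL SORTC has run, every row satisfies Source <= Destination, so the IF only drops rows whose two endpoints are equal, and a dedup step such as NODUPKEY still seems to be needed.

```sas
/* Sample data from the question (space-delimited here). */
data have;
  input Source $ Destination $ Distance;
  datalines;
USA UK 1000
USA Spain 200
UK USA 1000
Germany Spain 500
Spain USA 200
;
run;

data want_if;
  set have;
  call sortc(Source, Destination);  /* now Source <= Destination on every row */
  if Source < Destination;          /* drops only rows where both ends match  */
run;

/* Reversed pairs are now identical rows, so remove the repeats. */
proc sort data=want_if nodupkey;
  by Source Destination;
run;

proc print data=want_if;
run;
```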
