BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AshPatel
Fluorite | Level 6

Hello there, Can someone help me with the following.  I have the following two datasets.  Ned to join Dataset_1 with Dataset_2 on ZIP and City.  Need to match only once based on the shortest Distance_mile. If match, then create a dataset (e.g. matched), If do not match, then create a dataset (e.g. Notmatched).  Again, if Clientno match twice, I need to pick only the one with the shortest distance.  Sorry, the Dataset_2 is huge, so only sample fake data here

 

Dataset_1       Dataset_2    
ZIP ClientNo City   ZIP SEQ Distance_Miles City
14215 A52498 Buffalo   14215 1 1.2 Sloan
14213 A09533 Buffalo   14215 2 2.0 Amherst
14207 A63397 Buffalo   14215 3 2.5 Buffalo
14216 A77384 Buffalo   14215 4 6.0 Seneca
14220 A73552 Buffalo   14215 5 10.0 Kenmore
14214 A31165 Buffalo   14213 1 0.4 Buffalo
14211 A06391 Bristol   14213 2 1.5 Kenmore
14206 A71649 Bristol   14213 3 8.0 Blasdell
14222 A66949 Bristol   14213 4 12.0 Buffalo
14210 A43865 Bristol   14213 5 15.0 Blasdell
14201 A95572 Cedar Grove   14201 1 1.1 Cedar Grove
14208 A06957 Cedar Grove   14201 2 5.3 Sloan
14204 A91374 Cedar Grove   14201 3 9.3 Amherst
14209 A58062 Cedar Grove   14201 4 15.5 Buffalo
14212 A54498 Cedar Grove   14201 5 30.1 Mo Town
        14216 1 2.0 Cheektowaga
        14216 2 7.2 Great Hill
        14216 3 8.8 Buffalo
        14216 4 14.2 Fort Erie
        14216 5 21.6 Ajax
        14220 1 3.0 Sloan
        14220 2 4.6 Amherst
        14220 3 8.0 Cheektowaga
        14220 4 14.9 Great Hill
        14220 5 26.2 Milton
1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

If you accept the solution, please assign it as that.

View solution in original post

8 REPLIES 8
PeterClemmensen
Tourmaline | Level 20

What do you mean by "if Clientno match twice" ?

 

ClientNo is not in both data sets ?

 

Can you show us your desired result given this sample data?

Shmuel
Garnet | Level 18

You should post test data as data steps using infile method with datalines - as in my proposed code. Most contributors are short in time and will not trouble to type the data step or copy line by line as I did. Even I did not copy all 2nd data set lines. 

 

Anyway you can try and check next code:

data have1;
  infile datalines dlm='09'x truncover;
  input zip ClientNo $ city $char30.;
datalines;
14215	A52498	Buffalo
14213	A09533	Buffalo
14207	A63397	Buffalo
14216	A77384	Buffalo
14220	A73552	Buffalo
14214	A31165	Buffalo
14211	A06391	Bristol
14206	A71649	Bristol
14222	A66949	Bristol
14210	A43865	Bristol
14201	A95572	Cedar Grove
14208	A06957	Cedar Grove
14204	A91374	Cedar Grove
14209	A58062	Cedar Grove
14212	A54498	Cedar Grove
;run;
data have2;
  infile datalines dlm='09'x truncover;
  input zip SEQ Distance_Miles city $char30.;
datalines;
14215	1	1.2	Sloan
14215	2	2.0	Amherst
14215	3	2.5	Buffalo
14215	4	6.0	Seneca
14215	5	10.0	Kenmore
14213	1	0.4	Buffalo
14213	2	1.5	Kenmore
14213	3	8.0	Blasdell
14213	4	12.0	Buffalo
14213	5	15.0	Blasdell
14201	1	1.1	Cedar Grove
14201	2	5.3	Sloan
14201	3	9.3	Amherst
14201	4	15.5	Buffalo
14201	5	30.1	Mo Town
; run;
proc sort data=have1; by zip city; run;
proc sort data=have2; by zip city; run;
data temp want2;
 merge have1 (in=in1)
       have2 (in=in2);
  by zip city; 
     if in1 then do;
        if in2 then output temp; 
        else output want2;
     end;
run;
proc sort data=temp; by zip city Distance_Miles; run;
data want1;
 set temp;
  by zip city;
     if first.city;
run;
AshPatel
Fluorite | Level 6

sorry, I should have said match based on ZIP and City.  I would like to have the match based on the shortest distance (e.g. in case of zip 14213, the output would be SEQ=1 and not SEQ=3.  If there is no match for a record in Dataset_1, then need to have a seprate output for it (e.g. ZIP 14208)

Tom
Super User Tom
Super User

What is the expected output for that data?

Also please just post the data a text, much easier to handle. 

data dataset_1;
  infile cards dsd dlm='|' truncover ;
  input ZIP :$5. ClientNo :$20. City :$80. ;
cards;
14215|A52498|Buffalo
14213|A09533|Buffalo
14207|A63397|Buffalo
14216|A77384|Buffalo
14220|A73552|Buffalo
14214|A31165|Buffalo
14211|A06391|Bristol
14206|A71649|Bristol
14222|A66949|Bristol
14210|A43865|Bristol
14201|A95572|Cedar Grove
14208|A06957|Cedar Grove
14204|A91374|Cedar Grove
14209|A58062|Cedar Grove
14212|A54498|Cedar Grove
;

data dataset_2;
  infile cards dsd dlm='|' truncover;
  input ZIP :$5. SEQ Distance_Miles City :$80. ;
cards;
14215|1|1.2|Sloan
14215|2|2|Amherst
14215|3|2.5|Buffalo
14215|4|6|Seneca
14215|5|10|Kenmore
14213|1|0.4|Buffalo
14213|2|1.5|Kenmore
14213|3|8|Blasdell
14213|4|12|Buffalo
14213|5|15|Blasdell
14201|1|1.1|Cedar Grove
14201|2|5.3|Sloan
14201|3|9.3|Amherst
14201|4|15.5|Buffalo
14201|5|30.1|Mo Town
14216|1|2|Cheektowaga
14216|2|7.2|Great Hill
14216|3|8.8|Buffalo
14216|4|14.2|Fort Erie
14216|5|21.6|Ajax
14220|1|3|Sloan
14220|2|4.6|Amherst
14220|3|8|Cheektowaga
14220|4|14.9|Great Hill
14220|5|26.2|Milton
;
Shmuel
Garnet | Level 18

@AshPatel wrote:

sorry, I should have said match based on ZIP and City.  I would like to have the match based on the shortest distance (e.g. in case of zip 14213, the output would be SEQ=1 and not SEQ=3.  If there is no match for a record in Dataset_1, then need to have a separate output for it (e.g. ZIP 14208)


That is exactly what my code do. Have you run the code on your data? Have you checked the results?

 

I named the test data have1 (instead dataset_1) and have2 (instead dataset_2).

The output named want1 are the matched data, want2 the not matched.

AshPatel
Fluorite | Level 6

Yes, the solution worked perfactly.  Thank you ever so much.

Shmuel
Garnet | Level 18

If you accept the solution, please assign it as that.

AshPatel
Fluorite | Level 6

The solution that was provided earlier this week worked fine, however; I have been asked to modify the criteria as below.  I tried modifying the data step but unable to get the expected output.  I can take dividual Miles_Rank one at a time in a separate data step but in our actual data file, we have 10 unique Miles_Ranks (1 to10) for each ZIP.  I believe, there must be a way to have the Miles Ranks logic build in one or two data steps.  I am attaching revised raw data (have1 and have2) and also included expected output sheet.  Once again thank you so much!

 

1. Match on zip, city and state (have1 and have2)
2. Want1 all records from have1 with ClientNo where there is a match based on the shortest distance (Miles_Rank)
3. Also, Want2 a list of records from have1 where there is no match

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1034 views
  • 0 likes
  • 4 in conversation