BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sheeba
Lapis Lazuli | Level 10

Hi,

 

Hi,

 

I am using the below code to match records from two to three. I am setting a flag to zero (matchingrecordfound=0) . For each row in dataset two , I am looping through the dataset three and do the processing only if matchingrecordfound is less than 2 (capturing only two matches in the output dataset).  Is there any way I can avoid looping through the  three if matchingrecordfound is equal to 2 for a particular record in two?

 

data two;

input state $ column1;

datalines;

OH 123

TX 234

;

run;

data three;

input state $ column2 column3;

datalines;

OH 111 222

CA 333 444

OH 123 555

OH 123 666

OH 123 777

OH 123 888

;

run;

data temp1 temp2;

matchingrecordfound=0;

set two;

 /* if matchingrecordfound=2 for a record in two i need to move to next record */

do i = 1 to nobs;

set three point=i nobs=nobs;

if matchingrecordfound < 2 then

do;

if column1=column2 then

do;

Matchingrecordfound+1;

output temp1;

end;

else

do;

output temp2;

end;

end;

end;

run;

 

Thanks in advance

 

Regards,

Sheeba

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

A few observations.  First, if  you were to accomplish what you ask for, the number of observations in the data set TEMP2 would change.

 

Second, why output anything at all to TEMP2?  Why not just output the observations that match to TEMP1 and skip creating TEMP2 entirely?

 

Third, to answer  your original question, it's fairly easy:

 

do i = 1 to nobs until(matchrecordfound=2);

 

Finally, SAS has tools that support detection of matches/nonmatches using fairly straightforward code.  Try it this way.

 

data two;

input state $ column1;

datalines;

OH 123

TX 234

;

 

data three;

input state $ column1 column3;

datalines;

OH 111 222

CA 333 444

OH 123 555

OH 123 666

OH 123 777

OH 123 888

;

 

proc sort data=two;

by state column1;

run;

 

proc sort data=three;

by state column1;

run;

 

data two_matches;

merge two (in=in2) three (in=in3);

by state column1;

if in2 and in3;

if first.column1 then matching_record=1;

else matching_record + 1;

if matching_record <= 2;

run;

 

Note that the variable name changed to COLUMN1 on one of the INPUT statements.  As long as you are allowed to sort your data sets, this would be an easy way to match.

View solution in original post

6 REPLIES 6
Astounding
PROC Star

A few observations.  First, if  you were to accomplish what you ask for, the number of observations in the data set TEMP2 would change.

 

Second, why output anything at all to TEMP2?  Why not just output the observations that match to TEMP1 and skip creating TEMP2 entirely?

 

Third, to answer  your original question, it's fairly easy:

 

do i = 1 to nobs until(matchrecordfound=2);

 

Finally, SAS has tools that support detection of matches/nonmatches using fairly straightforward code.  Try it this way.

 

data two;

input state $ column1;

datalines;

OH 123

TX 234

;

 

data three;

input state $ column1 column3;

datalines;

OH 111 222

CA 333 444

OH 123 555

OH 123 666

OH 123 777

OH 123 888

;

 

proc sort data=two;

by state column1;

run;

 

proc sort data=three;

by state column1;

run;

 

data two_matches;

merge two (in=in2) three (in=in3);

by state column1;

if in2 and in3;

if first.column1 then matching_record=1;

else matching_record + 1;

if matching_record <= 2;

run;

 

Note that the variable name changed to COLUMN1 on one of the INPUT statements.  As long as you are allowed to sort your data sets, this would be an easy way to match.

Sheeba
Lapis Lazuli | Level 10

Hi Astounding,

 

Thanks a lot for  for the detailed reply.

 

Actually I am using this approach for join due to the nature of data. When i try to join the  datasets based on column1, it results in many to many merge and when we use merge step for joining, it results in the loss of data.. The data which i put in the example is a sample and does not illustrate the many to many join .

 

I am creating two outputs - temp1 for matches and temp2 for notmatches. the requirement is to capture both matches and notmatches and finally combine everything into a single dataset. There are cases when a single record from dataset two falls in both temp1 and temp2 but I am taking care of this in the next step by deleting those obersevations from temp2 which are already present in temp1. so by the end of this step i get matches and notmatches.

 

Thanks again,

Regards,

Sheeba

Astounding
PROC Star

There's more than one way to interpret the final goal here.  It would be helpful to take a data set with duplicates:

 

OH 123

OH 123

TX 234

 

Illustrate (using the same THREE data set) what you would like the very final outcome to be.  Intermediate data sets are not required, just the very final result.

Sheeba
Lapis Lazuli | Level 10

Hi,

 

Two

state column1 id

OH   123        1

OH   123        2

TX    234       3

 

state col2 col3
OH   111 222

CA   333 444

OH   123 555

OH   123 666

OH   123 777

OH   123 888

 

Final output

OH 123 555 1
OH 123 666 1
OH 123 555 2
OH 123 666 2
TX  234 .     3

 

The code which i have posted here as a part of question does not catch the last record .

 

data temp1 temp2;

matchingrecordfound=0;

set two;

 

do i = 1 to nobs until(matchingrecordfound=2);;

set three point=i nobs=nobs;

if matchingrecordfound < 2 then

do;

/* the below if condition is dynamic . created at the time of join and it may vary depending on data*/

if column1=column2 then

do;

Matchingrecordfound+1;

output temp1;

end;

else

do;

output temp2;

end;

end;

end;

run;

 

Regards,

Sheeba

Astounding
PROC Star

OK, just a few changes can get you the final output directly.  The code is untested but looks like it should work:

 

data final_output;

matchingrecordfound=0;

set two;

 

do i = 1 to nobs until(matchingrecordfound=2);

set three (rename=(state=state2)) point=i nobs=nobs;

if state=state2 and column1=column2 then

do;

   Matchingrecordfound+1;

   output;

end;

end;

if Matchingrecordfound=0 then do;

   column2 = column1;

   column3 = .;

   output;

run;

keep state column2 column3 id;

run;

Sheeba
Lapis Lazuli | Level 10

Hi,

 

Thanks for the details.

 

It is working as expected.

 

Regards,

Sheeba

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 8131 views
  • 2 likes
  • 2 in conversation