Hi,
Hi,
I am using the below code to match records from two to three. I am setting a flag to zero (matchingrecordfound=0) . For each row in dataset two , I am looping through the dataset three and do the processing only if matchingrecordfound is less than 2 (capturing only two matches in the output dataset). Is there any way I can avoid looping through the three if matchingrecordfound is equal to 2 for a particular record in two?
data two;
input state $ column1;
datalines;
OH 123
TX 234
;
run;
data three;
input state $ column2 column3;
datalines;
OH 111 222
CA 333 444
OH 123 555
OH 123 666
OH 123 777
OH 123 888
;
run;
data temp1 temp2;
matchingrecordfound=0;
set two;
/* if matchingrecordfound=2 for a record in two i need to move to next record */
do i = 1 to nobs;
set three point=i nobs=nobs;
if matchingrecordfound < 2 then
do;
if column1=column2 then
do;
Matchingrecordfound+1;
output temp1;
end;
else
do;
output temp2;
end;
end;
end;
run;
Thanks in advance
Regards,
Sheeba
A few observations. First, if you were to accomplish what you ask for, the number of observations in the data set TEMP2 would change.
Second, why output anything at all to TEMP2? Why not just output the observations that match to TEMP1 and skip creating TEMP2 entirely?
Third, to answer your original question, it's fairly easy:
do i = 1 to nobs until(matchrecordfound=2);
Finally, SAS has tools that support detection of matches/nonmatches using fairly straightforward code. Try it this way.
data two;
input state $ column1;
datalines;
OH 123
TX 234
;
data three;
input state $ column1 column3;
datalines;
OH 111 222
CA 333 444
OH 123 555
OH 123 666
OH 123 777
OH 123 888
;
proc sort data=two;
by state column1;
run;
proc sort data=three;
by state column1;
run;
data two_matches;
merge two (in=in2) three (in=in3);
by state column1;
if in2 and in3;
if first.column1 then matching_record=1;
else matching_record + 1;
if matching_record <= 2;
run;
Note that the variable name changed to COLUMN1 on one of the INPUT statements. As long as you are allowed to sort your data sets, this would be an easy way to match.
A few observations. First, if you were to accomplish what you ask for, the number of observations in the data set TEMP2 would change.
Second, why output anything at all to TEMP2? Why not just output the observations that match to TEMP1 and skip creating TEMP2 entirely?
Third, to answer your original question, it's fairly easy:
do i = 1 to nobs until(matchrecordfound=2);
Finally, SAS has tools that support detection of matches/nonmatches using fairly straightforward code. Try it this way.
data two;
input state $ column1;
datalines;
OH 123
TX 234
;
data three;
input state $ column1 column3;
datalines;
OH 111 222
CA 333 444
OH 123 555
OH 123 666
OH 123 777
OH 123 888
;
proc sort data=two;
by state column1;
run;
proc sort data=three;
by state column1;
run;
data two_matches;
merge two (in=in2) three (in=in3);
by state column1;
if in2 and in3;
if first.column1 then matching_record=1;
else matching_record + 1;
if matching_record <= 2;
run;
Note that the variable name changed to COLUMN1 on one of the INPUT statements. As long as you are allowed to sort your data sets, this would be an easy way to match.
Hi Astounding,
Thanks a lot for for the detailed reply.
Actually I am using this approach for join due to the nature of data. When i try to join the datasets based on column1, it results in many to many merge and when we use merge step for joining, it results in the loss of data.. The data which i put in the example is a sample and does not illustrate the many to many join .
I am creating two outputs - temp1 for matches and temp2 for notmatches. the requirement is to capture both matches and notmatches and finally combine everything into a single dataset. There are cases when a single record from dataset two falls in both temp1 and temp2 but I am taking care of this in the next step by deleting those obersevations from temp2 which are already present in temp1. so by the end of this step i get matches and notmatches.
Thanks again,
Regards,
Sheeba
There's more than one way to interpret the final goal here. It would be helpful to take a data set with duplicates:
OH 123
OH 123
TX 234
Illustrate (using the same THREE data set) what you would like the very final outcome to be. Intermediate data sets are not required, just the very final result.
Hi,
Two
state column1 id
OH 123 1
OH 123 2
TX 234 3
state col2 col3
OH 111 222
CA 333 444
OH 123 555
OH 123 666
OH 123 777
OH 123 888
Final output
OH 123 555 1
OH 123 666 1
OH 123 555 2
OH 123 666 2
TX 234 . 3
The code which i have posted here as a part of question does not catch the last record .
data temp1 temp2;
matchingrecordfound=0;
set two;
do i = 1 to nobs until(matchingrecordfound=2);;
set three point=i nobs=nobs;
if matchingrecordfound < 2 then
do;
/* the below if condition is dynamic . created at the time of join and it may vary depending on data*/
if column1=column2 then
do;
Matchingrecordfound+1;
output temp1;
end;
else
do;
output temp2;
end;
end;
end;
run;
Regards,
Sheeba
OK, just a few changes can get you the final output directly. The code is untested but looks like it should work:
data final_output;
matchingrecordfound=0;
set two;
do i = 1 to nobs until(matchingrecordfound=2);
set three (rename=(state=state2)) point=i nobs=nobs;
if state=state2 and column1=column2 then
do;
Matchingrecordfound+1;
output;
end;
end;
if Matchingrecordfound=0 then do;
column2 = column1;
column3 = .;
output;
run;
keep state column2 column3 id;
run;
Hi,
Thanks for the details.
It is working as expected.
Regards,
Sheeba
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.