Hi Everyone
I have the below very simple proc sort and one merge statement. In the second proc statement, there is a group of records that is not being included in the final merge data step. I used this code elsewhere with different data sets and it worked fine.
I know I did not include the data sets to review, but is there something here syntax-wise that would eliminate certain records?
Paul
proc sort data=s1ageOutALLAGEALL out=s1ageOutSortALLAGEALL;
by cnty_name startyear;
run;
proc sort data=s1AgeCleanALLAGEALL out=s1AgeCleanSortALLAGEALL;
by cnty_name startyear;
run;
data s1AgeCleanOutALLAGEALL;
merge s1AgeCleanSortALLAGEALL s1ageOutSortALLAGEALL;
by cnty_name startyear;
run;
I think if you merge and use the in option you can find the reason why....
data s1AgeCleanOutALLAGEALL;
merge s1AgeCleanSortALLAGEALL (in=clean) s1ageOutSortALLAGEALL (in=all) indsname=source;
by cnty_name startyear;
if clean or all;
source_file=source;
run;
Hi,
Your code looks fine to me. I suspect that the BY variables of that last group got no match. Sometimes they look same while they are NOT. Your could try using hex. format to find the hidden culprit.
Haikuo
I think if you merge and use the in option you can find the reason why....
data s1AgeCleanOutALLAGEALL;
merge s1AgeCleanSortALLAGEALL (in=clean) s1ageOutSortALLAGEALL (in=all) indsname=source;
by cnty_name startyear;
if clean or all;
source_file=source;
run;
I ran the below code with the 'in' statements and the missing records were were still not in the final record set:
data s1AgeCleanOutALLAGEALL;
merge s1AgeCleanSortALLAGEALL (in=clean) s1ageOutSortALLAGEALL (in=all);
by cnty_name startyear;
if clean;
if all;
run;
You need to have if clean or all;
not if clean; if all;
The first one selects records in either dataset, the second only grabs records in both datasets.
It didn't work that way either. What is happening is that one set of records is overwriting the 'missing' ones. I don't understand why though, but I am trying a merge after the merge.
Paul
I think you need to provide more info.
Something to check:
Do you have unique records per county/year in each file or do both have multiple county/years?
For the records that aren't being included you say they're being overwritten...that means they're being included somehow, but not what you want?
Does SQL give you what you want:
proc sql;
create table merged as
select * from
s1ageOutALLAGEALL as a
join s1AgeCleanALLAGEALL as b
on a.cnty_name=b.cnty_name
and a.year=b.year;
quit;
If there was no match, there would be other records missing also. Its just this one group.
Can you post a copy of your log?
One reason why you may not see all records in your MERGE is if cnty_name and startyear BY variables do not uniquely identify the rows in at least one of your input tables. MERGE cannot handle a many-to-many join, only one-to-one or one-to-many.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.