I have two datasets one has 65016 observation and the other has 289 when using the code below I am not getting the right number for match and un-match. Is my code correct?
data sas_perm.matched_&mon._&wk
sas_perm.did_not_match_&mon._&wk;
merge agen (in=a)
dai (in=d);
by ipac;
if a and d then output sas_perm.matched_&mon._&wk;
else output sas_perm.did_not_match_&mon._&wk;
run;
NOTE: There were 65016 observations read from the data set WORK.AGEN.
NOTE: There were 289 observations read from the data set WORK.DAI.
NOTE: The data set SAS_PERM.MATCHED_AGUST_WK1 has 32412 observations and 80 variables.
NOTE: The data set SAS_PERM.DID_NOT_MATCH_AGUST_WK1 has 32688 observations and 80 variables.
NOTE: DATA statement used (Total process time):
real time 0.11 seconds
cpu time 0.09 seconds
Since the results of this merge are extremely dependent on the values of the IPAC variable in both data sets we cannot tell with the data and what your expectations actually are.
What makes you think that you are "not getting the right number for match and un-match.".
As in example data values, not some "the numbers aren't right" statement.
Provide an example of IPAC values that appear in the matched data that shouldn't and values of IPAC that should not appear in the matched.
Some common issues generally related to the BY variable not being exactly the same value in both sets.
If IPAC is numeric then there may be a decimal portion that you don't see that has values that might appear identical that are not. The fix if this is the case is to round the variables in both sets before merging to a standard number of decimal places.
For character variables the BY is going to be case sensitive, "ABC" does not match "Abc". The fix here is make sure the case is the same in both data sets, use any of UPCASE, LOWCASE or PROPCASE functions on the variable in both sets.
Sometimes you may have a leading space(or multiples) and " ABC" does not match "ABC". Trailing spaces won't have any impact on the comparison but leading spaces does. The fix may be as simple as using the LEFT function on both datasets. If the "space" is actually a different non-printable character this may get more complicated.
If the format for the IPAC variable is shorter than the length you may see "ABC" (format $3.) when the value is actually "ABCDEF" (length 6 or more). This would likely require either changing the values of the variable or possibly creating a separate variable in each set to have identical values to merge on.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.