BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
NealTinWA
Calcite | Level 5

I am trying to match-merge a small file consisting of lastname and firstname (plus other stuff) with a large reference table. The match is to be done on lastname and firstname (in that order).  Basically, I need to add some information from the reference to the records in the small file.

I've done everything I can think of to assure that the match variables are formatted identically -- same informat, same format, trimmed out leading and trailing blanks -- but the match-merge doesn't work. Yes, the two files are both sorted on the two match variables. Yes, I did include a BY statement, using the same two variables in the same order. It looks like SAS is concatenating the files instead of merging them -- I get all the records from one file with all variables from the other as missing values, followed by the records from the other table (without the non-matched variables from the first one). If I use the (IN=) option for the small file, I simply get the same file with the addition of blanks in the variables brought in from the other file. The reverse happens if I use (IN=) on the large file.

To summarize, the small file was created this way:

DATA WORK.small;

    INFILE '[file path].csv  DELIMITER= ','  MISSOVER  DSD  LRECL=32767  FIRSTOBS=2;

        INFORMAT ID best32. ;

        INFORMAT Lastname $30. ;

        INFORMAT Firstname $30. ;

        FORMAT ID best12. ;

        FORMAT Lastname $30. ;

        FORMAT Firstname $30. ;

    INPUT  ID  Lastname $  Firstname $ ;

run;

And here is how the reference file was read in:

DATA WORK.large;

     INFILE '[file path].csv'  DELIMITER= ','  MISSOVER  DSD  LRECL=32767  FIRSTOBS=2;

        INFORMAT add_this_info  $11. ;

        INFORMAT Lastname $30. ;

        INFORMAT Firstname $30. ;

        INFORMAT Name $40. ;

        FORMAT add_this_info  $11. ;

        FORMAT Lastname $30. ;

        FORMAT Firstname $30. ;

        FORMAT Name $40. ;

        FORMAT add_this_info  $11. ;

    INPUT  add_this_info $  Lastname $  Firstname $  Name $ ;

run;

Both files were sorted on lastname and firstname (in that order), and then I wrote the match-merge:

DATA match_merge;

     MERGE large small(IN=keep);

     BY lastname firstname;

     IF keep=1;

run;

For testing, the IF statement may or may not appear.

I've even concatenated the lastname and firstname into a single field (that's not the "Name" field in the large file), using the COMPRESS and TRIM functions to assure that the fields are exactly alike. Same thing happens.

It is absolutely certain that some records in the small file have corresponding records in the large file -- I can eyeball that. But SAS doesn't see what I do.

Why??

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

When you have problems matching variables, convert their values into new variables with a $HEX. format and look at the hex values. You will see previously undetectable characters/circumstances that cause the comparison to fail.

View solution in original post

7 REPLIES 7
ballardw
Super User

did you try

MERGE  small(IN=keep) large;

instead of

MERGE large small(IN=keep);

order of appearance in the merge statement is important

Reeza
Super User

Try a SQL merge to test if it's something wrong with your data or your data step.

proc sql;

create table want as

select a.*, b.ID

from a

left join b

on a.first_name=b.first_name

and a.last_name=b.last_name

order by b.last_name;

quit;

Astounding
PROC Star

Have you tried getting rid of the INFORMAT statements?

The FORMAT statements should be sufficient to establish the lengths of the variables.

Kurt_Bremser
Super User

When you have problems matching variables, convert their values into new variables with a $HEX. format and look at the hex values. You will see previously undetectable characters/circumstances that cause the comparison to fail.

TomKari
Onyx | Level 15

Very nice! That's definitely going in my toolbox.

Tom

NealTinWA
Calcite | Level 5

Yes! That was the problem. There was a hidden hex character in the small file's lastname field.

Interestingly (or not) I couldn't see it if I read the flat file as a CSV, but it was visible if read as a TXT file.

So in the end, I did make a mistake ... but a far different one than I thought I might be making.

ballardw
Super User

Let me guess: you looked at the CSV with a spreadsheet program. They do tend to obscure some things.

You can look at CSV with NOTEPAD or similar plain text editors (not Wordpad which supports additional appearance stuff) as well.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 4401 views
  • 8 likes
  • 6 in conversation