BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Wolverine
Pyrite | Level 9

I'm trying to match a file with each person's contract number with another file that has information about the contracts. The problem is that about 3% of records in the person file are not matching with the contract file, even though their contract ID is present in both files (verified by manual review).

 

These are alphanumeric codes, and both variables are character. The person file variable is char9, while the contract file variable is char5. I thought there might be padding blanks that were preventing a match, but I added COMPRESS commands and it didn't make any difference. I ran frequencies of the results produced by the COMPRESS commands and they look right to me. All letters are capitalized in both files. Any other ideas about what might be preventing a match? Examples of the contracts that are not matching include 90091, H9585, and R5329.

 

Unfortunately, the data is confidential and I can't share any of it. I thought about creating example data, but this is only happening in about 3% of cases and I don't know how to make it happen with example data.

PROC SQL;
	Create table want
		as Select a.*, b.*
	From person_file. a JOIN contract_file b
	On COMPRESS(a.person_contract) = COMPRESS(b.contract_id);
QUIT;
1 ACCEPTED SOLUTION

Accepted Solutions
Wolverine
Pyrite | Level 9

Well I think I found the problem... the version of the file I was manually looking at was not the most recent version. The most recent version had been re-created and that re-introduced an earlier error I had already corrected. In my defense, I'm working with someone else's code -- I always create a new dataset name when I add or remove cases, whereas this code re-uses the same name. So it's a lot harder to tell if I'm working with a version of the dataset from before or after a critical step in the code.

View solution in original post

7 REPLIES 7
Wolverine
Pyrite | Level 9

Oops, made a typo... here is the corrected code (which still doesn't work!)

PROC SQL;
	Create table want
		as Select a.*, b.*
	From person_file a JOIN contract_file b
	On COMPRESS(a.person_contract) = COMPRESS(b.contract_id);
QUIT;
Kurt_Bremser
Super User

Use the "kad" modifier in the COMPRESS function, so it keeps letters and digits; this will filter out stuff like carriage return and linefeed characters which are not visible.

Wolverine
Pyrite | Level 9

I'm still getting the same results. Here is my updated code:

PROC SQL;
	Create table want
		as Select a.*, b.*
	From person_file a JOIN contract_file b
	On COMPRESS(a.person_contract,,'kad') = COMPRESS(b.contract_id,,'kad');
QUIT;

 

Quentin
Super User

One simple check is to see if a WHERE statement will find records in both datasets.


If you run:

proc print data=person_file;
  where person_contract='90091';
run;

proc print data=contract_file;
  where contract_id='90091';
run;

Do both PROC PRINT steps return rows?  Hopefully one of them does not, meaning the value '90091' is not in one of the datasets. 

Wolverine
Pyrite | Level 9

Well I think I found the problem... the version of the file I was manually looking at was not the most recent version. The most recent version had been re-created and that re-introduced an earlier error I had already corrected. In my defense, I'm working with someone else's code -- I always create a new dataset name when I add or remove cases, whereas this code re-uses the same name. So it's a lot harder to tell if I'm working with a version of the dataset from before or after a critical step in the code.

Quentin
Super User

Make sense.  Overwriting datasets:

data foo;
  set foo;
run;

can cause all kinds of confusion.  Glad you got it sorted.

mkeintz
PROC Star

@Wolverine 

 

Glad you identified the problem.  Mark your own explanation as the solution, so that this topic no longer appears as unsolved.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 389 views
  • 2 likes
  • 4 in conversation