Issues when merging two data sets

Contributor
Posts: 41

Hello,

I have two datasets, one called A and the other called B, which I need to merge using a DATA step MERGE statement.

[UPDATE: The problem was with the original data file I imported, which seems to have been broken.]

Sorry for the inconvenience.

PROC Star
Posts: 7,474

Re: Issues when merging two data sets

You are probably confusing many of us by introducing some possibly irrelevant facts.  From your code, you are matching by ID, but you explain that one set of IDs is related to record order, while the other set isn't.

The real question is whether an ID from file A is related to record(s) from file B that share the same ID value.

From your explanation, there can't be duplicates in file A, but there may or may not be duplicates in file B.  Why are you surprised that you ended up with some duplicates?

For you to accomplish the task, both files have to be sorted in ID order.  Were they?
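As a rough sketch of what that looks like, assuming the data sets are named A and B and the key variable is ID as described above (the output name `merged` is only a placeholder):

```sas
/* Sort both data sets by the BY variable before merging. */
proc sort data=A;
  by ID;
run;

proc sort data=B;
  by ID;
run;

/* Match-merge: observations are combined where ID values agree. */
/* "merged" is a hypothetical output data set name.              */
data merged;
  merge A B;
  by ID;
run;
```

If either input is not in ID order, the DATA step stops with an error saying the BY variables are not properly sorted.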

Contributor
Posts: 41

Re: Issues when merging two data sets

Hello,

Sorry for the irrelevant information.

The fact is that YES, I did sort them BEFORE the merge.

What I meant to say is that in dataset A, the ordering of the ID column seems to be sequential and in ascending order (from 1 to 534 without missing any values).

In dataset B, on the other hand, I have 533 records after import, and they were not in sequential order. I sorted them with PROC SORT, though.

PROC Star
Posts: 7,474

Re: Issues when merging two data sets

Then you could have duplicate IDs, IDs from one file that don't match any record(s) in the other file, or both.
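One way to check both possibilities, sketched under the assumption that A and B are already sorted by ID (the output names `b_dups` and `unmatched`, and the flag variables `in_A` and `in_B`, are made up for illustration):

```sas
/* Count how often each ID occurs in B and keep only the repeats. */
/* COUNT is the frequency variable PROC FREQ writes to OUT=.      */
proc freq data=B noprint;
  tables ID / out=b_dups (where=(count > 1));
run;

/* List IDs that appear in only one of the two files.             */
/* IN= variables are temporary, so copy them to keep the flags.   */
data unmatched;
  merge A (in=inA keep=ID)
        B (in=inB keep=ID);
  by ID;
  if inA and inB then delete;   /* drop IDs found in both files */
  in_A = inA;
  in_B = inB;
run;
```

An empty `b_dups` would mean no repeated IDs in B, and an empty `unmatched` would mean every ID appears in both files.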

What do you want in the resulting file?  Do you only want records from A that have a match in B, or do you want at least one record for every id that exists in either file?
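The IN= data set option is the usual way to express that choice. A minimal sketch, assuming both files are sorted by ID (the output names and the `inA`/`inB` flags are hypothetical):

```sas
/* Only IDs that exist in both A and B. */
data matched_only;
  merge A (in=inA) B (in=inB);
  by ID;
  if inA and inB;
run;

/* Every record from A, with B's variables added where an ID matches. */
data all_of_A;
  merge A (in=inA) B;
  by ID;
  if inA;
run;

/* At least one record for every ID found in either file. */
data all_ids;
  merge A B;
  by ID;
run;
```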

And, in either case, are there other variables (besides ID), in the two files, that also exist in the other file?  If there are, do you want to keep the value from A or from B?
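This matters because, for a same-named variable that is not a BY variable, a match-merge generally ends up with the value from the data set listed last on the MERGE statement. A sketch of two ways to handle it, assuming a hypothetical shared variable called `score`:

```sas
/* Keep both values side by side by renaming on input. */
data merged_both;
  merge A (rename=(score=score_A))
        B (rename=(score=score_B));
  by ID;
run;

/* Keep only A's value by dropping B's copy before the merge. */
data merged_keep_A;
  merge A
        B (drop=score);
  by ID;
run;
```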

Contributor
Posts: 41

Re: Issues when merging two data sets

Hi,

I think there is something wrong with the original data source (file) that I imported for dataset B. A new file was sent to me, and that one includes the same number of rows as dataset A.

Sorry about the inconvenience.

Regards,

P

PROC Star
Posts: 7,474

Re: Issues when merging two data sets

You didn't cause any inconvenience, just not enough info on the one hand, and probably too much info on the other hand, for anyone on the forum to give you advice you might need.  Glad to hear that it was a data problem, but you really should consider the part about whether the files share any variables besides ID and which records you want in the event that there are some records that don't match.
