I have three data sets ONE, TWO and THREE.
They have the same variables: ID, VAR1, VAR2, VAR3 and VAR4.
Each ID can have multiple records. Each ID can be member of more than one of the data sets: 1, 2 or all three. If so, the ID:s records are the same in all the data sets where the ID occurs.
I want to have each ID:s all records in one data set: RESULT.
If I first sort ONE, TWO and THREE by ID, will the following code produce what I want, or can it be done even simpler?
DATA RESULT;
SET ONE TWO THREE;
BY ID;
RUN;
Your solution will give you work.result sorted by id and (implicitly) by occurence in one, two, three.
Or do the sort after the data - set step.
And you wish to keep all records, including duplicates?
I want duplicates only if the duplicates were there from the start.
Your question indicates that my code example create duplicates.
Not creates, preserves rather...
The question is, should dataset result have x records where x is the sum of all records of one,two,three, regardless of the contents, or should records be eliminated if two identical records are found in different datasets?
I give an example:
ID = 1 have a total of 2 reccords in data set ONE.
ID = 1 does not occur in data set TWO..
ID = 1 occurs also in data set THREE with the same 2 records as in data set ONE.
In data set RESULT, ID = 1 should have the same 2 records as in data set ONE and THREE, and NOT those 2 records twice.
Is it possible to have completely identical records in one of the datasets?
Then you need to add a PROC SORT NODUPRECS after the data - set step.
What about using an update statement instead?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.