Re: Combining data sets.

attjooo · Posted 04-29-2014 04:31 AM

I have three data sets ONE, TWO and THREE.

They have the same variables: ID, VAR1, VAR2, VAR3 and VAR4.

Each ID can have multiple records. Each ID can be a member of more than one of the data sets: one, two or all three. If so, the ID's records are the same in every data set where the ID occurs.

I want all records for each ID in one data set: RESULT.

If I first sort ONE, TWO and THREE by ID, will the following code produce what I want, or can it be done more simply?

DATA RESULT;
  SET ONE TWO THREE;
  BY ID;
RUN;

Kurt_Bremser · Posted 04-29-2014 04:42 AM

Your solution will give you WORK.RESULT sorted by ID and, within each ID, (implicitly) by order of occurrence in ONE, TWO, THREE.
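A minimal sketch of that interleaving, assuming the three data sets sit in WORK. The IN= flags and the SOURCE variable are not part of the original code; they are hypothetical additions that only make the record order within each ID visible.

```sas
/* Each input data set must be sorted by ID before interleaving. */
proc sort data=one;   by id; run;
proc sort data=two;   by id; run;
proc sort data=three; by id; run;

/* SET with BY interleaves: within each ID, records from ONE come first,
   then those from TWO, then those from THREE.
   SOURCE is a hypothetical helper variable added for illustration. */
data result;
  set one(in=in1) two(in=in2) three(in=in3);
  by id;
  length source $5;
  if in1 then source = 'ONE';
  else if in2 then source = 'TWO';
  else source = 'THREE';
run;
```

Every input record is kept, so an ID that occurs in more than one data set appears with all of its records from each of them.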

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

LinusH · Posted 04-29-2014 04:44 AM

Or do the sort after the DATA step.

And you wish to keep all records, including duplicates?

Data never sleeps

attjooo · Posted 04-29-2014 06:35 AM

I want duplicates only if the duplicates were there from the start.

Your question indicates that my code example creates duplicates.

LinusH · Posted 04-29-2014 07:17 AM

Not creates, preserves rather...

Kurt_Bremser · Posted 04-29-2014 07:53 AM

The question is: should data set RESULT have x records, where x is the total number of records in ONE, TWO and THREE regardless of their contents, or should records be eliminated if two identical records are found in different data sets?

attjooo · Posted 04-29-2014 08:57 AM

I give an example:

ID = 1 has a total of 2 records in data set ONE.

ID = 1 does not occur in data set TWO.

ID = 1 also occurs in data set THREE, with the same 2 records as in data set ONE.

In data set RESULT, ID = 1 should have the same 2 records as in data set ONE and THREE, and NOT those 2 records twice.

Kurt_Bremser · Posted 04-29-2014 09:00 AM

Is it possible to have completely identical records in one of the datasets?

LinusH · Posted 04-29-2014 09:04 AM

Then you need to add a PROC SORT with the NODUPRECS option after the DATA step.
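A hedged sketch of that approach, assuming the data sets are in WORK. Sorting BY _ALL_ rather than only BY ID is my assumption, not something stated in the thread: NODUPRECS compares each record only with the previous one written, so identical records need to end up next to each other.

```sas
/* Concatenate the three data sets; without a BY statement
   no prior sorting is needed. */
data result;
  set one two three;
run;

/* Sort by every variable so identical records become adjacent,
   then let NODUPRECS drop the repeats. */
proc sort data=result noduprecs;
  by _all_;
run;
```

For the ID = 1 example, the concatenation would hold 4 records for that ID (2 from ONE, 2 from THREE), and the sort would leave 2, provided those 2 records differ from each other. Note that this also removes records that were completely identical within a single data set to begin with, which may or may not be wanted.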

Reeza · Posted 04-29-2014 10:26 AM

What about using an update statement instead?
