combine data sets with different columns

Occasional Contributor
Posts: 7

combine data sets with different columns

I have 10 SAS data set with different columns (Some columns are same, and some are different). All these data sets are without identical IDs. My goal is to combine these 10 SAS data sets to 1 single master data set with identical IDs. By doing this, I will be able to retrieve any records with their original columns from master data set by select the unique IDs.

Like say, these 10 data set are from year 1990 to 2000.(e.g. DSN90 : DSN00), and I already created IDs '9000001','0900002'...for data set DSN90, and "9100001','9100002' for data set DSN91 and so on.

What I want is:

If I select ID='9000001' from master data set, I will get the first record of data set DSN90 with the its columns.

If I select ID='9100001' from master data set, I will get the first record of data set DSN91 with the its columns.

However, the columns from DSN90 is different with DSN91. What my master data set should look like? Please advise. Thanks in advance.

Super User
Posts: 10,466

Re: combine data sets with different columns

If you combine datasets as described for your example ID='9000001' selection result will have the columns associated with all of the other data sets as well but with all missing values for those that weren't in DSN90. If you don't want those extra columns you would have to post process the data to remove any column with all missing data. If you don't mind the blank columns then something like

data want;

     set dsn90 dsn91 <etc>;



Either that or write a very complex extraction program that would require knowing which original data set an id is from.

I have a very hard time with the "advantage" of this structure may be.

I would be very tempted to consider a data set of ID and Source data set and use that for a control to pull records from the original

Super User
Posts: 17,750

Re: combine data sets with different columns

chenyiwen1717 wrote:

Some columns are same, and some are different)

That is your main issue. How do you want the output to look like for the column names that are the same? Where the columns are different?

If you already have ID's, you can use a Data Step to combine your 10 datasets relatively easily as well as to identify the source data set.

See the INDSNAME option highlighted by Rick yesterday:

Contributor ndp
Posts: 61

Re: combine data sets with different columns

this is where macro/macro variables com in handy. You can change following code to suit your needs but it should help you subset data and variables.

data dsn90;

data dsn91;

data all; ** will include both variables var1+var2;
set dsn90 dsn91;

%macro subset(_id=9100001);

/* macro variable for yr */
%let _yr=%substr(&_id,1,2);

/* macro variable for list of variables in original data for that year*/
proc sql noprint;
select distinct(name) into: _varlst separated by " "
from sashelp.vcolumn
where upcase(libname)="WORK" and upcase(memname)="DSN&_YR.";

data subset;
set all;
where id="&_id.";
keep &_varlst.;


options symbolgen mlogic mprint;

Ask a Question
Discussion stats
  • 3 replies
  • 4 in conversation