Using SAS 9.4
I have 14 data sets that I need to "merge". I put merge in quotations because it is not a true merge where I would combining information I more just need to put all of the information from the data sets into one large data set so that I can run a frequency. The hard part is some of the variables are in one data set and not the other:
data set 1- 5213 observations and 953 variables
2 -90 observations and 834 variables
3- 812 observations and 973 variables
4- 30 observations and 935 variables
5- 275 observations and 1163 variables
6- 6612 observations and 1058 variables
7- 78 observations and 76 variables
8- 1111 observations and 76 variables
9- 70 observations and 1031 variables
10- 452 observations and 871 variables
11- 360 observations and 2277 variables
12- 2060 observations and 789 variables
13- 2222 observations and 1880 variables
14- 60 observations and 1237 variables
So my question is with the different number of variables across all of the data sets is their a good way to combine the data without losing any of the variables, understanding (and wanting) that this will create blank observations in the new data set. Thank you
If these are already SAS datasets, then SAS knows a good deal of the metadata - in particular all the variable names in each data set.
Let's say you have two data sets:
If you want to produce a dataset C with 8 obs and variables A B X Y and Z, then the SET statement works fine, as in:
data C;
set A B;
run;
The 3 obs from A will precede the 5 obs from B (A is listed before B in the SET statement). And of course the 3 obs from A will now have 2 additional variables A and B, but SAS will assign missing values to them. And correspondingly the 5 obs from B will have 2 additional variables Y and Z with missing values.
The SET statement can accommodate more than 2 dataset names, so you presumably can list the names of all your input datasets.
The only concern. If any common variable (X here) is a different variable type (i.e. character vs numeric) sas will detect it and will not make data set C. So you should be sure your datasets do not have a type-conflict in any common variables.
regards,
Mark
It seems you want to stack them. That means concatenating them using set instead of merge.
If these are already SAS datasets, then SAS knows a good deal of the metadata - in particular all the variable names in each data set.
Let's say you have two data sets:
If you want to produce a dataset C with 8 obs and variables A B X Y and Z, then the SET statement works fine, as in:
data C;
set A B;
run;
The 3 obs from A will precede the 5 obs from B (A is listed before B in the SET statement). And of course the 3 obs from A will now have 2 additional variables A and B, but SAS will assign missing values to them. And correspondingly the 5 obs from B will have 2 additional variables Y and Z with missing values.
The SET statement can accommodate more than 2 dataset names, so you presumably can list the names of all your input datasets.
The only concern. If any common variable (X here) is a different variable type (i.e. character vs numeric) sas will detect it and will not make data set C. So you should be sure your datasets do not have a type-conflict in any common variables.
regards,
Mark
I have run into the concern you have mentioned where the same intended variable is a character in 1 data set and numeric in the other. What would be the best method to convert the numeric variable into a character? The final variable should be character but in one data set it is entered as a binary 1/0 for yes/no so what would be the best way to convert to character without losing the numeric data? Thank you
Just to add another concern: If any common variable has the same type, but different lengths in two datasets and the dataset with the smaller length happens to precede the dataset with the greater length in the SET statement, SAS will detect it and issue a warning message:
WARNING: Multiple lengths were specified for the variable X by input data set(s). This can cause truncation of data.
In this case you may want to reorder the dataset names in the SET statement or insert an appropriate LENGTH statement before the SET statement (which could be created dynamically) or switch to PROC SQL (using OUTER UNION CORR), cf. this old thread.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.