BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GS2
Obsidian | Level 7 GS2
Obsidian | Level 7

Using SAS 9.4

 

I have 14 data sets that I need to "merge". I put merge in quotations because it is not a true merge where I would combining information I more just need to put all of the information from the data sets into one large data set so that I can run a frequency. The hard part is some of the variables are in one data set and not the other:

 

data set 1- 5213 observations and 953 variables

2 -90 observations and 834 variables

3- 812 observations and 973 variables

4- 30 observations and 935 variables

5- 275 observations and 1163 variables

6- 6612 observations and 1058 variables

7- 78 observations and 76 variables

8- 1111 observations and 76 variables

9- 70 observations and 1031 variables

10- 452 observations and 871 variables

11- 360 observations and 2277 variables

12- 2060 observations and 789 variables

13- 2222 observations and 1880 variables

14- 60 observations and 1237 variables

 

So my question is with the different number of variables across all of the data sets is their a good way to combine the data without losing any of the variables, understanding (and wanting) that this will create blank observations in the new data set. Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If these are already SAS datasets, then SAS knows a good deal of the metadata - in particular all the variable names in each data set.

 

Let's say you have two data sets:

  1. data set A, with 3 obs and  variables X Y and Z.
  2. data set B, with 5 obs and variables A B and X

If you want to produce a dataset C with 8 obs and variables A B X Y and Z, then the SET statement works fine, as in:

data C;
  set A B;
run;

The 3 obs from A will precede the 5 obs from B (A is listed before B in the SET statement).  And of course the 3 obs from A will now have 2 additional variables A and B, but SAS will assign missing values to them.  And correspondingly the 5 obs from B will have 2 additional variables Y and Z with missing values.

 

The SET statement can accommodate more than 2 dataset names, so you presumably can list the names of all your input datasets.

 

The only concern.  If any common variable (X here) is a different variable type (i.e. character vs numeric) sas will detect it and will not make data set C.  So you should be sure your datasets do not have a type-conflict in any common variables.

 

regards,

Mark

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
mauri0623
Quartz | Level 8

It seems you want to stack them. That means concatenating them using set instead of merge.

mkeintz
PROC Star

If these are already SAS datasets, then SAS knows a good deal of the metadata - in particular all the variable names in each data set.

 

Let's say you have two data sets:

  1. data set A, with 3 obs and  variables X Y and Z.
  2. data set B, with 5 obs and variables A B and X

If you want to produce a dataset C with 8 obs and variables A B X Y and Z, then the SET statement works fine, as in:

data C;
  set A B;
run;

The 3 obs from A will precede the 5 obs from B (A is listed before B in the SET statement).  And of course the 3 obs from A will now have 2 additional variables A and B, but SAS will assign missing values to them.  And correspondingly the 5 obs from B will have 2 additional variables Y and Z with missing values.

 

The SET statement can accommodate more than 2 dataset names, so you presumably can list the names of all your input datasets.

 

The only concern.  If any common variable (X here) is a different variable type (i.e. character vs numeric) sas will detect it and will not make data set C.  So you should be sure your datasets do not have a type-conflict in any common variables.

 

regards,

Mark

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
GS2
Obsidian | Level 7 GS2
Obsidian | Level 7

I have run into the concern you have mentioned where the same intended variable is a character in 1 data set and numeric in the other. What would be the best method to convert the numeric variable into a character? The final variable should be character but in one data set it is entered as a binary 1/0 for yes/no so what would be the best way to convert to character without losing the numeric data? Thank you

FreelanceReinh
Jade | Level 19

Just to add another concern: If any common variable has the same type, but different lengths in two datasets and the dataset with the smaller length happens to precede the dataset with the greater length in the SET statement, SAS will detect it and issue a warning message:

WARNING: Multiple lengths were specified for the variable X by input data set(s). This can cause truncation of data.

In this case you may want to reorder the dataset names in the SET statement or insert an appropriate LENGTH statement before the SET statement (which could be created dynamically) or switch to PROC SQL (using OUTER UNION CORR), cf. this old thread.

sas-innovate-white.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.

 

Early bird rate extended! Save $200 when you sign up by March 31.

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 963 views
  • 2 likes
  • 4 in conversation