- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
We would like to merge datasets with common variables. The output needs to have a variable that describes where each row is coming from. DS1 and DS2 have the same variables, while DS3 and DS4 share the same variables. The only variable in common is ID. There are the datasets:
data ds1;
input ID months time_point;
datalines;
1 2 3
1 3 3
1 3.5 3
1 1.5 3
1 6 6
1 7.5 6
2 1 .
2 3.4 3
2 7.5 9
2 9 9
;
run;
data ds2;
input ID months time_point;
datalines;
1 3 3
1 6 6
2 1 .
2 3.4 3
2 9 9
;
run;
data ds3;
input ID var3;
datalines;
1 1
2 2
;
run;
data ds4;
input ID var3;
datalines;
1 0
2 3
;
run;
This is the output we are looking for:
ID | DS | months | time_point | var3 |
1 | DS1 | 2 | 3 | . |
1 | DS1 | 3 | 3 | . |
1 | DS1 | 3.5 | 3 | . |
1 | DS1 | 1.5 | 3 | . |
1 | DS1 | 6 | 6 | . |
1 | DS1 | 7.5 | 6 | . |
1 | DS2 | 3 | 3 | . |
1 | DS2 | 6 | 6 | . |
1 | DS3 | . | . | 1 |
1 | DS4 | . | . | 0 |
2 | DS1 | 1 | . | . |
2 | DS1 | 3.4 | 3 | . |
2 | DS1 | 7.5 | 9 | . |
2 | DS1 | 9 | 9 | . |
2 | DS2 | 1 | . | . |
2 | DS2 | 3.4 | 3 | . |
2 | DS2 | 9 | 9 | . |
2 | DS3 | . | . | 2 |
2 | DS4 | . | . | 3 |
Thanks.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Some SAS terminology you need to keep in mind: since SAS has a specific MERGE statement with data set that means match row by row your data is NOT a merge. You are stacking things vertically which is an append or accomplished with SET statement.
data want; set ds1 ds2 ds3 ds4 indsname=dsn; length ds $ 10.; ds = scan(dsn,2); run; proc sort data=want; by id ds; run;
The set statement adds the data set observations in the order of appearance on the statement. The INDSNAME= creates a temporary variable with the name of the data set contributing the current observation. These would look like WORK.DS1 WORK.DS2 etc by default as no provided library name means use WORK as the library. So use SCAN to extract just the set name into a variable kept in the data..
The sort places the records in the order of the ID and DS variable.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Some SAS terminology you need to keep in mind: since SAS has a specific MERGE statement with data set that means match row by row your data is NOT a merge. You are stacking things vertically which is an append or accomplished with SET statement.
data want; set ds1 ds2 ds3 ds4 indsname=dsn; length ds $ 10.; ds = scan(dsn,2); run; proc sort data=want; by id ds; run;
The set statement adds the data set observations in the order of appearance on the statement. The INDSNAME= creates a temporary variable with the name of the data set contributing the current observation. These would look like WORK.DS1 WORK.DS2 etc by default as no provided library name means use WORK as the library. So use SCAN to extract just the set name into a variable kept in the data..
The sort places the records in the order of the ID and DS variable.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You could do this using a PROC SQL statement; add the additional (missing) variables you need for each table (making a different SELECT statement for each table) and then you could use UNION to combine the results