@mcmaxwell wrote:
Hello, I am new to SAS Studio and would appreciate any help/information!
For my research I have many different CSV files that contain different variables. Below is an example of the code I might use to merge these files:
/** outer full merge the data **/
data new;
update gb20 (in=left) PLTL (in=right);
by SUID;
run;
/** left merge the data **/
data new;
update gb20 (in=left) PLTL (in=right);
by SUID;
if left;
run;
/** right merge the data **/
data new;
update gb20 (in=left) PLTL (in=right);
by SUID;
if right;
run;
/** inner merge the data **/
data new;
update gb20 (in=left) PLTL (in=right);
by SUID;
if left and right;
run;
Where "gb20" and "PLTL" are the names of the data sets, "new" is the name of the newly merged data set, and "SUID" is the key.
Now, let's say there is a third data set called "demo" that I want to merge with "gb20" and "PLTL" as well. Is there a way that I can merge all three data sets in one step? Or, do I need to add them together one by one? (See example below)
/** outer full merge the data again **/
data full;
update demo (in=left) new (in=right);
by SUID;
run;
Thanks in advance!
The workable approach is dependent to a fair extent on the data structure and content.
UPDATE is not the same as MERGE or MODIFY.
Update if you have multiple records with the same BY variables in the updating set (the second one) will apply all the updates to the single record in the master data set. MERGE if you have multiple records with the same by values in the second set will have multiple records.
Example:
data master;
input x y;
datalines;
1 11
2 22
3 33
;
data transaction;
input x y;
datalines;
1 5
1 6
1 7
3 16
3 8
;
data updated;
update master transaction;
by x;
run;
data merged;
merge master transaction;
by x;
run;
Update only allows a master and transaction data set. Also the behavior of missing values in the transaction data set can be modified in Update but not merge. Modify is somewhat similar to Update with additional features.
Also the order of multiple data sets on a merge statement is important. Adding to the above:
data other;
input x y z;
datalines;
1 111 345
2 222 333
2 333 444
2 . 345
;
data merged2;
merge master transaction other;
by x;
run;
data merged3;
merge master other transaction;
by x;
run;
Check the value of Y in the first record of Merged2 and Merged3 as an example.
So whether a merge would accomplish what you want depends on contents of the data sets.