Hi,
I have several datasets, each containing the data for one variable. The file names and their numbers of observations are as follows:
var1 (2900 obs), var2 (2800 obs), var3 (2500 obs).
Note that the IDs in var1 (2900 of them) may or may not cover the IDs in var2 and var3.
I want to merge them into a single file, using a program like the one below:
data whole;
merge var2
var1
var3
;
by ID;
run;
My question:
Is the length (number of rows) of whole determined by the length of the FIRST input dataset, the length of the LONGEST one, or the total number of non-duplicated IDs across the three datasets?
Thanks.
Hi,
Assumption: ID is unique (no duplicates) within each of the 3 data sets.
Answer: the total number of non-duplicate IDs across the 3 data sets.
Haikuo
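One way to check that number before merging is to count the distinct IDs directly. This is only a sketch: it assumes the three datasets are in the WORK library under the names var1, var2 and var3, and that ID has the same type in each. The alias all_ids and the column name n_distinct_ids are made up for illustration.

```sas
/* Sketch: count the distinct IDs across the three input datasets.  */
/* UNION (without ALL) removes duplicate rows, so the inline view   */
/* holds one row per distinct ID.                                   */
proc sql;
  select count(*) as n_distinct_ids
  from (select ID from var1
        union
        select ID from var2
        union
        select ID from var3) as all_ids;
quit;
```

If ID is unique within each dataset, this count should match the number of observations in whole.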
You didn't provide enough info. Does each of the three files contain only one variable, and do they all share the same variable name?
What are you trying to do? What do you hope to accomplish? At least the answers to the initial questions are needed for anyone to answer your question without making a number of assumptions.
Hi Arthur,
Thanks for your reply and questions.
File var1 includes variables ID and var1, and there are 2900 obs.
File var2 includes variables ID and var2, and there are 2800 obs.
File var3 includes variables ID and var3, and there are 2500 obs.
I want to merge them into one file with variables ID, var1, var2, and var3.
How is the length of the output dataset determined?
I would clarify Haikuo's response a bit. No assumption is needed as long as there aren't duplicate ids within any of the three datasets (Note: this statement was clarified based on Linlin's subsequent post)! You will obtain one record for each unique ID across the 3 datasets.
One thing to be concerned about, however, is the definition of "unique". If the ID field has different lengths across the three files, IDs that appear to be the same may not be treated as the same ID.
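A quick way to compare the ID attributes is to query the dictionary tables. This is a sketch that assumes the datasets are in the WORK library and named var1, var2 and var3; adjust the library and member names to your own.

```sas
/* Sketch: list type and length of ID in each input dataset.        */
/* DICTIONARY.COLUMNS stores libname and memname in upper case.     */
proc sql;
  select memname, name, type, length
  from dictionary.columns
  where libname = 'WORK'
    and memname in ('VAR1', 'VAR2', 'VAR3')
    and upcase(name) = 'ID';
quit;
```

If the type or length differs between rows of the result, it is worth aligning them before the merge.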
Hi Art,
Can I disagree with your statement "No assumption is needed!" ?
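/* have1 contains a duplicate ID (3) */
data have1;
input id @@;
cards;
1 2 3 3 4 5
;
/* have2 contains a duplicate ID (6) */
data have2;
input id @@;
cards;
1 6 6
;
/* have3 has no duplicate IDs */
data have3;
input id @@;
cards;
1 2 3
;
/* Match-merge: a BY group with duplicates in any input produces multiple rows */
data want;
merge have1 have2 have3;
by id;
run;
title 'with dupkey';
proc print data=want;run;
/* Keep only the first row of each ID */
proc sort data=want nodupkey;
by id;
run;
title 'without dupkey';
proc print data=want;run;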
@Linlin: Since you are correct, of course you can disagree! A one-to-many or many-to-many relationship could easily mean that a merge within a data step couldn't even be used without extra coding.
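To find out whether a dataset has duplicate IDs before merging, something like the following could be run on each input. It is a sketch using the have1 dataset from Linlin's example; the column name n_rows is made up for illustration.

```sas
/* Sketch: list every ID that occurs more than once in have1. */
proc sql;
  select id, count(*) as n_rows
  from have1
  group by id
  having count(*) > 1;
quit;
```

For have1 this reports id 3 with n_rows 2. An empty result means ID is unique within that dataset.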