Help using Base SAS procedures

question about merge

Accepted Solution Solved
Reply
Contributor
Posts: 38
Accepted Solution

question about merge

Hi,

I have several datasets with each one containing the data of a variable. The file name and its no. of observations are as follows:

var1(2900 obs), var2(2800 obs), var3(2500 obs).

Note that IDs of var1(i.e. 2900) may or may not cover IDs of var2/var3.

What I want is to merge them together to a whole file and the program is like below:

data whole; 

     merge var2

               var1

               var3

     ;

     by ID;

     run;

My questions:

Is it the length(no. of rows) of whole determined by the length of the FIRST input dataset or the LONGEST one or the sum of the non-duplicated IDs in these three datasets?

Thanks.


Accepted Solutions
Solution
‎07-26-2012 11:43 PM
Respected Advisor
Posts: 3,156

Re: question about merge

Posted in reply to comeon2012

Hi,

Assumption: if ID is unique (with no duplicates) within either of 3 data sets

Answer : 'the total number of non-duplicate ids'.

Haikuo

View solution in original post


All Replies
PROC Star
Posts: 7,492

Re: question about merge

Posted in reply to comeon2012

You didn't provide enough info.  Do all three files only contain one variable, each, and do they all share the same variable name?

What are you trying to do?  What do you hope to accomlish?  At least the initial answers will be needed for anyone to answer your question without making a number of assumptions.

Contributor
Posts: 38

Re: question about merge

Hi Arthur,

Thanks for your reply and questions.

file var1 inlcudes variables ID and var1 and there are 2900 obs.

file var2 inlcudes variables ID and var2 and there are 2800 obs.

file var3 inlcudes variables ID and var3 and there are 2500 obs.

I want to merage them together to a file with variables ID, var1, var2, and var3.

How is the length of the output dataset determined?

Solution
‎07-26-2012 11:43 PM
Respected Advisor
Posts: 3,156

Re: question about merge

Posted in reply to comeon2012

Hi,

Assumption: if ID is unique (with no duplicates) within either of 3 data sets

Answer : 'the total number of non-duplicate ids'.

Haikuo

PROC Star
Posts: 7,492

Re: question about merge

Posted in reply to comeon2012

I would clarify Haikuo's response a bit.  No assumption is needed as long as there aren't duplicate ids within any of the three datasets (Note: this statement was clarified based on Linlin's subsequent post)!  You will obtain one record for each unique ID across the 3 datasets.

One thing to be concerned about, however, is the definition of "unique".  If the ID field has different lengths across the three files, IDs that appear to be the same may not be considered to be unique.

Super Contributor
Posts: 1,636

Re: question about merge

Hi Art,

Can I disagree with your statement "No assumption is needed!" ?

data have1;

input id @@;

cards;

1 2 3 3 4 5

;

data have2;

input id @@;

cards;

1 6 6

;

data have3;

input id @@;

cards;

1 2 3

;

data want;

merge have1 have2 have3;

by id;

run;

title with dupkey;

proc print;run;

proc sort nodupkey;

by id;

title without dupkey;

proc print;run;

PROC Star
Posts: 7,492

Re: question about merge

@Linlin: Since you are correct of course you can disagree!  The possibility of one to many, or many-to-many, could easily make it so that merge within a datastep couldn't even be used without extra coding.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 236 views
  • 1 like
  • 4 in conversation