BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
comeon2012
Fluorite | Level 6

Hi,

I have several datasets with each one containing the data of a variable. The file name and its no. of observations are as follows:

var1(2900 obs), var2(2800 obs), var3(2500 obs).

Note that IDs of var1(i.e. 2900) may or may not cover IDs of var2/var3.

What I want is to merge them together to a whole file and the program is like below:

data whole; 

     merge var2

               var1

               var3

     ;

     by ID;

     run;

My questions:

Is it the length(no. of rows) of whole determined by the length of the FIRST input dataset or the LONGEST one or the sum of the non-duplicated IDs in these three datasets?

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Hi,

Assumption: if ID is unique (with no duplicates) within either of 3 data sets

Answer : 'the total number of non-duplicate ids'.

Haikuo

View solution in original post

6 REPLIES 6
art297
Opal | Level 21

You didn't provide enough info.  Do all three files only contain one variable, each, and do they all share the same variable name?

What are you trying to do?  What do you hope to accomlish?  At least the initial answers will be needed for anyone to answer your question without making a number of assumptions.

comeon2012
Fluorite | Level 6

Hi Arthur,

Thanks for your reply and questions.

file var1 inlcudes variables ID and var1 and there are 2900 obs.

file var2 inlcudes variables ID and var2 and there are 2800 obs.

file var3 inlcudes variables ID and var3 and there are 2500 obs.

I want to merage them together to a file with variables ID, var1, var2, and var3.

How is the length of the output dataset determined?

Haikuo
Onyx | Level 15

Hi,

Assumption: if ID is unique (with no duplicates) within either of 3 data sets

Answer : 'the total number of non-duplicate ids'.

Haikuo

art297
Opal | Level 21

I would clarify Haikuo's response a bit.  No assumption is needed as long as there aren't duplicate ids within any of the three datasets (Note: this statement was clarified based on Linlin's subsequent post)!  You will obtain one record for each unique ID across the 3 datasets.

One thing to be concerned about, however, is the definition of "unique".  If the ID field has different lengths across the three files, IDs that appear to be the same may not be considered to be unique.

Linlin
Lapis Lazuli | Level 10

Hi Art,

Can I disagree with your statement "No assumption is needed!" ?

data have1;

input id @@;

cards;

1 2 3 3 4 5

;

data have2;

input id @@;

cards;

1 6 6

;

data have3;

input id @@;

cards;

1 2 3

;

data want;

merge have1 have2 have3;

by id;

run;

title with dupkey;

proc print;run;

proc sort nodupkey;

by id;

title without dupkey;

proc print;run;

art297
Opal | Level 21

@Linlin: Since you are correct of course you can disagree!  The possibility of one to many, or many-to-many, could easily make it so that merge within a datastep couldn't even be used without extra coding.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 777 views
  • 1 like
  • 4 in conversation