BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
geronimo44
Fluorite | Level 6

Hi, 

 

I want to merge 3 datasets dataA, dataB and dataC, keeping records only found in dataA or dataB and exclude anything found in  dataC. What approch would be more effective? SET or MERGE? I'm not able to run code at the moment so as to test the effeciency. Thus, I am trying to figure out through support and Help the merging funcions. I could do it with two datasets but still struggle with keeping only records found in A and B. I'd appreciate your ideas

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Close, but no cigar Woman Tongue

 

Double check your output, especially if you have duplicate X's in any of your datasets.

 

Data Merged_data
Merge dataA(IN=fromdataA) 
           dataB(IN=fromdataB) 
           dataC(in=fromdataC)
BY x;

*Create indicator variables to show in which dataset the observation is present;
SourceA=fromdataA;
SourceB=fromdataB;
SourceC=fromdataC;

*keep only from A and B;
if fromdataA AND fromdataB;

run; 

View solution in original post

5 REPLIES 5
Reeza
Super User
Are you merging (side by side) or appending(adding more rows)?
geronimo44
Fluorite | Level 6

It's an exercise and the description doesn't give more details on that. In my understanding it is asked to merge the datasets side by side

Reeza
Super User

Well, set appends data sets so you're looking for a MERGE then. Look at the dataset IN option,example 4.
http://www.ats.ucla.edu/stat/sas/modules/merge.htm

geronimo44
Fluorite | Level 6

So, something like the following would be correct? inputs and numbers are totally random

 

Data dataA

input x, y, z;

cards;

5, 10, 15

55, 24, 29

run; 

 

proc sort data=dataA;

by x;

run; 

 

Data dataB

input x, y, z;

cards;

105, 110, 150

run; 

 

proc sort data=dataB;

by x;

run; 

 

Data dataB

input x, y, z;

cards;

665, 1110, 1150

780, 455, 447

159, 668, 31

;

run; 

 

proc sort data=dataC

by x;

run; 

 

Data Merged_data

Merge dataA(IN fromdataA) dataB(IN fromdataB) dataC 

BY x;

fromdataA=fromdataB; 

run; 

 

Reeza
Super User

Close, but no cigar Woman Tongue

 

Double check your output, especially if you have duplicate X's in any of your datasets.

 

Data Merged_data
Merge dataA(IN=fromdataA) 
           dataB(IN=fromdataB) 
           dataC(in=fromdataC)
BY x;

*Create indicator variables to show in which dataset the observation is present;
SourceA=fromdataA;
SourceB=fromdataB;
SourceC=fromdataC;

*keep only from A and B;
if fromdataA AND fromdataB;

run; 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 3791 views
  • 1 like
  • 2 in conversation