BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
geronimo44
Fluorite | Level 6

Hi, 

 

I want to merge 3 datasets dataA, dataB and dataC, keeping records only found in dataA or dataB and exclude anything found in  dataC. What approch would be more effective? SET or MERGE? I'm not able to run code at the moment so as to test the effeciency. Thus, I am trying to figure out through support and Help the merging funcions. I could do it with two datasets but still struggle with keeping only records found in A and B. I'd appreciate your ideas

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Close, but no cigar Woman Tongue

 

Double check your output, especially if you have duplicate X's in any of your datasets.

 

Data Merged_data
Merge dataA(IN=fromdataA) 
           dataB(IN=fromdataB) 
           dataC(in=fromdataC)
BY x;

*Create indicator variables to show in which dataset the observation is present;
SourceA=fromdataA;
SourceB=fromdataB;
SourceC=fromdataC;

*keep only from A and B;
if fromdataA AND fromdataB;

run; 

View solution in original post

5 REPLIES 5
Reeza
Super User
Are you merging (side by side) or appending(adding more rows)?
geronimo44
Fluorite | Level 6

It's an exercise and the description doesn't give more details on that. In my understanding it is asked to merge the datasets side by side

Reeza
Super User

Well, set appends data sets so you're looking for a MERGE then. Look at the dataset IN option,example 4.
http://www.ats.ucla.edu/stat/sas/modules/merge.htm

geronimo44
Fluorite | Level 6

So, something like the following would be correct? inputs and numbers are totally random

 

Data dataA

input x, y, z;

cards;

5, 10, 15

55, 24, 29

run; 

 

proc sort data=dataA;

by x;

run; 

 

Data dataB

input x, y, z;

cards;

105, 110, 150

run; 

 

proc sort data=dataB;

by x;

run; 

 

Data dataB

input x, y, z;

cards;

665, 1110, 1150

780, 455, 447

159, 668, 31

;

run; 

 

proc sort data=dataC

by x;

run; 

 

Data Merged_data

Merge dataA(IN fromdataA) dataB(IN fromdataB) dataC 

BY x;

fromdataA=fromdataB; 

run; 

 

Reeza
Super User

Close, but no cigar Woman Tongue

 

Double check your output, especially if you have duplicate X's in any of your datasets.

 

Data Merged_data
Merge dataA(IN=fromdataA) 
           dataB(IN=fromdataB) 
           dataC(in=fromdataC)
BY x;

*Create indicator variables to show in which dataset the observation is present;
SourceA=fromdataA;
SourceB=fromdataB;
SourceC=fromdataC;

*keep only from A and B;
if fromdataA AND fromdataB;

run; 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 3269 views
  • 1 like
  • 2 in conversation