BookmarkSubscribeRSS Feed
kfbaker0206
Fluorite | Level 6
proc sort data =MergeA 
by ID;
run;

proc sort data =MergeB
by ID;
run;

Data MergeC
set MergeA MergeB;
by ID; if first.id=0 or last.id=0 then Code='H'; run; proc sort data=MergeC nodupkey; by id; run;

Hello,

 

I have two datasets: Dataset A and Dataset B. Both datasets contains the variables ID and Code. However, Dataset A's code contains different characters while Dataset B code contains the character 'H'. I want to merge the two datasets together to see what duplicate ID's have different characters and change all of the codes to the character 'H'. Once I changed all of the Codes to 'H', I want to remove the duplicates. When I removed the duplicates, they still remained. Please see code and what I am trying to do below.

 

DataSet ADataSet B   
IDCodeIDCode   
12345I12345H   
67890J67890H   
11223 11223H   
45678K45678H   
       

I want to Merge A & B together

     
IDCode     
12345I
 
   
 
 
12345H 
67890J 
67890H 
11223      
11223H     
45678K     
45678H     
       
Then I want to change all of the codes to H     
IDCode     
12345H
 
   
 
12345H
67890H
67890H
11223H
11223H     
45678H     
45678H     
       
Then I want to remove the duplicate IDs     
IDCode
 
   
 
 
12345H 
67890H 
11223H 
45678H 
   
6 REPLIES 6
novinosrin
Tourmaline | Level 20
data a;
input ID	;
cards; 
12345
67890
11223
45678
;
 
data b;
input	Code $	ID	Code2 $	;
cards; 
I	12345	H
J	67890	H
.	11223	H
K	45678	H
;
proc sort data=a;
by id;
run;

proc sort data=b;
by id;
run;

/*you want to Merge A & B together*/
data want;
merge a b;
by id ;
output;
code=code2;
output;
drop code2;
run;
/*Then you want to change all of the codes to H*/
data want1;
set want;
code='H';
run;
/*Then you want to remove the duplicate IDs*/
proc sort data=want1 out=want2 nodupkey;
by id ;
run;

kfbaker0206
Fluorite | Level 6

I should have stated that I am new to sas. I have 4K ID's that are like this. What are the cards and should I do it for all 4K?

novinosrin
Tourmaline | Level 20

The solution starts only from sort procedure. The first 2 are just samples that you gave us. Use the solution and replicate on your real data

Tom
Super User Tom
Super User

if you are changing all of the CODE values to 'H' why do you care what value they had before?

It also does not sound like you want to "merge" the datasets. Instead it sounds more like you want to "set" them together.

If they are already sorted by ID then you can combine them and find the set of unique ID values in one step.

data want ;
  set A B ;
  by id ;
  if first.id ;
  code = 'H';
run;

 

kfbaker0206
Fluorite | Level 6

I care because in Dataset A there are 6K id numbers 4k of which match Dataset B. The original data sets (A and B) cannot be modified. They are received from two different areas. I want the 2K from dataset A to keep their original CODE. The 4K (that have the same IDs) from Datasets A & B needs to be changed to 'H'.

Tom
Super User Tom
Super User

@kfbaker0206 wrote:

I care because in Dataset A there are 6K id numbers 4k of which match Dataset B. The original data sets (A and B) cannot be modified. They are received from two different areas. I want the 2K from dataset A to keep their original CODE. The 4K (that have the same IDs) from Datasets A & B needs to be changed to 'H'.


Use the IN= dataset option.

There are three possible sets of records.  For those INA and NOT INB you want the CODE from A.  For those in INA and INB you want CODE set to 'H'.  What do you want to happen for those that are INB and NOT INA?

 

You could try merging the data.  

data want ;
  merge a(in=in1 keep=id code) b(in=in2 keep=id);
  by id;
  if in1 and in2 then code='H';
  else if not in1 then do;
* What do you want to happen with these ? ;
  end;
run;

If it is still not clear then post sample data for four or five ids that cover all of the possible scenarios and show what values you want to get out.

 

Or you could adapt the SET method I posted before to reference the B set first.  That way you can tell if the there are any records in B for this particular value of ID.

data want ;
  set B (in=in2) A;
  by id;
  if first.id;
  if in2 then code='H';
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1948 views
  • 1 like
  • 3 in conversation