Desktop productivity for business analysts and programmers

Merge two datasets with same variable, change variable character, remove duplicates

Reply
Occasional Contributor
Posts: 13

Merge two datasets with same variable, change variable character, remove duplicates

proc sort data =MergeA 
by ID;
run;

proc sort data =MergeB
by ID;
run;

Data MergeC
set MergeA MergeB;
by ID; if first.id=0 or last.id=0 then Code='H'; run; proc sort data=MergeC nodupkey; by id; run;

Hello,

 

I have two datasets: Dataset A and Dataset B. Both datasets contains the variables ID and Code. However, Dataset A's code contains different characters while Dataset B code contains the character 'H'. I want to merge the two datasets together to see what duplicate ID's have different characters and change all of the codes to the character 'H'. Once I changed all of the Codes to 'H', I want to remove the duplicates. When I removed the duplicates, they still remained. Please see code and what I am trying to do below.

 

DataSet ADataSet B   
IDCodeIDCode   
12345I12345H   
67890J67890H   
11223 11223H   
45678K45678H   
       

I want to Merge A & B together

     
IDCode     
12345I
 
   
 
 
12345H 
67890J 
67890H 
11223      
11223H     
45678K     
45678H     
       
Then I want to change all of the codes to H     
IDCode     
12345H
 
   
 
12345H
67890H
67890H
11223H
11223H     
45678H     
45678H     
       
Then I want to remove the duplicate IDs     
IDCode
 
   
 
 
12345H 
67890H 
11223H 
45678H 
   
Super User
Posts: 2,068

Re: Merge two datasets with same variable, change variable character, remove duplicates

Posted in reply to kfbaker0206
data a;
input ID	;
cards; 
12345
67890
11223
45678
;
 
data b;
input	Code $	ID	Code2 $	;
cards; 
I	12345	H
J	67890	H
.	11223	H
K	45678	H
;
proc sort data=a;
by id;
run;

proc sort data=b;
by id;
run;

/*you want to Merge A & B together*/
data want;
merge a b;
by id ;
output;
code=code2;
output;
drop code2;
run;
/*Then you want to change all of the codes to H*/
data want1;
set want;
code='H';
run;
/*Then you want to remove the duplicate IDs*/
proc sort data=want1 out=want2 nodupkey;
by id ;
run;

Occasional Contributor
Posts: 13

Re: Merge two datasets with same variable, change variable character, remove duplicates

Posted in reply to novinosrin

I should have stated that I am new to sas. I have 4K ID's that are like this. What are the cards and should I do it for all 4K?

Super User
Posts: 2,068

Re: Merge two datasets with same variable, change variable character, remove duplicates

Posted in reply to kfbaker0206

The solution starts only from sort procedure. The first 2 are just samples that you gave us. Use the solution and replicate on your real data

Super User
Super User
Posts: 8,287

Re: Merge two datasets with same variable, change variable character, remove duplicates

[ Edited ]
Posted in reply to kfbaker0206

if you are changing all of the CODE values to 'H' why do you care what value they had before?

It also does not sound like you want to "merge" the datasets. Instead it sounds more like you want to "set" them together.

If they are already sorted by ID then you can combine them and find the set of unique ID values in one step.

data want ;
  set A B ;
  by id ;
  if first.id ;
  code = 'H';
run;

 

Occasional Contributor
Posts: 13

Re: Merge two datasets with same variable, change variable character, remove duplicates

I care because in Dataset A there are 6K id numbers 4k of which match Dataset B. The original data sets (A and B) cannot be modified. They are received from two different areas. I want the 2K from dataset A to keep their original CODE. The 4K (that have the same IDs) from Datasets A & B needs to be changed to 'H'.

Super User
Super User
Posts: 8,287

Re: Merge two datasets with same variable, change variable character, remove duplicates

[ Edited ]
Posted in reply to kfbaker0206

@kfbaker0206 wrote:

I care because in Dataset A there are 6K id numbers 4k of which match Dataset B. The original data sets (A and B) cannot be modified. They are received from two different areas. I want the 2K from dataset A to keep their original CODE. The 4K (that have the same IDs) from Datasets A & B needs to be changed to 'H'.


Use the IN= dataset option.

There are three possible sets of records.  For those INA and NOT INB you want the CODE from A.  For those in INA and INB you want CODE set to 'H'.  What do you want to happen for those that are INB and NOT INA?

 

You could try merging the data.  

data want ;
  merge a(in=in1 keep=id code) b(in=in2 keep=id);
  by id;
  if in1 and in2 then code='H';
  else if not in1 then do;
* What do you want to happen with these ? ;
  end;
run;

If it is still not clear then post sample data for four or five ids that cover all of the possible scenarios and show what values you want to get out.

 

Or you could adapt the SET method I posted before to reference the B set first.  That way you can tell if the there are any records in B for this particular value of ID.

data want ;
  set B (in=in2) A;
  by id;
  if first.id;
  if in2 then code='H';
run;
Ask a Question
Discussion stats
  • 6 replies
  • 84 views
  • 1 like
  • 3 in conversation