BookmarkSubscribeRSS Feed
France
Quartz | Level 8

Dear all,

 

How to exclude the observations in both dataset A and B when they have matched with each other.

 

For example, 

in dataset A

 

name, number 

apple,10

pear,11

pen,12

 

in dataset B

 

name, age 

apple,5

fight,19

air,7

 

I would like to create a new dataset C, which includes

name, number, age 

apple,10,5

 

and then exclude observations apple from dataset B and C, like

 

dataset B

name, number

pear,11

pen,12

 

dataset C

name, age 

fight,19

air,7

 

Could you please give me some suggestions about this.

thanks in advance

 

 

 

8 REPLIES 8
PaigeMiller
Diamond | Level 26

Pear started off in data set A, how did it wind up in data set B?

--
Paige Miller
AnnaHawgood
Fluorite | Level 6

If I have understood your question correctly, you should get the datasets you want (a1, b1 and c) with the the following code:

 

proc sort data=a;
 by name;
run;

proc sort data=b;
 by name;
run;

data a1 b1 c;
 merge a (in=a) b (in=b);
 by name;
 if a and not b then output a1;
 if b and not a then output b1;
 if a and b then output c;
run;

Reeza
Super User
Can you have duplicates within a single file? If not, consider appending the data and using PROC SORT with the UNIQUEKEY and DUPOUT options.
France
Quartz | Level 8
Dear Reeza,
thanks for your advice. Could you please explain the meaning in detail for me? Especially, what is the meaning of 'duplicates within a single file' ?
Reeza
Super User

1. Append the files - use a data step with INDSNAME option to see the source input file.

2. Read up on PROC SORT - specifcially UNIQUEKEY and NOUNIQUE

3. Use those options to push the duplicate keys to one data set and the others to a different data set. 

 


@France wrote:
Dear Reeza,
thanks for your advice. Could you please explain the meaning in detail for me? Especially, what is the meaning of 'duplicates within a single file' ?

 

ballardw
Super User

I think you start getting your data set names messed up a bit.

This does what you ask, using different final data set names, for your example data.

data a;
   infile datalines dlm=',';
   input name $ number ;
datalines;
apple,10
pear,11
pen,12
;
run;
 
data b;
   infile datalines dlm=',';
   input name $ age ;
datalines;
apple,5
fight,19
air,7
;
run;

proc sort data=a;
  by name;
run;

proc sort data=b;
  by name;
run;

data n1 n2 (drop=age) n3 (drop=number);
   merge a (in=ina)
         b( in=inb )
   ;
   by name;
   if ina and inb then output n1;
   else if ina then output n2;
   else if inb then output n3;
run;

BTW almost any operation that reads dataset named XXX and writes a modified version back to XXX is very dangerous as you replace the existing data set and lose the starting data.

 

France
Quartz | Level 8

Dear ballardw,

 

thank you for your advice. I am using the following codes,

 
data sa_step6.Distinct_name_ori1;
  set Result.Distinct_name;
  rename psn_name=name;
run;

proc sort data=step1.WSCOPEUK;
  by name;
run;

proc sort data=sa_step6.Distinct_name_ori1;
  by name;
run;

data SA_STEP5.WSCOPEUK_TRY1 sa_step6.Distinct_name_ori2 sa_step6.WSCOPEUK_ori2 ;
   merge sa_step6.Distinct_name_ori1 (in=ina)
         step1.WSCOPEUK (in=ina)
   ;
   by name;
   if ina and inb then output SA_STEP5.WSCOPEUK_TRY1;
   else if ina then output sa_step6.Distinct_name_ori2;
   else if inb then output sa_step6.WSCOPEUK_ori2;
run;

however, I get the following result.

 

858
859  data sa_step6.Distinct_name_ori1;
860    set Result.Distinct_name;
861    rename psn_name=name;
862  run;

NOTE: There were 881490 observations read from the data set RESULT.DISTINCT_NAME.
NOTE: The data set SA_STEP6.DISTINCT_NAME_ORI1 has 881490 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           39.55 seconds
      cpu time            2.35 seconds


863
864  proc sort data=step1.WSCOPEUK;
865    by name;
866  run;

NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


867
868  proc sort data=sa_step6.Distinct_name_ori1;
869    by name;
870  run;

NOTE: There were 881490 observations read from the data set SA_STEP6.DISTINCT_NAME_ORI1.
NOTE: The data set SA_STEP6.DISTINCT_NAME_ORI1 has 881490 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           21.02 seconds
      cpu time            4.84 seconds


871
872  data SA_STEP5.WSCOPEUK_TRY1 sa_step6.Distinct_name_ori2 sa_step6.WSCOPEUK_ori2 ;
873     merge sa_step6.Distinct_name_ori1 (in=ina)
874           step1.WSCOPEUK (in=ina)
875     ;
876     by name;
877     if ina and inb then output SA_STEP5.WSCOPEUK_TRY1;
878     else if ina then output sa_step6.Distinct_name_ori2;
879     else if inb then output sa_step6.WSCOPEUK_ori2;
880  run;

NOTE: Variable inb is uninitialized.
NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 881490 observations read from the data set SA_STEP6.DISTINCT_NAME_ORI1.
NOTE: There were 5664 observations read from the data set STEP1.WSCOPEUK.
NOTE: The data set SA_STEP5.WSCOPEUK_TRY1 has 0 observations and 15 variables.
NOTE: The data set SA_STEP6.DISTINCT_NAME_ORI2 has 887001 observations and 15 variables.
NOTE: The data set SA_STEP6.WSCOPEUK_ORI2 has 0 observations and 15 variables.
NOTE: DATA statement used (Total process time):
      real time           35.96 seconds
      cpu time            2.79 seconds


could you please give me some suggestion to fix this problem.

 

thanks in advanece

SuzanneDorinski
Lapis Lazuli | Level 10

You used in=ina for both your data sets on the merge statement.

 

   merge sa_step6.Distinct_name_ori1 (in=ina)
         step1.WSCOPEUK (in=ina)

Change one of them to inb.  I can't say at a glance which one  you want to be inb.  

 

That's why you have the note in the log that inb is uninitialized.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 4120 views
  • 3 likes
  • 6 in conversation