Desktop productivity for business analysts and programmers

Unique rows between two files

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 94
Accepted Solution

Unique rows between two files

I am trying to get the rows that are NOT in both datasets.

**************************************

proc sql;

create table libname.file_discrepancies as select *

from libname.file_a t1, libname.file_b t2

where GEID.t1 NE GEID.t2;

run;

************************************

However, I get this error which I cannot figure out.  GEID exists as numeric in both datasets.

 

ERROR: Unresolved reference to table/correlation name GEID.

ERROR: Unresolved reference to table/correlation name GEID.

Any advice is appreciated.

Thanks!


Accepted Solutions
Solution
‎01-17-2013 07:45 PM
Super Contributor
Posts: 1,636

Re: Unique rows between two files

or by merging:

proc sort data=file_a; by geid;

proc sort data=file_b; by geid;

run;

data want;

  merge file_a(in=a) file_b(in=b);

  by geid;

  if a and b then delete;

  run;

View solution in original post


All Replies
PROC Star
Posts: 7,431

Re: Unique rows between two files

: I don't think your code will work or do what you want, but the error you are getting is because you have the roles reversed between file reference and variable.  They should be something like t1.geid, not the way you have them (reversed).

Conversely, something like the following might achieve what you want:

data file_a;

  input geid;

  cards;

1

2

3

4

5

;

data file_b;

  input geid;

  cards;

1

2

7

8

9

;

proc sql;

  create table work.file_discrepancies as

   (select geid from work.file_a

         except

         select geid from work.file_b)

   union

   (select geid from work.file_b

         except

         select geid from work.file_a)

  ;

quit;

Frequent Contributor
Posts: 94

Re: Unique rows between two files

Hi Arthur - what does the "cards" mean?

input geid;

cards;

I tried your code.  No errors, but the resutls is only the GEID variable.  I want all the columns/variables.

Let's say if the GEID matches between the two files but other variables do not match, would changing the "select GEID" in your proc sql to "select *" give me only the rows where all the variables do not match?

Solution
‎01-17-2013 07:45 PM
Super Contributor
Posts: 1,636

Re: Unique rows between two files

or by merging:

proc sort data=file_a; by geid;

proc sort data=file_b; by geid;

run;

data want;

  merge file_a(in=a) file_b(in=b);

  by geid;

  if a and b then delete;

  run;

Frequent Contributor
Posts: 94

Re: Unique rows between two files

Linlin,

I used your code, which was easier for me to understand considering my SAS knowledge level.  Anyway, I got this error:  ERROR: BY variables are not properly sorted on data set JEN2.COMBINED_GDW_PRIOR_A. (which is file_b in your code above).

The error pointed out to one specific employee.  However, when I look at the data row, it looks right.  What does a=1 b=1 mean in the error below? 

 

a=1 b=1 GEID=1010353890 SOEID=SK53890 Full_Name=K,Suresh......(there are 135 columns/variables).

 

NOTE: The SAS System stopped processing this step because of errors.

NOTE: There were 104183 observations read from the data set JEN1.COMBINED_GDW_CURRENT_A.

NOTE: There were 35472 observations read from the data set JEN2.COMBINED_GDW_PRIOR_A.

 

WARNING: The data set JEN1.COMBINED_GDW_DISCREPANCIES may be incomplete. When this step was stopped there were 71599 observations

and 135 variables.

 

WARNING: Data set JEN1.COMBINED_GDW_DISCREPANCIES was not replaced because this step was stopped.

PROC Star
Posts: 7,431

Re: Unique rows between two files

: Linlin's suggestion included first sorting the two files by geid.  Did you do that?

a and b, in Linlin's code are boolean variables, indicating if a record came from one, the other, or both files.  They were set with the in= option.

A value of 1 indicates that the record existed in that file.

As for your question about my code, cards (otherwise known as datalines) is simply a way of entering data into SAS.

Frequent Contributor
Posts: 94

Re: Unique rows between two files

The first column is GEID.  So I don't understand what the a= and b= mean?  Is it saying that GEID is the 3 variable for this particular employee?

a=1 b=1 GEID=1010353890

Super Contributor
Posts: 1,636

Re: Unique rows between two files

Would you please post your code?

Frequent Contributor
Posts: 94

Re: Unique rows between two files

Sorry about that.  Here is the code:

proc sort data=jen1.combined_gdw_current_a; by geid;

proc sort data=jen2.combined_gdw_prior; by geid;

run;

data jen1.combined_gdw_discrepancies;

  merge jen1.combined_gdw_current_a(in=a) jen2.combined_gdw_prior_a(in=b);

  by geid;

  if a and b then delete;

  run;

PROC Star
Posts: 7,431

Re: Unique rows between two files

The second file is not the one that you sorted.  You sorted jen2.combined_gdw_prior, but used jen2.combined_gdw_prior_a.

Frequent Contributor
Posts: 94

Re: Unique rows between two files

The second file is not the one that you sorted.  You sorted jen2.combined_gdw_prior, but used jen2.combined_gdw_prior_a.

Thanks for catching that Arthur!

Frequent Contributor
Posts: 94

Re: Unique rows between two files

Sorry for being all over the place.  I tried Arthur's advice and it worked.  Now, what if I want for two fields (called GEID and GOC) to match.  So, I am looking for a dataset where both fields do not match.

Here is my code:

proc sql;

  create table jen3.combined_gdw_discrepancies_test as

   (select GEID from jen1.combined_gdw_current_a

         except

         select GEID  from jen2.combined_gdw_prior_a)

   union

   (select GEID  from jen2.combined_gdw_prior_a

         except

         select GEID  from jen1.combined_gdw_current_a)

  ;

quit;

PROC Star
Posts: 7,431

Re: Unique rows between two files

As long as the files don't contain duplicate records in both files (i.e., resulting in a many-to-many match), I'd just use a variant of Linlin's code.  i.e.:

proc sort data=jen1.combined_gdw_current_a; by geid goc; run;

proc sort data=jen2.combined_gdw_prior; by geid goc; run;

data jen1.combined_gdw_discrepancies;

  merge jen1.combined_gdw_current_a(in=a) jen2.combined_gdw_prior(in=b);

  by geid goc;

  if a and b then delete;

  run;


Frequent Contributor
Posts: 94

Re: Unique rows between two files

That worked!  How do I mark both you and Linlin with the correct answer for this question?

Thank you Linlin and Arthur!

PROC Star
Posts: 7,431

Re: Unique rows between two files

You can mark Linlin as "correct" and, if desired, I'd be quite happy with a "helpful" thank you.  More importantly, glad to see that you achieved what you needed to accomplish.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 488 views
  • 3 likes
  • 3 in conversation