BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jen123
Fluorite | Level 6

I am trying to get the rows that are NOT in both datasets.

**************************************

proc sql;

create table libname.file_discrepancies as select *

from libname.file_a t1, libname.file_b t2

where GEID.t1 NE GEID.t2;

run;

************************************

However, I get this error which I cannot figure out.  GEID exists as numeric in both datasets.

 

ERROR: Unresolved reference to table/correlation name GEID.

ERROR: Unresolved reference to table/correlation name GEID.

Any advice is appreciated.

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Linlin
Lapis Lazuli | Level 10

or by merging:

proc sort data=file_a; by geid;

proc sort data=file_b; by geid;

run;

data want;

  merge file_a(in=a) file_b(in=b);

  by geid;

  if a and b then delete;

  run;

View solution in original post

15 REPLIES 15
art297
Opal | Level 21

: I don't think your code will work or do what you want, but the error you are getting is because you have the roles reversed between file reference and variable.  They should be something like t1.geid, not the way you have them (reversed).

Conversely, something like the following might achieve what you want:

data file_a;

  input geid;

  cards;

1

2

3

4

5

;

data file_b;

  input geid;

  cards;

1

2

7

8

9

;

proc sql;

  create table work.file_discrepancies as

   (select geid from work.file_a

         except

         select geid from work.file_b)

   union

   (select geid from work.file_b

         except

         select geid from work.file_a)

  ;

quit;

jen123
Fluorite | Level 6

Hi Arthur - what does the "cards" mean?

input geid;

cards;

I tried your code.  No errors, but the resutls is only the GEID variable.  I want all the columns/variables.

Let's say if the GEID matches between the two files but other variables do not match, would changing the "select GEID" in your proc sql to "select *" give me only the rows where all the variables do not match?

Linlin
Lapis Lazuli | Level 10

or by merging:

proc sort data=file_a; by geid;

proc sort data=file_b; by geid;

run;

data want;

  merge file_a(in=a) file_b(in=b);

  by geid;

  if a and b then delete;

  run;

jen123
Fluorite | Level 6

Linlin,

I used your code, which was easier for me to understand considering my SAS knowledge level.  Anyway, I got this error:  ERROR: BY variables are not properly sorted on data set JEN2.COMBINED_GDW_PRIOR_A. (which is file_b in your code above).

The error pointed out to one specific employee.  However, when I look at the data row, it looks right.  What does a=1 b=1 mean in the error below? 

 

a=1 b=1 GEID=1010353890 SOEID=SK53890 Full_Name=K,Suresh......(there are 135 columns/variables).

 

NOTE: The SAS System stopped processing this step because of errors.

NOTE: There were 104183 observations read from the data set JEN1.COMBINED_GDW_CURRENT_A.

NOTE: There were 35472 observations read from the data set JEN2.COMBINED_GDW_PRIOR_A.

 

WARNING: The data set JEN1.COMBINED_GDW_DISCREPANCIES may be incomplete. When this step was stopped there were 71599 observations

and 135 variables.

 

WARNING: Data set JEN1.COMBINED_GDW_DISCREPANCIES was not replaced because this step was stopped.

art297
Opal | Level 21

: Linlin's suggestion included first sorting the two files by geid.  Did you do that?

a and b, in Linlin's code are boolean variables, indicating if a record came from one, the other, or both files.  They were set with the in= option.

A value of 1 indicates that the record existed in that file.

As for your question about my code, cards (otherwise known as datalines) is simply a way of entering data into SAS.

jen123
Fluorite | Level 6

The first column is GEID.  So I don't understand what the a= and b= mean?  Is it saying that GEID is the 3 variable for this particular employee?

a=1 b=1 GEID=1010353890

Linlin
Lapis Lazuli | Level 10

Would you please post your code?

jen123
Fluorite | Level 6

Sorry about that.  Here is the code:

proc sort data=jen1.combined_gdw_current_a; by geid;

proc sort data=jen2.combined_gdw_prior; by geid;

run;

data jen1.combined_gdw_discrepancies;

  merge jen1.combined_gdw_current_a(in=a) jen2.combined_gdw_prior_a(in=b);

  by geid;

  if a and b then delete;

  run;

art297
Opal | Level 21

The second file is not the one that you sorted.  You sorted jen2.combined_gdw_prior, but used jen2.combined_gdw_prior_a.

jen123
Fluorite | Level 6

The second file is not the one that you sorted.  You sorted jen2.combined_gdw_prior, but used jen2.combined_gdw_prior_a.

Thanks for catching that Arthur!

jen123
Fluorite | Level 6

Sorry for being all over the place.  I tried Arthur's advice and it worked.  Now, what if I want for two fields (called GEID and GOC) to match.  So, I am looking for a dataset where both fields do not match.

Here is my code:

proc sql;

  create table jen3.combined_gdw_discrepancies_test as

   (select GEID from jen1.combined_gdw_current_a

         except

         select GEID  from jen2.combined_gdw_prior_a)

   union

   (select GEID  from jen2.combined_gdw_prior_a

         except

         select GEID  from jen1.combined_gdw_current_a)

  ;

quit;

art297
Opal | Level 21

As long as the files don't contain duplicate records in both files (i.e., resulting in a many-to-many match), I'd just use a variant of Linlin's code.  i.e.:

proc sort data=jen1.combined_gdw_current_a; by geid goc; run;

proc sort data=jen2.combined_gdw_prior; by geid goc; run;

data jen1.combined_gdw_discrepancies;

  merge jen1.combined_gdw_current_a(in=a) jen2.combined_gdw_prior(in=b);

  by geid goc;

  if a and b then delete;

  run;


jen123
Fluorite | Level 6

That worked!  How do I mark both you and Linlin with the correct answer for this question?

Thank you Linlin and Arthur!

art297
Opal | Level 21

You can mark Linlin as "correct" and, if desired, I'd be quite happy with a "helpful" thank you.  More importantly, glad to see that you achieved what you needed to accomplish.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1488 views
  • 3 likes
  • 3 in conversation