Help using Base SAS procedures

Compare contents of two datasets

Reply
Contributor
Posts: 57

Compare contents of two datasets

Hi-

 

Asking this super SAS support group for some help.  I am trying to compare the contents (data) from two different datasets.  Both have the same variable names but the number of observations differ.  I tried proc compare to compare to obtain differences but it was difficult to interpret the proc compare results.  Can anyone help me on how I can go about producing an outpuf file upon comparing patient information from both datasets.  It would be helpful if the output would contain only patient data from from table A not matching patient data in table B.   Any help would be appreciated.  Thank you all in advance!

Super User
Posts: 17,828

Re: Compare contents of two datasets

Proc compare has it's limitations. A simple SQL query will generate what you want, the code below will select observations from TableA where the ID is not in TableB. You can replace the ID variable by your ID variable and table name with your table names. 

 

proc sql;
create table missing as
select * from tableA
where ID not in (select distinct ID from tableB);
quit;

Can you have multiple records per patient? How should that be handled?

Contributor
Posts: 57

Re: Compare contents of two datasets

Hi-  Thank you so much for responding in my quest for help!! 

 

Yes.  There could be multiple records per patient and the patient record data could be reported from different location sites.  I want to compare the patient data at location ABC from Table A against the same patient data at location XYZ from Table B.   Thank you!

Occasional Contributor
Posts: 9

Re: Compare contents of two datasets

Maybe not very elegant, but this should do the trick.

 


/* Get some dummy data. */
%let varnames=age height weight;

data site1;
set sashelp.class;
age = age +1;
run;

data site2;
set sashelp.class;
run;


/* Rename your variables. */
%macro xxx (site);

proc datasets library=work memtype=data nolist;
 modify &site.;
 rename
 %do i=1 %to %sysfunc(countw(&varnames));
  %scan(&varnames,&i)= &site._&i
 %end;
 ;
quit;

%mend xxx;

%xxx (site1);
%xxx (site2);


/* Comparison. */
%macro yyy;

data comp;
    merge site1 (in = a) site2 (in = b);
    by name ; /* plus any other unique keys you may have */

    if a and b then flag =          "Site 1 & 2 ";
    else if a and not b then flag = "Site 1 only";
    else if not a and b then flag = "Site 2 only";

 %do i=1 %to %sysfunc(countw(&varnames));
   diff&i. = sum (site1_&i, -site2_&i);
 %end;

run;
%mend yyy;

%yyy;

Contributor
Posts: 57

Re: Compare contents of two datasets

Thank you for your response.  I'm not too versed with macros but I'll pull my reference books out to review again.   Do you know if the SQL stmts below would work?   To get the differences and then another pass to get the intersects/commons?

 

Proc SQL;

  create table missing as

  select * from Tab1

     except

 select * from Tab2;

Quit;

 

Proc SQL;

  create table missing as

  select * from Tab1

     intersect

 select * from Tab2;

Quit;

 

Contributor
Posts: 57

Re: Compare contents of two datasets

Hi-

 

Thank you for your help.  I am trying to compare 15 variables from Tab1 and Tab2.   Can you help explain what this part of your code does. 

 

/* Rename your variables. */

%macro xxx (site);

proc datasets library=work memtype=data nolist;

modify &site.;

rename

%do i=1 %to %sysfunc(countw(&varnames));

%scan(&varnames,&i)= &site._&i

%end;

;

quit;

%mend xxx;

%xxx (site1);

%xxx (site2);

 

Super User
Posts: 17,828

Re: Compare contents of two datasets

@Suzanne_Ed Did you try the initial code I suggested?

Contributor
Posts: 57

Re: Compare contents of two datasets

Hi Reeza,

 

Yes I did.  Unfortunately it didnt work.  I think its because I have duplicate IDs (which is valid for my dataset) But I dont think SAS liked it.  

Super User
Posts: 17,828

Re: Compare contents of two datasets


Suzanne_Ed wrote:

Hi Reeza,

 

Yes I did.  Unfortunately it didnt work.  I think its because I have duplicate IDs (which is valid for my dataset) But I dont think SAS liked it.  


 

 

@Suzanne_Ed Duplicate ID's shouldn't have mattered, did you get an error or results that were unsatisfactory?

Contributor
Posts: 57

Re: Compare contents of two datasets

Hi Reeza, 

 

Thank you so much for your help. 

 

** trying to compare Table A with Table B to determine contents in Table A changed/missing in Table B and produce output for additional review**

 

This is the code I applied which you suggested.   

proc sql;

create table missing as

select * from TABLEA

where PATIENTID LOCATION DATE PROC OUTCOME

not in (select distinct PATIENTID LOCATION DATE PROC OUTCOME from TABLEB);

quit;

 

Syntax error after PatientID

ERROR 22-322: Syntax error, expecting one of the following: ;, !, !!, &, (, *, **, +, -, '.', /, <, <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, EXCEPT, GE, GET, GROUP, GT, GTT, HAVING, IN,

INTERSECT, IS, LE, LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN, OR, ORDER, OUTER, UNION, ^, ^=, |, ||, ~, ~=.

ERROR 76-322: Syntax error, statement will be ignored.

 

Occasional Contributor
Posts: 9

Re: Compare contents of two datasets

[ Edited ]

Hi Suzanne,

 

Tha code simply renames your variables into site1_1 (the first variable  from your first dataset, in the order you listed in varnames: %let varnames=age height weight;), site1_2 ( the second variable  from your first dataset),... site1_n, site2_1, etc. 

 

I assumed that you want to do a difference between the two tables (diff&i. = sum (site1_&i, -site2_&i);) which was the reason I renamed the variables in the first place.

 

If that's the case, to use that code do the following:

1) in %let varnames=age height weight; replace age, height, weight with your variables from the two datasets.

2) name your datasets site1 and site2 (or change their name when you call the macro xxx).

3) replace by name with by (your list of unique identifiers).

 

I hope this helps.

Contributor
Posts: 57

Re: Compare contents of two datasets

Hi-

 

I think I almost got it to run correctly.  The code failed untimately.   The comparison code-  why is my data being converted from Character to Numeric?    The data contents within TableA and TableB are all string/characters/YN and no numeric.  And does the data need to be sorted first in the merge step?  << got error message too

 

Ask a Question
Discussion stats
  • 11 replies
  • 521 views
  • 0 likes
  • 3 in conversation