Solved: Re: Compare observations in different rows of different variables

Michael141 · Posted 05-05-2021 11:35 AM

I have two string variables which I want to compare using SAS in order to establish which observations occur in both variables and which are unique to one of these variables. I’ve provided an example of the data below, where Department_Old and Department_New are the variable names.

Department_Old	Department_New
Communications	Advertising
Finance	Communications
Health and Wellbeing	Finance
Transport	HR
	Health and Wellbeing

I have tried to use merge to do this, but from what I can see it can only make comparisons between observations within the same row. What I would like is to check whether an observations in one variable appear anywhere in the other variable, and have to SAS provide me with a list of those observations which are unique to one variable. I know that this is something which can be achieved in excel using the VLOOKUP function, but would ideally like to keep all of my work within SAS. Is there a particular process I could use to achieve this within SAS?

sbxkoenk · Posted 05-05-2021 12:26 PM

Hello,

I guess this is what you want:

data have;
LENGTH Department_Old $ 25 Department_New $ 25;
infile cards delimiter='|';
input Department_Old $ Department_New $;
cards;
Communications       |Advertising
Finance              |Communications
Health and Wellbeing |Finance
Transport            |HR
                     |Health and Wellbeing
;
run;

PROC SQL noprint;
 create table want_unique_Department_Old as
 select Department_Old
 from have
 where  upcase(Department_Old)
  NOT IN (select distinct upcase(Department_New) 
          from have)
  AND Department_Old is not missing
;
QUIT;

PROC SQL noprint;
 create table want_unique_Department_New as
 select Department_New
 from have
 where  upcase(Department_New)
  NOT IN (select distinct upcase(Department_Old) 
          from have)
  AND Department_New is not missing
;
QUIT;
/* end of program */

Koen

View solution in original post

sbxkoenk · Posted 05-05-2021 12:26 PM

Hello,

I guess this is what you want:

data have;
LENGTH Department_Old $ 25 Department_New $ 25;
infile cards delimiter='|';
input Department_Old $ Department_New $;
cards;
Communications       |Advertising
Finance              |Communications
Health and Wellbeing |Finance
Transport            |HR
                     |Health and Wellbeing
;
run;

PROC SQL noprint;
 create table want_unique_Department_Old as
 select Department_Old
 from have
 where  upcase(Department_Old)
  NOT IN (select distinct upcase(Department_New) 
          from have)
  AND Department_Old is not missing
;
QUIT;

PROC SQL noprint;
 create table want_unique_Department_New as
 select Department_New
 from have
 where  upcase(Department_New)
  NOT IN (select distinct upcase(Department_Old) 
          from have)
  AND Department_New is not missing
;
QUIT;
/* end of program */

Koen

Michael141 · Posted 05-05-2021 12:37 PM

This is exactly what I needed! I've been struggling with this for a few days, so I really appreciate your time and help. Thank you very much for your quick and helpful response

sbxkoenk · Posted 05-05-2021 12:34 PM

Or this:

data have;
LENGTH Department_Old $ 25 Department_New $ 25;
infile cards delimiter='|';
input Department_Old $ Department_New $;
cards;
Communications       |Advertising
Finance              |Communications
Health and Wellbeing |Finance
Transport            |HR
                     |Health and Wellbeing
;
run;

PROC FREQ data=have noprint;
 tables Department_Old / out=Count_Old(rename=(Department_Old=Department));
 tables Department_New / out=Count_New(rename=(Department_New=Department));
run;

data both(drop=count percent);
 merge Count_Old(in=a) Count_New(in=b);
 by Department;
 if a or b;
 if a then OLD=1; else OLD=0;
 if b then NEW=1; else NEW=0;
run;
/* end of program */

Michael141 · Posted 05-05-2021 12:38 PM

This solution is also very helpful thanks - and while it wasn't I was looking for originally, it might actually be more helpful. Thanks again for your time and help

Reeza · Posted 05-05-2021 12:51 PM

Here's one different option for you.

It combines the two columns into one, identifying the source as well. Then you count the source for ones that have only a single source (using HAVING).

data have;
LENGTH Department_Old $ 25 Department_New $ 25;
infile cards delimiter='|';
input Department_Old $ Department_New $;
ID = _n_;
cards;
Communications       |Advertising
Finance              |Communications
Health and Wellbeing |Finance
Transport            |HR
                     |Health and Wellbeing
;
run;


data stacked;
set have (keep = department_old rename = (department_old = department) in=t1)
     have (keep = department_new rename = (department_new = department) in=t2);
source = T1*1 + T2*2;
run;

proc sql;
create table unique_values as
select distinct department, source
from want
group by department
having count(distinct source) = 1;
quit;

sbxkoenk · Posted 05-05-2021 01:01 PM

Also nice,

but instead of

from want

in your PROC SQL, it should be

from stacked

Just clarifying for @Michael141 .

Kind regards

sbxkoenk · Posted 05-05-2021 12:53 PM

With pleasure.

Note that I have just edited the last program.

In the last data step (that does the merge) I have dropped the variables count and percent from the output. Count and Percent are coming from the last data set in the merge (Count_New), overriding the Count and Percent from Count_Old. As they don't bring any value and don't have any meaning in the MERGEd output data set, it's better to delete them!

data both(drop=count percent);

Koen

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away