I have two string variables which I want to compare using SAS in order to establish which observations occur in both variables and which are unique to one of these variables. I’ve provided an example of the data below, where Department_Old and Department_New are the variable names.
Department_Old | Department_New |
Communications | Advertising |
Finance | Communications |
Health and Wellbeing | Finance |
Transport | HR |
| Health and Wellbeing |
I have tried to use merge to do this, but from what I can see it can only make comparisons between observations within the same row. What I would like is to check whether an observations in one variable appear anywhere in the other variable, and have to SAS provide me with a list of those observations which are unique to one variable. I know that this is something which can be achieved in excel using the VLOOKUP function, but would ideally like to keep all of my work within SAS. Is there a particular process I could use to achieve this within SAS?
Hello,
I guess this is what you want:
data have;
LENGTH Department_Old $ 25 Department_New $ 25;
infile cards delimiter='|';
input Department_Old $ Department_New $;
cards;
Communications |Advertising
Finance |Communications
Health and Wellbeing |Finance
Transport |HR
|Health and Wellbeing
;
run;
PROC SQL noprint;
create table want_unique_Department_Old as
select Department_Old
from have
where upcase(Department_Old)
NOT IN (select distinct upcase(Department_New)
from have)
AND Department_Old is not missing
;
QUIT;
PROC SQL noprint;
create table want_unique_Department_New as
select Department_New
from have
where upcase(Department_New)
NOT IN (select distinct upcase(Department_Old)
from have)
AND Department_New is not missing
;
QUIT;
/* end of program */
Koen
Hello,
I guess this is what you want:
data have;
LENGTH Department_Old $ 25 Department_New $ 25;
infile cards delimiter='|';
input Department_Old $ Department_New $;
cards;
Communications |Advertising
Finance |Communications
Health and Wellbeing |Finance
Transport |HR
|Health and Wellbeing
;
run;
PROC SQL noprint;
create table want_unique_Department_Old as
select Department_Old
from have
where upcase(Department_Old)
NOT IN (select distinct upcase(Department_New)
from have)
AND Department_Old is not missing
;
QUIT;
PROC SQL noprint;
create table want_unique_Department_New as
select Department_New
from have
where upcase(Department_New)
NOT IN (select distinct upcase(Department_Old)
from have)
AND Department_New is not missing
;
QUIT;
/* end of program */
Koen
This is exactly what I needed! I've been struggling with this for a few days, so I really appreciate your time and help. Thank you very much for your quick and helpful response
Or this:
data have;
LENGTH Department_Old $ 25 Department_New $ 25;
infile cards delimiter='|';
input Department_Old $ Department_New $;
cards;
Communications |Advertising
Finance |Communications
Health and Wellbeing |Finance
Transport |HR
|Health and Wellbeing
;
run;
PROC FREQ data=have noprint;
tables Department_Old / out=Count_Old(rename=(Department_Old=Department));
tables Department_New / out=Count_New(rename=(Department_New=Department));
run;
data both(drop=count percent);
merge Count_Old(in=a) Count_New(in=b);
by Department;
if a or b;
if a then OLD=1; else OLD=0;
if b then NEW=1; else NEW=0;
run;
/* end of program */
This solution is also very helpful thanks - and while it wasn't I was looking for originally, it might actually be more helpful. Thanks again for your time and help
Here's one different option for you.
It combines the two columns into one, identifying the source as well. Then you count the source for ones that have only a single source (using HAVING).
data have;
LENGTH Department_Old $ 25 Department_New $ 25;
infile cards delimiter='|';
input Department_Old $ Department_New $;
ID = _n_;
cards;
Communications |Advertising
Finance |Communications
Health and Wellbeing |Finance
Transport |HR
|Health and Wellbeing
;
run;
data stacked;
set have (keep = department_old rename = (department_old = department) in=t1)
have (keep = department_new rename = (department_new = department) in=t2);
source = T1*1 + T2*2;
run;
proc sql;
create table unique_values as
select distinct department, source
from want
group by department
having count(distinct source) = 1;
quit;
Also nice,
but instead of
from want
in your PROC SQL, it should be
from stacked
Just clarifying for @Michael141 .
Kind regards
With pleasure.
Note that I have just edited the last program.
In the last data step (that does the merge) I have dropped the variables count and percent from the output. Count and Percent are coming from the last data set in the merge (Count_New), overriding the Count and Percent from Count_Old. As they don't bring any value and don't have any meaning in the MERGEd output data set, it's better to delete them!
data both(drop=count percent);
Koen
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.