Hello Experts,
I have two variables in two tables.
The first variable, Var, in the first table , it is a big table.
The second variable, var_ref, in the second table (reference table) , it is a small table.
Output Rules:
If var =var_ref then CommonValue='equal'
if there is a common values then CommonValue = "common values betwwen var and var_ref"
if not CommonValue=''
> Big Table
| Var |
| First one |
| Test number one |
> Ref table
| Var_Ref |
| one |
| First one |
| Test |
| number |
> Output
| Var | Var_Ref | Common Values |
| First one | one | one |
| First one | First one | equal |
| Test number one | one | one |
| Test number one | number | number |
| Test number one | Test | Test |
@jarg:That's good. The more, the merrier 😉 ;
None of those values are EQUAL. What is the test that you actually want to do? Are you trying to test if VAR_REF is contained in VAR? Does case matter? Does it need to match a full word? Does VAR_REF ever contain more than one word?
Thank you.
I want to test as described in the output
If the value of "VAR" is contained in VAR_Ref?
Does it contain at least one word in VAR_Ref ?
or if does not contain any word in in VAR_Ref ?
i'm certain there are at least 5 better ways of doing this but this may do what you're after:
options mlogic symbolgen;
data ref_table;
infile datalines dlm=",";
format var $50.;
input var;
datalines;
Var_Ref,
one,
First one,
Test,
number,
;
run;
data big_table;
infile datalines dlm=",";
format var $50.;
input var;
datalines;
Var,
First one,
Test number one,
;
run;
proc sql;
create table distinct_varlist as
select distinct(strip(var)) as ref_var
from ref_table;
quit;
proc sql;
create table Big_table_distinct as
select distinct var
from big_table;
quit;
proc sql noprint; select count(ref_var) into: distinct_varcount from distinct_varlist;
%macro iterative_check;
%do i = 1 %to &distinct_varcount.;
data _null_;
p = &i.;
set distinct_varlist point=p;
call symputx('varlength',length(ref_var),'L');
call symputx('ref_var_m',ref_var,'L');
stop;
run;
data setall_&i.(drop=CV:);
set Big_table_distinct;
BigTable_Var = strip(Var);
RefTable_Var = strip("&ref_var_m.");
if BigTable_Var = RefTable_Var then CV1 = "Equal";
if indexw(BigTable_Var,RefTable_Var) then CV2 = strip(substr(BigTable_Var,indexw(BigTable_Var,RefTable_Var),index(BigTable_Var,' ')));
if indexw(RefTable_Var,BigTable_Var) then CV3 = strip(substr(RefTable_Var,indexw(RefTable_Var,BigTable_Var),index(RefTable_Var,' ')+1));
format Common_Values $50.;
Common_Values = coalescec(CV1,CV2,CV3);
run;
%end;
%mend iterative_check;
%iterative_check;
data combine;
set setall:;
run;
proc sort nodupkey data=combine; by BigTable_Var RefTable_Var Common_Values;
data output;
set combine;
where Common_Values ne '';
run;
Thank you very much for your answer, That's very kind from you
A modicum of hash object usage should do it:
data big ;
input @1 var $16. ;
cards ;
first one
test number one
;
run ;
data ref ;
input @1 var_ref $16. ;
cards ;
one
first one
test
number
;
run ;
data want ;
if _n_ = 1 then do ;
if 0 then set big ref ;
dcl hash h (dataset:"ref") ;
h.definekey ("var_ref") ;
h.definedone () ;
end ;
set big ;
common_values = var ;
if h.find (key:var) = 0 then do ;
common_values = "equal" ;
output ;
end ;
do _n_ = 1 to countw (var) ;
common_values = scan (var, _n_) ;
if h.find (key:common_values) = 0 then output ;
end ;
run ;
Kind regards
Paul D.
Thank very much. that's very kind from you.
The hash object can make many things more elegant, simpler, and faster. That's why @DonH and I have written a whole book about it:
Though in this case, the concept of the program is quite simple:
The concept would remain the same if a different type of lookup table were used. However, the hash table makes the process of organizing the lookup table and searching it a breeze - and searches in O(1) time to boot. I agree that it behooves any SAS programmer to make the hash object, to a lesser or larger degree, an instrument in one's toolbox. After all, it's been an integral part of Base SAS for 16 years and counting!
Kind regards
Paul D.
@jarg:That's good. The more, the merrier 😉 ;
Please, is -it possible to show, the order of the first char there is a difference betwwen var and var_ref
Var=test one
Var_ref=test two
order_first_char_with_dif=5
@LineMoon: This is what the COMPARE function is for:
data want ;
if _n_ = 1 then do ;
if 0 then set big ref ;
dcl hash h (dataset:"ref") ;
h.definekey ("var_ref") ;
h.definedone () ;
end ;
set big ;
common_values = var ;
do _n_ = 1 to countw (var) ;
common_values = scan (var, _n_) ;
if h.find (key:common_values) then continue ;
order_first_char_with_dif = abs (compare (var_ref, var)) ;
output ;
end ;
if h.find (key:var) = 0 then do ;
common_values = "equal" ;
order_first_char_with_dif = 0 ;
output ;
end ;
run ;
If you want the records in the output listed in the order exactly as you originally suggested, just swap the execution order around:
data want ;
if _n_ = 1 then do ;
if 0 then set big ref ;
dcl hash h (dataset:"ref") ;
h.definekey ("var_ref") ;
h.definedone () ;
end ;
set big ;
common_values = var ;
common_values = "equal" ;
order_first_char_with_dif = 0 ;
if h.find (key:var) = 0 then output ;
do _n_ = countw (var) to 1 by -1 ;
common_values = scan (var, _n_) ;
if h.find (key:common_values) ne 0 then continue ;
order_first_char_with_dif = abs (compare (var_ref, var)) ;
output ;
end ;
run ;
Kind regards
Paul D.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.