Hello Everyone !
I am using SAS 9.4 Version.
I have one dataset as below and I want to check if VarA's value in present in VarB anywhere, irrespective of its row order.based on the i can create a new variable to mark yes/No based on its presence.
Can you please help how can i check it in sas using datastep /Macro/proc sql ?
I am trying to search if XX is present in VarB anywhere ? if yes then mark it as Y else NO
if YY is present in VarB anywhere ? if yes then mark it as Y else NO and so on .. till the dataset ends.
Dataset- Test
VarA VarB new_var
XX PP Y
YY MM N
ZZ XX N
AA BB Y
BB AA Y
data have;
input (VarA VarB) ($);
datalines;
XX PP
YY MM
ZZ XX
AA BB
BB AA
;
data want;
if _n_=1 then do;
if 0 then set have;
declare hash h(dataset:'have');
h.definekey('varb');
h.definedone();
end;
set have;
if h.check(key:vara)=0 then flag='Y';
else flag='N';
run;
using Hash object is very easy if you know it,and since you didn't mention, I am giving you the more traditional datastep solution:
data have;
input (VarA VarB) ($);
datalines;
XX PP
YY MM
ZZ XX
AA BB
BB AA
;
proc transpose data=have(keep=varb) out=_have;
var varb;
run;
data want;
set have;
if _n_=1 then set _have;
array t(*) col: ;
if vara in t then new_var="Y";
else new_var="N";
keep vara varb new_var;
run;
Hi Novinosrin,
Thank you very much for your answer. I didn't know that even a single dataset can be looked for different variables within.
Can you please suggest the HASH object suggestion for same ?
@Priya_26 With no surprises, the genius Ksharp gave you the hash solution even before I woke up to see your message. 🙂
@novinosrin Thank you veru much for your time and Solution. Both of them are just working Perfect. Since I have to use this in production with Huge data as input, I will go with HASH solution.
Thanks again 🙂
My preference:
proc sql;
select distinct(quote(varb)) into : b_list separated by ' ' from have;
quit;
data want;
set have;
if vara in (&b_list) then new_var='Y';
else new_var='N';
run;
@Astounding's approach will work comfortably for relatively medium sized datasets however if you are working with very large datasets and the macro var list(if and when exceeds 64k length) as it did once to me in a similar situation will make you look for an alternative solution yet again.
And Proc SQL:
Proc sql does not guarantee the order of the output though:
data have;
input (VarA VarB) ($);
datalines;
XX PP
YY MM
ZZ XX
AA BB
BB AA
;
proc sql;
create table want as
select a.* ,max(case when a.vara=b.varb then 'Y' else 'N' end) as new_var
from have a, have(keep=varb) b
group by a.vara,a.varb;
quit;
proc sql;
create table test1 as
select t.*,
(case when (select count(*) from test where VarB =t.VarA) =0 then 'N'
else 'Y' end) as new_var
from test t;
quit;
This will work.
It will work like a charm until the size of the DS in small or medium or you have an index on VarB.
data have;
input (VarA VarB) ($);
datalines;
XX PP
YY MM
ZZ XX
AA BB
BB AA ;
run;
proc sql;
select distinct("varb") into : mac separated by ' ' from have;
quit;
data want;
set have;
if vara in (&mac) then new_var='Y';
else new_var='N';
run;
data have;
input (VarA VarB) ($);
datalines;
XX PP
YY MM
ZZ XX
AA BB
BB AA
;
data want;
if _n_=1 then do;
if 0 then set have;
declare hash h(dataset:'have');
h.definekey('varb');
h.definedone();
end;
set have;
if h.check(key:vara)=0 then flag='Y';
else flag='N';
run;
@Ksharp Thanks alot, This is just perfect 🙂 Thank you veru much !! 🙂
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.