data have;
input obs VarA $2. VarB $8.;
datalines;
1 XX PPXX
2 YY MMY
3 ZZ XX
4 AA AA
5 BB XX BB
;
run;
need to create new Flag varaible which will contain Y when obs = 1, 4 and 5 and N when obs = 2 and 3
You need to show what the expected output would look like. You Flag and N are not well defined as to content, variable type.
@pdhokriya wrote:
There would be new variable Flag, where i need to show data Y/N
The reason I ask for an example of your output is because your stated requirement is check "one variable against another variable in same dataset resepctive (sic.) of its row". That means your one variable can have multiple comparisons and multiple matches on different rows.
Your example data would find a match for XX on row 5. You would also find a match for BB on row 5. That,s two matches. And why would row 4 have an 'N'? AA matches AA so that would be a 'Y' (irrespective of row remember). So there is at least one rule missing, and I suspect more not stated yet.
So if you have a single "flag" of Y what does that mean? How about when you have 3 matches, or 4??
The larger your data set becomes with this sort of comparison the more likely you are to have multiple matches and a single flag may very well be insufficient for many uses.
You can use the FIND function to return the character position of VARA as a substring of VARB. If the substring is not found, then FIND returns a zero:
data have;
input obs VarA $2. VarB $8.;
datalines;
1 XX PPXX
2 YY MMY
3 ZZ XX
4 AA AA
5 BB XX BB
;
run;
data want;
set have;
charnum=find(varb,trim(vara));
if charnum>0 then flag='Y';
else flag='N';
run;
I use "trim(vara)" as the second argument, to avoid false negatives if it has trailing (or leading) spaces.
You might want to improve on this by embedding the FIND function as a conditional expression inside an IFC function:
data want;
set have;
length flag $1;
flag= ifc(find(varb,trim(vara)),'Y','N');
run;
The IFC returns a character value. Either "Y" if the condition find(varb,trim(vara)) is not a zero or missing, or an "N" otherwise.
Thank you for your reply.
Its almost done but if i consider new senerio (mentioned below) here I will get N instead of Y when obs = 2, My addition requirement is when we dont find such data then it should flag as Y , OBS = 1 and 3 are fine 🙂
data have;
input obs VarA $3-7 VarB $8-80;
datalines;
1 1017 Customer 1017 did not complete
2 1017 Customer did not complete
3 1017 Customer 1111 did not complete
;
run;
data want;
set have;
length flag $1;
flag= ifc(find(varb,trim(vara)),'Y','N');
run;
Below assuming that your additional case can be identified by a string in VarB with not a single digit. If that's not suitable for your data then you need to tell us (and provide matching sample data) what rule could be used to identify such a case.
data have;
input obs VarA $3-7 VarB $8-80;
datalines;
1 1017 Customer 1017 did not complete
2 1017 Customer did not complete
3 1017 Customer 1111 did not complete
;
data want;
set have;
length flag $1;
if anydigit(varb)=0 then flag='Y';
else if find(varb,strip(vara)) then flag='Y';
else flag='N';
run;
If VarB contains date/time other then VarA data then i would be missed. anydigit wont work 😞
data have;
input obs VarA $3-7 VarB $8-80;
datalines;
1 1017 Customer 1017 did not complete
2 1017 Customer did not complete on 22-Sep-2021
3 1017 Customer 1111 did not complete
;
Changing "find" to "findw" should do the job.
Remove anydigit, just:
data want;
set have;
length flag $1;
flag = ifc(findw(varb, strip(vara)), 'Y', 'N');
run;
@pdhokriya There won't be logic that will work for any data. There might be logic that can work for your data.
In order to create such logic you need to know your data and you need to be very precise in formulating logical and exhaustive rules.
If you want working code from us then you need to share these precise rules AND you must provide representative sample data - representative meaning that the data needs to cover all your cases.
Below code will come a bit closer to what you want. It still doesn't handle the case in obs 5.
If you haven't got a list of all customers then below also wouldn't work. ...but may-be your real customer ids look very different and you could define a pattern how such a customer id's need to look like (example: always 3 characters followed by 10 digits). This would then allow to check in your string if there is such a pattern.
data have;
input obs VarA $3-7 VarB $8-80;
datalines;
1 1017 Customer 1017 did not complete
2 1017 Customer did not complete
3 1017 Customer 1111 did not complete
4 2021 Customer 2021 did not complete
5 2021 Customer did not complete on 22 September 2021
6 2021 Customer 1111 did not complete
;
data all_customers;
input customer $1-4;
datalines;
1111
1017
2021
;
data want;
if _n_=1 then
do;
if 0 then set all_customers;
dcl hash h1(dataset:'all_customers');
dcl hiter hh1('h1');
h1.defineKey('customer');
h1.defineData('customer');
h1.defineDone();
end;
set have;
length flag $1;
flag='Y';
if findw(varb,strip(vara)) then flag='Y';
else
do;
_rc = hh1.first();
do while (_rc = 0);
if findw(varb,strip(customer)) then
do;
flag='N';
leave;
end;
_rc = hh1.next();
end;
end;
drop customer _rc;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.