BookmarkSubscribeRSS Feed
pdhokriya
Pyrite | Level 9

 

data have;
input obs VarA $2. VarB $8.;
datalines;
1 XX PPXX
2 YY MMY
3 ZZ XX
4 AA AA
5 BB XX BB
;
run;

need to create new Flag varaible which will contain Y when obs = 1, 4 and 5 and N when obs = 2 and 3

12 REPLIES 12
ballardw
Super User

You need to show what the expected output would look like. You Flag and N are not well defined as to content, variable type.

pdhokriya
Pyrite | Level 9
There would be new variable Flag, where i need to show data Y/N
ballardw
Super User

@pdhokriya wrote:
There would be new variable Flag, where i need to show data Y/N

The reason I ask for an example of your output is because your stated requirement is check "one variable against another variable in same dataset resepctive (sic.) of its row". That means your one variable can have multiple comparisons and multiple matches on different rows.

Your example data would find a match for XX on row 5. You would also find a match for BB on row 5. That,s two matches. And why would row 4 have an 'N'? AA matches AA so that would be a 'Y' (irrespective of row remember). So there is at least one rule missing, and I suspect more not stated yet.

 

So if you have a single "flag" of Y what does that mean? How about when you have 3 matches, or 4??

The larger your data set becomes with this sort of comparison the more likely you are to have multiple matches and a single flag may very well be insufficient for many uses.

mkeintz
PROC Star

You can use the FIND function to return the character position of VARA as a substring of VARB.  If the substring is not found, then FIND returns a zero:

 

data have;
input obs VarA $2. VarB $8.;
datalines;
1 XX PPXX
2 YY MMY
3 ZZ XX
4 AA AA
5 BB XX BB
;
run;

data want;
  set have;
  charnum=find(varb,trim(vara));
  if charnum>0 then flag='Y';
  else flag='N';
run;

I use "trim(vara)" as the second argument, to avoid false negatives if it has trailing (or leading) spaces.

 

You might want to improve on this by embedding the FIND function as a conditional expression inside an IFC function:

 

data want;
  set have;
  length flag $1;
  flag= ifc(find(varb,trim(vara)),'Y','N');
run;

The IFC returns a character value.  Either "Y" if the condition find(varb,trim(vara)) is not a zero or missing, or an "N" otherwise.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
pdhokriya
Pyrite | Level 9

Thank you for your reply.

 

Its almost done but if i consider new senerio (mentioned below) here I will get N instead of Y when obs = 2, My addition requirement is when we dont find such data then it should flag as Y , OBS = 1 and 3 are fine 🙂

 

data have;
input obs VarA $3-7 VarB $8-80;
datalines;
1 1017 Customer 1017 did not complete
2 1017 Customer did not complete
3 1017 Customer 1111 did not complete
;
run;

data want;
set have;
length flag $1;
flag= ifc(find(varb,trim(vara)),'Y','N');
run;

Patrick
Opal | Level 21

Below assuming that your additional case can be identified by a string in VarB with not a single digit. If that's not suitable for your data then you need to tell us (and provide matching sample data) what rule could be used to identify such a case.

data have;
  input obs VarA $3-7 VarB $8-80;
  datalines;
1 1017 Customer 1017 did not complete
2 1017 Customer did not complete
3 1017 Customer 1111 did not complete
;

data want;
  set have;
  length flag $1;
  if anydigit(varb)=0 then flag='Y';
  else if find(varb,strip(vara)) then flag='Y';
  else flag='N';
run;
pdhokriya
Pyrite | Level 9

If VarB contains date/time other then VarA data then i would be missed. anydigit wont work 😞

 


data have;
input obs VarA $3-7 VarB $8-80;
datalines;
1 1017 Customer 1017 did not complete
2 1017 Customer did not complete on 22-Sep-2021
3 1017 Customer 1111 did not complete
;

andreas_lds
Jade | Level 19

Changing "find" to "findw" should do the job.

pdhokriya
Pyrite | Level 9
Yes I tried with findw but obs 2 has N rather than Y
If Vara data not present in varb thats should flag as Y, if mismatches bewtween vara and varb then only N


data have;
input obs VarA $3-7 VarB $8-80;
datalines;
1 1017 Customer 1018 1017 did not complete
2 1017 Customer did not complete on 22-Sep-2021
3 1017 Customer 1111 did not complete
;

data want;
set have;
length flag $1;
if anydigit(varb)=0 then flag='Y';
else if findw(varb,strip(vara)) then flag='Y';
else flag='N';
run;
andreas_lds
Jade | Level 19

Remove anydigit, just:

data want;
   set have;
   length flag $1;
   
   flag = ifc(findw(varb, strip(vara)), 'Y', 'N');
run;
pdhokriya
Pyrite | Level 9
if 2nd obs not find any records then it should also flag as Y, which does not work with this( vara has 1017 but varb doesnt have 1017 so flag should be Y in this case)

data have;
input obs VarA $3-7 VarB $8-80;
datalines;
1 1017 Customer 1018 1017 did not complete
2 1017 Customer did not complete on 22-Sep-2021
3 1017 Customer 1111 did not complete
;

data want; set have; length flag $1; flag = ifc(findw(varb, strip(vara)), 'Y', 'N'); run;
Patrick
Opal | Level 21

@pdhokriya There won't be logic that will work for any data. There might be logic that can work for your data. 

In order to create such logic you need to know your data and you need to be very precise in formulating logical and exhaustive rules.

If you want working code from us then you need to share these precise rules AND you must provide representative sample data - representative meaning that the data needs to cover all your cases.

 

Below code will come a bit closer to what you want. It still doesn't handle the case in obs 5.

If you haven't got a list of all customers then below also wouldn't work. ...but may-be your real customer ids look very different and you could define a pattern how such a customer id's need to look like (example: always 3 characters followed by 10 digits). This would then allow to check in your string if there is such a pattern.

data have;
  input obs VarA $3-7 VarB $8-80;
  datalines;
1 1017 Customer 1017 did not complete
2 1017 Customer did not complete
3 1017 Customer 1111 did not complete
4 2021 Customer 2021 did not complete
5 2021 Customer did not complete on 22 September 2021
6 2021 Customer 1111 did not complete
;
data all_customers;
  input customer $1-4;
  datalines;
1111
1017
2021
;

data want;

  if _n_=1 then
    do;
      if 0 then set all_customers;
      dcl hash h1(dataset:'all_customers');
      dcl hiter hh1('h1');
      h1.defineKey('customer');
      h1.defineData('customer');
      h1.defineDone();
    end;

  set have;
  length flag $1;

  flag='Y';
  if findw(varb,strip(vara)) then flag='Y';
  else
    do;
      _rc = hh1.first();
      do while (_rc = 0);
        if findw(varb,strip(customer)) then
          do;
            flag='N';
            leave;
          end;
        _rc = hh1.next();
      end;
    end;

  drop customer _rc;

run;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1342 views
  • 0 likes
  • 5 in conversation