Solved: Re: Check if multiple variables have same value ignoring variables tha...

biglerc · Posted 10-29-2023 03:40 PM

Hello!

Just wanted to see if there is a straight-forward way to do this without having a ton of if-then statements. I have a dataset like this:

Observation 1:

Var1 =S

Var2 = D

Var3 = S

Var4 = <missing value>

Observation 2:

Var1 =<missing value>

Var2 = D

Var3 = D

Var4 = <missing value>

In my dataset there can be every combination of 'S', 'D' or empty values among the four variables. What I need to do, row by row, is check if the populated variables are equal and ignore the variables that are empty .

I started writing code with if-then statements that checks if var1 ne '' and var2 eq '' and var3 eq '' and var4 eq ''....blah blah blah but I'm going to have to check for every possible combination of populated/empty vars and it's getting too long.

Is there an easier way to do this?

Thanks!

mkeintz · Posted 10-29-2023 05:58 PM

You can sort the character variables var1 through var4, by declaring them as part of an array, and using SORTC. Blanks will sort low.

If the first non-blank is identical to the last non-blank (and you have 2 or more non-blanks) then all your non-blanks are the same. Use that info to make your new variable, and re-read the observation to unsort the var1-var4 values:

data want ;
  set have;

  array v {*} $ var1-var4;
  call sortc(of v{*});
  do i=1 to 4 until(v{i}^=' ');  end;  /*Find first non-blank */
  n_nonblank=5-i;
  if n_nonblank<=1 then all_the_same='N.A.';  /*Not Applicable */
  else if v{i}=v{4} then all_the_same='Yes';  /*First non-blank equals last */
  else all_the_same='No';

  set have;   /*Re-read var1-var4 to undo the sort */
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

ballardw · Posted 10-29-2023 04:16 PM

You need to show us what the result looks like. Such as show what you want for the cases when variables are equal.

Also, are your values actually single character as shown or did you simplify. This may be an important difference as there is a function COUNTC that works with single characters:

data example;
   string= 'abdssdeabd';
   numa = countc(string,'a');
   numb = countc(string,'b');
   numd = countc(string,'d');
   nums = countc(string,'s');
run;

and it is easy to make a string that doesn't contain spaces to worry about: cats(var1,var2,var3,var4);

biglerc · Posted 10-29-2023 04:25 PM

Thank you for the reply! I'm sorry my example didn't provide enough information - I've attached an Excel file with a portion of my dataset so you can see exactly what I'm working with.

My task if to check if all values populated in var1-var4 match (ignoring blanks). If populated values match then the Result is 'Reviewer'. If any of them are different then the Result is 'Chair'.

Thanks again!

Tom · Posted 10-29-2023 04:35 PM

Are you literally look for those two character values? Or are S and D short had for some longer string? Or perhaps the variables are actually numeric?

And what exactly are trying to detect?

If you want to find the observations where there are DIFFERENT values. Like your first observations then test if both values appear.

different = indexc(cats(of var1-var4),'S') and indexc(cats(of var1-var4),'D');

Or if S and D are really something longer like 'SAM' and "DOUG' then perhaps use WHICHC()?

different = whichc('SAM',of var1-var4) and whichc('DOUG', of var1-var4);

If you want to find cases where S appear more than once you could use WHICHC() for that also. Look for cases where you find a different variable when searching from left than from the right.

multiple_S = whichc('S', of  var1-var4) ne 5-whichc('S', of var4-var1);

Do the same thing to detect multiple values of 'D'.

biglerc · Posted 10-29-2023 04:45 PM

Hi Tom,

Thanks for your reply! My initial example of 'S' and 'D' were just me trying to give a simplified example to see if someone could just help me with how the logic would go. Attached is some actual data and what my result needs to be, if this helps. My task is to check if all populated variables in var1-var4 are equal or not. If they are equal then my result should be 'Reviewer'. If Any differ than my result should be 'Chair'.

Thanks!

biglerc · Posted 10-29-2023 05:33 PM

PS it is also important I note that var1 may be empty as well

Thanks for your help, this is saving me!

mkeintz · Posted 10-29-2023 05:58 PM

You can sort the character variables var1 through var4, by declaring them as part of an array, and using SORTC. Blanks will sort low.

If the first non-blank is identical to the last non-blank (and you have 2 or more non-blanks) then all your non-blanks are the same. Use that info to make your new variable, and re-read the observation to unsort the var1-var4 values:

data want ;
  set have;

  array v {*} $ var1-var4;
  call sortc(of v{*});
  do i=1 to 4 until(v{i}^=' ');  end;  /*Find first non-blank */
  n_nonblank=5-i;
  if n_nonblank<=1 then all_the_same='N.A.';  /*Not Applicable */
  else if v{i}=v{4} then all_the_same='Yes';  /*First non-blank equals last */
  else all_the_same='No';

  set have;   /*Re-read var1-var4 to undo the sort */
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

biglerc · Posted 10-29-2023 06:21 PM

@mkeintz This worked perfectly, THANK YOU, I appreciate your help and I learned something today!!!

Patrick · Posted 10-29-2023 07:06 PM

Below another coding option that doesn't use call sortc() and though won't change the values in your source variables.

data have;
  infile datalines truncover dsd;
  input (var1 var2 var3 var4 RESULT) (:$50.);
  datalines;
SD - 03-AUG-2018,SD - 03-AUG-2018,SD - 03-AUG-2018,,Reviewer
NE - 07-SEP-2018,,NE - 07-SEP-2018,,Reviewer
SD - 08-AUG-2019,SD - 08-AUG-2019,SD - 08-AUG-2019,,Reviewer
CR - 12-SEP-2019,SD - 12-SEP-2019,SD - 12-SEP-2019,,Chair
,,,,
,CR - 12-SEP-2019,,NE - 07-SEP-2018,Chair
,NE - 07-SEP-2018,,NE - 07-SEP-2018,Reviewer
,,NE - 07-SEP-2018,,Reviewer,,
CR - 12-SEP-2019,,,NE - 07-SEP-2018,Chair
;

data want(drop=_:);
  set have;
  length result_derived $8 _first_nonblank $50;
  array _a_vars {*} var1 - var4;

  _first_nonblank=coalescec(of _a_vars[*]);

  if not missing(_first_nonblank) then
    do;
      result_derived='Reviewer';
      do _i=2 to dim(_a_vars);
        if not missing(_a_vars[_i]) and _first_nonblank ne _a_vars[_i] then
          do;
            result_derived='Chair';
            leave;
          end;
      end;
    end;

run;

proc print data=want;
run;

Or below a variant of above logic that avoids comparing a value with itself in case the first array element is missing.

data want(drop=_:);
  set have;
  length result_derived $8;
  array _a_vars {*} var1 - var4;

  do _i=2 to dim(_a_vars);
    if missing(_k) and not missing(_a_vars[_i-1]) then _k=_i-1;
    else
    if not missing(_k) then 
      if _a_vars[_k] ne _a_vars[_i] then
      do;
        result_derived='Chair';
        leave;
      end;
  end;
  if missing(result_derived) and not missing(_k) then result_derived='Reviewer';
run;

Check if multiple variables have same value ignoring variables that are empty

Re: Check if multiple variables have same value ignoring variables that are empty

Re: Check if multiple variables have same value ignoring variables that are empty

Re: Check if multiple variables have same value ignoring variables that are empty

Re: Check if multiple variables have same value ignoring variables that are empty

Re: Check if multiple variables have same value ignoring variables that are empty

Re: Check if multiple variables have same value ignoring variables that are empty

Re: Check if multiple variables have same value ignoring variables that are empty

Re: Check if multiple variables have same value ignoring variables that are empty

Re: Check if multiple variables have same value ignoring variables that are empty

Registration is open

Registration is open

SAS Training: Just a Click Away