Hello!
Just wanted to see if there is a straight-forward way to do this without having a ton of if-then statements. I have a dataset like this:
Observation 1:
Var1 =S
Var2 = D
Var3 = S
Var4 = <missing value>
Observation 2:
Var1 =<missing value>
Var2 = D
Var3 = D
Var4 = <missing value>
In my dataset there can be every combination of 'S', 'D' or empty values among the four variables. What I need to do, row by row, is check if the populated variables are equal and ignore the variables that are empty .
I started writing code with if-then statements that checks if var1 ne '' and var2 eq '' and var3 eq '' and var4 eq ''....blah blah blah but I'm going to have to check for every possible combination of populated/empty vars and it's getting too long.
Is there an easier way to do this?
Thanks!
You can sort the character variables var1 through var4, by declaring them as part of an array, and using SORTC. Blanks will sort low.
If the first non-blank is identical to the last non-blank (and you have 2 or more non-blanks) then all your non-blanks are the same. Use that info to make your new variable, and re-read the observation to unsort the var1-var4 values:
data want ;
set have;
array v {*} $ var1-var4;
call sortc(of v{*});
do i=1 to 4 until(v{i}^=' '); end; /*Find first non-blank */
n_nonblank=5-i;
if n_nonblank<=1 then all_the_same='N.A.'; /*Not Applicable */
else if v{i}=v{4} then all_the_same='Yes'; /*First non-blank equals last */
else all_the_same='No';
set have; /*Re-read var1-var4 to undo the sort */
run;
You need to show us what the result looks like. Such as show what you want for the cases when variables are equal.
Also, are your values actually single character as shown or did you simplify. This may be an important difference as there is a function COUNTC that works with single characters:
data example; string= 'abdssdeabd'; numa = countc(string,'a'); numb = countc(string,'b'); numd = countc(string,'d'); nums = countc(string,'s'); run;
and it is easy to make a string that doesn't contain spaces to worry about: cats(var1,var2,var3,var4);
Thank you for the reply! I'm sorry my example didn't provide enough information - I've attached an Excel file with a portion of my dataset so you can see exactly what I'm working with.
My task if to check if all values populated in var1-var4 match (ignoring blanks). If populated values match then the Result is 'Reviewer'. If any of them are different then the Result is 'Chair'.
Thanks again!
Are you literally look for those two character values? Or are S and D short had for some longer string? Or perhaps the variables are actually numeric?
And what exactly are trying to detect?
If you want to find the observations where there are DIFFERENT values. Like your first observations then test if both values appear.
different = indexc(cats(of var1-var4),'S') and indexc(cats(of var1-var4),'D');
Or if S and D are really something longer like 'SAM' and "DOUG' then perhaps use WHICHC()?
different = whichc('SAM',of var1-var4) and whichc('DOUG', of var1-var4);
If you want to find cases where S appear more than once you could use WHICHC() for that also. Look for cases where you find a different variable when searching from left than from the right.
multiple_S = whichc('S', of var1-var4) ne 5-whichc('S', of var4-var1);
Do the same thing to detect multiple values of 'D'.
Hi Tom,
Thanks for your reply! My initial example of 'S' and 'D' were just me trying to give a simplified example to see if someone could just help me with how the logic would go. Attached is some actual data and what my result needs to be, if this helps. My task is to check if all populated variables in var1-var4 are equal or not. If they are equal then my result should be 'Reviewer'. If Any differ than my result should be 'Chair'.
Thanks!
PS it is also important I note that var1 may be empty as well
Thanks for your help, this is saving me!
You can sort the character variables var1 through var4, by declaring them as part of an array, and using SORTC. Blanks will sort low.
If the first non-blank is identical to the last non-blank (and you have 2 or more non-blanks) then all your non-blanks are the same. Use that info to make your new variable, and re-read the observation to unsort the var1-var4 values:
data want ;
set have;
array v {*} $ var1-var4;
call sortc(of v{*});
do i=1 to 4 until(v{i}^=' '); end; /*Find first non-blank */
n_nonblank=5-i;
if n_nonblank<=1 then all_the_same='N.A.'; /*Not Applicable */
else if v{i}=v{4} then all_the_same='Yes'; /*First non-blank equals last */
else all_the_same='No';
set have; /*Re-read var1-var4 to undo the sort */
run;
@mkeintz This worked perfectly, THANK YOU, I appreciate your help and I learned something today!!!
Below another coding option that doesn't use call sortc() and though won't change the values in your source variables.
data have;
infile datalines truncover dsd;
input (var1 var2 var3 var4 RESULT) (:$50.);
datalines;
SD - 03-AUG-2018,SD - 03-AUG-2018,SD - 03-AUG-2018,,Reviewer
NE - 07-SEP-2018,,NE - 07-SEP-2018,,Reviewer
SD - 08-AUG-2019,SD - 08-AUG-2019,SD - 08-AUG-2019,,Reviewer
CR - 12-SEP-2019,SD - 12-SEP-2019,SD - 12-SEP-2019,,Chair
,,,,
,CR - 12-SEP-2019,,NE - 07-SEP-2018,Chair
,NE - 07-SEP-2018,,NE - 07-SEP-2018,Reviewer
,,NE - 07-SEP-2018,,Reviewer,,
CR - 12-SEP-2019,,,NE - 07-SEP-2018,Chair
;
data want(drop=_:);
set have;
length result_derived $8 _first_nonblank $50;
array _a_vars {*} var1 - var4;
_first_nonblank=coalescec(of _a_vars[*]);
if not missing(_first_nonblank) then
do;
result_derived='Reviewer';
do _i=2 to dim(_a_vars);
if not missing(_a_vars[_i]) and _first_nonblank ne _a_vars[_i] then
do;
result_derived='Chair';
leave;
end;
end;
end;
run;
proc print data=want;
run;
Or below a variant of above logic that avoids comparing a value with itself in case the first array element is missing.
data want(drop=_:);
set have;
length result_derived $8;
array _a_vars {*} var1 - var4;
do _i=2 to dim(_a_vars);
if missing(_k) and not missing(_a_vars[_i-1]) then _k=_i-1;
else
if not missing(_k) then
if _a_vars[_k] ne _a_vars[_i] then
do;
result_derived='Chair';
leave;
end;
end;
if missing(result_derived) and not missing(_k) then result_derived='Reviewer';
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.