Hi everyone,
I am new to SAS and a non-native English speaker so I apologize in advance if my question or sentences doesn't make sens.
I would like to refer to array variables in a sum function inside a do loop so that the value of the argument of the sum function changes at each iteration.
Here is an example of my data set:
ID |
CUMBE1 |
CUMBE2 |
CUMBE3 |
CUMBE4 |
CUMBE5 |
CUMBE6 |
Valid |
1 |
22 |
22 |
10 |
22 |
11 |
22 |
1 |
2 |
0 |
11 |
22 |
22 |
22 |
22 |
1 |
3 |
10 |
11 |
22 |
22 |
11 |
11 |
0 |
Where the variables CUMBE1 to 6 are the test results of 6 successive blood test, valid a variable that I would like to create. 0, 10 or 11 are valid results, 22 stand for missing value.
I would like to eliminate the individuals/observations for which more than one test is missing in between 2 valid tests. So if I have successive “22” at the beginning or at the end it is still valid, what we can’t have is a valid, 2 or more successive invalid and then a valid again, like in the third line of my example.
To do so, I tried to create a variable named “valid” that would equal 1 for the valid observations and 0 for the invalid. I tried several different codes and ways, including using 0/1 value instead of the numeric 0-10-11-22, by using the ARRAY statement.
At first I wanted SAS to count the number of "22" in between valid values, and if >1 valid =0, but I couldn’t find how to do so.
So I tried here to tell SAS that if the sum of the array-variables in-between 2 valid values > 44 (that is at most 11+ 22+ 11 ), then it is not valid.
Here is my last try:
data IBR.validA; set IBR.valid1; validA=1; array cum{6} cumBE1-cumBE6; do i=1 to 6; do n=1 to 5 while (i+n<7); if cum{i}<22 and cum{i+n}<22 and sum(of cum{i}-cum{i+n})>44 then do; validA=0; leave; end; else; end; if validA=0 then do; leave; end; end; run;
And this is the message in the log:
53 do n=1 to 5 while (i+n<7);
54 if cum{i}<22 and cum{i+n}<22 and sum(of cum{i}-cum{i+n})>44 then
-
22
ERROR 22-322: Syntax error, expecting one of the following: ), ','.
I mean sum values from cum{i} to cum{i+n}. But if I don’t put the “of” in the sum function, then SAS operates a subtraction of cum{i} minus {i+n} and gives me only valid value.
How can I do for SAS to read the expression “cum{i} – cum{i+n}” as a SAS variable list expression and not as a subtraction? Or is there an other function I could use to sum/to take into account only a part of the array-variables and not all of them (which "sum(of cum{i})" does) ?
Or, even better, is there maybe a more easiest way of doing this task?
Thanks a lot for any help!
Do not hesitate too tell me if my question is not clear, I’ll try explain it another way
If I understand correctly, see if there is any instance of 2 or more successive 22's between valid (non-22) values, then valid=0. Otherwise valid=1.
Declaring the array CUM is the right way to start. But I suggest looping _i from 2 to 6. If there is any instance of cum{_i-1}^=22, followed by cum{_i}=22 and cum{_i+1}=22 then you are halfway there, because this indicates there is a non-22 to the left of successive 22's.
Then, only when you are halfway, look to the right for a non-22. If you find one then valid=0.
Otherwise valid=1.
data have;
input ID CUMBE1 CUMBE2 CUMBE3 CUMBE4 CUMBE5 CUMBE6 expected;
datalines;
1 22 22 10 22 11 22 1
2 0 11 22 22 22 22 1
3 10 11 22 22 11 11 0
run;
data want (drop=_:);
set have;
array cum {*} cumbe1-cumbe6;
do _i=2 to dim(cum)-1 while (valid^=0);
if cum{_i-1}^=22 and cum{_i}=22 and cum{_i+1}=22 then valid=0.5;
if valid=0.5 and cum{_i+1}^=22 then valid=0;
end;
if valid^=0 then valid=1;
run;
The "trick" here is to loop over interior array elements ("do _i=2 to dim(cum-1);"). Then just use index value _i-1 and _i+1 to look to the left and right.
Here is the code again because it doesn't seem really easy to read the way it appear in my post :
data IBR.validA;
set IBR.valid1;
validA=1;
array cum{6} cumBE1-cumBE6;
do i=1 to 6;
do n=1 to 5 while (i+n<7);
if cum{i}<22 and cum{i+n}<22 and sum(cum{i}-cum{i+n})>44 then
do;
validA=0;
leave;
end;
else;
end;
if validA=0 then
do;
leave;
end;
end;
run;
Variable lists are evaluate when SAS is compiling the code. So there is no way that trying to use a run-time concept like an array reference could be used to generate a variable list.
Just use a DO loop. Or better take advantage of the DO loop you already have.
data IBR.validA;
set IBR.valid1;
validA=1;
array cum{6} cumBE1-cumBE6;
do i=1 to 6;
sum_cum=cum(i);
do n=1 to 5 while (i+n<7);
sum_cum=sum_cum+cum(i+n);
if cum{i}<22 and cum{i+n}<22 and sum_cum>44 then do;
validA=0;
leave;
end;
end;
if validA=0 then do;
leave;
end;
end;
run;
PS Use the Insert SAS code button on the menu bar to paste/edit SAS code in your posts.
Hi Tom,
Sorry for the delay of my answer, I couldn't work on it yesterday. Thank you for your rapid answer and for the tip about the Insert sas code button
Thank you for your proposition, but with the way it is written, sum_cum is a cumulative variable, summing all the cum{i} one iteration after the other until it exceeds 44, which always ends up to happen, and so all my observations are valid=0...
But I see the logic behind, and I'll try to apply it
Thank very much again for answering my question
Have a nice day,
All the best,
Aline
@uyuyuyu wrote:
Hi everyone,
I am new to SAS and a non-native English speaker so I apologize in advance if my question or sentences doesn't make sens.
I would like to refer to array variables in a sum function inside a do loop so that the value of the argument of the sum function changes at each iteration.
Here is an example of my data set:
ID
CUMBE1
CUMBE2
CUMBE3
CUMBE4
CUMBE5
CUMBE6
Valid
1
22
22
10
22
11
22
1
2
0
11
22
22
22
22
1
3
10
11
22
22
11
11
0
Where the variables CUMBE1 to 6 are the test results of 6 successive blood test, valid a variable that I would like to create. 0, 10 or 11 are valid results, 22 stand for missing value.
I would like to eliminate the individuals/observations for which more than one test is missing in between 2 valid tests. So if I have successive “22” at the beginning or at the end it is still valid, what we can’t have is a valid, 2 or more successive invalid and then a valid again, like in the third line of my example.
To do so, I tried to create a variable named “valid” that would equal 1 for the valid observations and 0 for the invalid. I tried several different codes and ways, including using 0/1 value instead of the numeric 0-10-11-22, by using the ARRAY statement.
At first I wanted SAS to count the number of "22" in between valid values, and if >1 valid =0, but I couldn’t find how to do so.
So I tried here to tell SAS that if the sum of the array-variables in-between 2 valid values > 44 (that is at most 11+ 22+ 11 ), then it is not valid.
Here is my last try:
data IBR.validA; set IBR.valid1; validA=1; array cum{6} cumBE1-cumBE6; do i=1 to 6; do n=1 to 5 while (i+n<7); if cum{i}<22 and cum{i+n}<22 and sum(of cum{i}-cum{i+n})>44 then do; validA=0; leave; end; else; end; if validA=0 then do; leave; end; end; run;
And this is the message in the log:
53 do n=1 to 5 while (i+n<7);
54 if cum{i}<22 and cum{i+n}<22 and sum(of cum{i}-cum{i+n})>44 then
-
22
ERROR 22-322: Syntax error, expecting one of the following: ), ','.
I mean sum values from cum{i} to cum{i+n}. But if I don’t put the “of” in the sum function, then SAS operates a subtraction of cum{i} minus {i+n} and gives me only valid value.
How can I do for SAS to read the expression “cum{i} – cum{i+n}” as a SAS variable list expression and not as a subtraction? Or is there an other function I could use to sum/to take into account only a part of the array-variables and not all of them (which "sum(of cum{i})" does) ?
Or, even better, is there maybe a more easiest way of doing this task?
Thanks a lot for any help!
Do not hesitate too tell me if my question is not clear, I’ll try explain it another way
One thing you might want to consider is that SAS has a value of missing available for numeric values, typically indicated by default as a period. Since you are going to have to repeatedly add code to avoid using your value of 22 you perhaps should consider recoding the data once it is in SAS as missing instead of retaining the value of 22. Then the functions available such as Missing, is the current variable missing, or NMISS, how many of the variables have missing values are available. These functions avoid issues like "summing" multiple values to compare if they are all missing such as nmiss(cum[I+], cum[I+2]) = 2 to test for 2 missing values. If you have other variables with different codes for "missing then logic comparing a different sum of values becomes less obvious while the nmiss approach stays the same nmiss(abc[I], abc[I+1]) =2. And nmiss(of cum(*))=6 does work to identify all missing.
Because you have a moderately complex start/stop rule I don't see a simple one line test.
Note that you only have to test the "betweenness" part of you rule for I= 1 to 3. If the 4th value is valid you cannot have 2 or more invalid between valid results. If the 4th is valid you have either 2 invalid following, which would be at the "end' so the result, assuming not invalid from a lower index comparsion, so not invalid, or at most 1 invalid between it an a valid.
Hi ballardw,
Thank you for answering me!
Indeed, it seems more handy to work with missing variables, thanks for the advice I haven't been able yet to write a code with the nmiss that works correctly, but at least now it runs, I don't have an error message anymore in the log, so it is already a progress
Thank you again very much for answering me!
Have a nice day,
Aline.
If I understand correctly, see if there is any instance of 2 or more successive 22's between valid (non-22) values, then valid=0. Otherwise valid=1.
Declaring the array CUM is the right way to start. But I suggest looping _i from 2 to 6. If there is any instance of cum{_i-1}^=22, followed by cum{_i}=22 and cum{_i+1}=22 then you are halfway there, because this indicates there is a non-22 to the left of successive 22's.
Then, only when you are halfway, look to the right for a non-22. If you find one then valid=0.
Otherwise valid=1.
data have;
input ID CUMBE1 CUMBE2 CUMBE3 CUMBE4 CUMBE5 CUMBE6 expected;
datalines;
1 22 22 10 22 11 22 1
2 0 11 22 22 22 22 1
3 10 11 22 22 11 11 0
run;
data want (drop=_:);
set have;
array cum {*} cumbe1-cumbe6;
do _i=2 to dim(cum)-1 while (valid^=0);
if cum{_i-1}^=22 and cum{_i}=22 and cum{_i+1}=22 then valid=0.5;
if valid=0.5 and cum{_i+1}^=22 then valid=0;
end;
if valid^=0 then valid=1;
run;
The "trick" here is to loop over interior array elements ("do _i=2 to dim(cum-1);"). Then just use index value _i-1 and _i+1 to look to the left and right.
Haha thank you very very much, it worked perfectly! And it is a simple way to do it
I will remember this logic of looking right and left, I never thought of it before, thanks for the advice
Have a very nice day,
All the best,
Aline.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.