BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
uyuyuyu
Calcite | Level 5

Hi everyone,

 

I am new to SAS and a non-native English speaker so I apologize in advance if my question or sentences doesn't make sens.

I would like to refer to array variables in a sum function inside a do loop so that the value of the argument of the sum function changes at each iteration.

Here is an example of my data set:

ID

CUMBE1

CUMBE2

CUMBE3

CUMBE4

CUMBE5

CUMBE6

Valid

1

22

22

10

22

11

22

1

2

0

11

22

22

22

22

1

3

10

11

22

22

11

11

0

Where the variables CUMBE1 to 6 are the test results of 6 successive blood test, valid a variable that I would like to create. 0, 10 or 11 are valid results, 22 stand for missing value.

I would like to eliminate the individuals/observations for which more than one test is missing in between 2 valid tests. So if I have successive “22” at the beginning or at the end it is still valid, what we can’t have is a valid, 2 or more successive invalid and then a valid again, like in the third line of my example.

To do so, I tried to create a variable named “valid” that would equal 1 for the valid observations and 0 for the invalid. I tried several different codes and ways, including using 0/1 value instead of the numeric 0-10-11-22, by using the ARRAY statement.

At first I wanted SAS to count the number of "22" in between valid values, and if >1 valid =0, but I couldn’t find how to do so.

So I tried here to tell SAS that if the sum of the array-variables in-between 2 valid values > 44 (that is at most 11+ 22+ 11 ), then it is not valid.

Here is my last try:

 

data IBR.validA;
	set IBR.valid1;
	validA=1;
	array cum{6} cumBE1-cumBE6;

	do i=1 to 6;
		do n=1 to 5 while (i+n<7);
			if cum{i}<22 and cum{i+n}<22 and sum(of cum{i}-cum{i+n})>44 then
				do;
					validA=0;
					leave;
				end;
			else;
		end;

		if validA=0 then
			do;
				leave;
			end;
	end;
run;

 

And this is the message in the log:

 

53           do n=1 to 5 while (i+n<7);

54               if cum{i}<22 and cum{i+n}<22 and sum(of cum{i}-cum{i+n})>44 then

                                                                                                -

                                                                                               22

ERROR 22-322: Syntax error, expecting one of the following: ), ','.

 

I mean sum values from cum{i} to cum{i+n}. But if I don’t put the “of” in the sum function, then SAS operates a subtraction of cum{i} minus {i+n} and gives me only valid value.

 

How can I do for SAS to read the expression “cum{i} – cum{i+n}” as a SAS variable list expression and not as a subtraction? Or is there an other function I could use to sum/to take into account  only a part of the array-variables and not all of them (which "sum(of cum{i})" does) ?

Or, even better, is there maybe a  more easiest way of doing this task? Smiley Tongue

 

Thanks a lot for any help!

 

Do not hesitate too tell me if my question is not clear, I’ll try explain it another way Smiley Happy

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If I understand correctly, see if there is any instance of 2 or more successive 22's between valid (non-22) values, then valid=0.  Otherwise valid=1.

 

Declaring the array CUM is the right way to start.  But I suggest looping _i from 2 to 6.   If there is any instance of cum{_i-1}^=22, followed by cum{_i}=22 and cum{_i+1}=22 then you are halfway there, because this indicates there is a non-22 to the left of successive 22's.

 

Then, only when you are halfway, look to the right for a non-22.  If you find one then valid=0.

 

Otherwise valid=1.

 

data have;
  input ID CUMBE1 CUMBE2 CUMBE3 CUMBE4 CUMBE5 CUMBE6 expected;
datalines;
1 22 22 10 22 11 22 1  
2  0 11 22 22 22 22 1
3 10 11 22 22 11 11 0
run;
data want (drop=_:);
  set have;
  array cum {*} cumbe1-cumbe6;

  do _i=2 to dim(cum)-1 while (valid^=0);
    if cum{_i-1}^=22 and cum{_i}=22 and cum{_i+1}=22 then valid=0.5;
    if valid=0.5 and cum{_i+1}^=22 then valid=0;
  end;
  if valid^=0 then valid=1;
run;

 

The "trick" here is to loop over interior array elements ("do _i=2 to dim(cum-1);").  Then just use index value _i-1 and _i+1 to look to the left and right.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
uyuyuyu
Calcite | Level 5

Here is the code again because it doesn't seem really easy to read the way it appear in my post :

 

data IBR.validA;

  set IBR.valid1;

  validA=1;

  array cum{6} cumBE1-cumBE6;

   do i=1 to 6;

    do n=1 to 5 while (i+n<7);

       if cum{i}<22 and cum{i+n}<22 and sum(cum{i}-cum{i+n})>44 then

           do;

            validA=0;

            leave;

           end;

       else;

    end;

   if validA=0 then

     do;

      leave;

     end;

 end;

run;

Tom
Super User Tom
Super User

Variable lists are evaluate when SAS is compiling the code. So there is no way that trying to use a run-time concept like an array reference could be used to generate a variable list.

Just use a DO loop.  Or better take advantage of the DO loop you already have.

data IBR.validA;
  set IBR.valid1;
  validA=1;
  array cum{6} cumBE1-cumBE6;
  do i=1 to 6;
    sum_cum=cum(i);
    do n=1 to 5 while (i+n<7);
      sum_cum=sum_cum+cum(i+n);
      if cum{i}<22 and cum{i+n}<22 and sum_cum>44 then do;
        validA=0;
        leave;
      end;
    end;
    if validA=0 then do;
      leave;
    end;
  end;
run;

 

 

PS Use the Insert SAS code button on the menu bar to paste/edit SAS code in your posts.

 

uyuyuyu
Calcite | Level 5

Hi Tom,

 

Sorry for the delay of my answer, I couldn't work on it yesterday. Thank you for your rapid answer and for the tip about the Insert sas code button Smiley Happy

Thank you for your proposition, but with the way it is written, sum_cum is a cumulative variable, summing all the cum{i} one iteration after the other until it exceeds 44, which always ends up to happen, and so all my observations are valid=0... Smiley Frustrated

But I see the logic behind, and I'll try to apply it Smiley Very Happy

 

Thank very much again for answering my question Smiley Happy

 

Have a nice day,

All the best,

Aline

ballardw
Super User

@uyuyuyu wrote:

Hi everyone,

 

I am new to SAS and a non-native English speaker so I apologize in advance if my question or sentences doesn't make sens.

I would like to refer to array variables in a sum function inside a do loop so that the value of the argument of the sum function changes at each iteration.

Here is an example of my data set:

ID

CUMBE1

CUMBE2

CUMBE3

CUMBE4

CUMBE5

CUMBE6

Valid

1

22

22

10

22

11

22

1

2

0

11

22

22

22

22

1

3

10

11

22

22

11

11

0

Where the variables CUMBE1 to 6 are the test results of 6 successive blood test, valid a variable that I would like to create. 0, 10 or 11 are valid results, 22 stand for missing value.

I would like to eliminate the individuals/observations for which more than one test is missing in between 2 valid tests. So if I have successive “22” at the beginning or at the end it is still valid, what we can’t have is a valid, 2 or more successive invalid and then a valid again, like in the third line of my example.

To do so, I tried to create a variable named “valid” that would equal 1 for the valid observations and 0 for the invalid. I tried several different codes and ways, including using 0/1 value instead of the numeric 0-10-11-22, by using the ARRAY statement.

At first I wanted SAS to count the number of "22" in between valid values, and if >1 valid =0, but I couldn’t find how to do so.

So I tried here to tell SAS that if the sum of the array-variables in-between 2 valid values > 44 (that is at most 11+ 22+ 11 ), then it is not valid.

Here is my last try:

data IBR.validA;
	set IBR.valid1;
	validA=1;
	array cum{6} cumBE1-cumBE6;

	do i=1 to 6;
		do n=1 to 5 while (i+n<7);
			if cum{i}<22 and cum{i+n}<22 and sum(of cum{i}-cum{i+n})>44 then
				do;
					validA=0;
					leave;
				end;
			else;
		end;

		if validA=0 then
			do;
				leave;
			end;
	end;
run;

 

And this is the message in the log:

 

53           do n=1 to 5 while (i+n<7);

54               if cum{i}<22 and cum{i+n}<22 and sum(of cum{i}-cum{i+n})>44 then

                                                                                                -

                                                                                               22

ERROR 22-322: Syntax error, expecting one of the following: ), ','.

 

I mean sum values from cum{i} to cum{i+n}. But if I don’t put the “of” in the sum function, then SAS operates a subtraction of cum{i} minus {i+n} and gives me only valid value.

 

How can I do for SAS to read the expression “cum{i} – cum{i+n}” as a SAS variable list expression and not as a subtraction? Or is there an other function I could use to sum/to take into account  only a part of the array-variables and not all of them (which "sum(of cum{i})" does) ?

Or, even better, is there maybe a  more easiest way of doing this task? Smiley Tongue

 

Thanks a lot for any help!

 

Do not hesitate too tell me if my question is not clear, I’ll try explain it another way Smiley Happy


One thing you might want to consider is that SAS has a value of missing available for numeric values, typically indicated by default as a period. Since you are going to have to repeatedly add code to avoid using your value of 22 you perhaps should consider recoding the data once it is in SAS as missing instead of retaining the value of 22. Then the functions available such as Missing, is the current variable missing, or NMISS, how many of the variables have missing values are available. These functions avoid issues like "summing" multiple values to compare if they are all missing such as nmiss(cum[I+], cum[I+2]) = 2 to test for 2 missing values. If you have other variables with different codes for "missing then logic comparing a different sum of values becomes less obvious while the nmiss approach stays the same nmiss(abc[I], abc[I+1]) =2. And nmiss(of cum(*))=6 does work to identify all missing.

 

Because you have a moderately complex start/stop rule  I don't see a simple one line test.

Note that you only have to test the "betweenness" part of you rule for I= 1 to 3. If the 4th value is valid you cannot have 2 or more invalid between valid results. If the 4th is valid you have either 2 invalid following, which would be at the "end' so the result, assuming not invalid from a lower index comparsion, so not invalid, or at most 1 invalid between it an a valid.

uyuyuyu
Calcite | Level 5

Hi

 

 

 

mkeintz
PROC Star

If I understand correctly, see if there is any instance of 2 or more successive 22's between valid (non-22) values, then valid=0.  Otherwise valid=1.

 

Declaring the array CUM is the right way to start.  But I suggest looping _i from 2 to 6.   If there is any instance of cum{_i-1}^=22, followed by cum{_i}=22 and cum{_i+1}=22 then you are halfway there, because this indicates there is a non-22 to the left of successive 22's.

 

Then, only when you are halfway, look to the right for a non-22.  If you find one then valid=0.

 

Otherwise valid=1.

 

data have;
  input ID CUMBE1 CUMBE2 CUMBE3 CUMBE4 CUMBE5 CUMBE6 expected;
datalines;
1 22 22 10 22 11 22 1  
2  0 11 22 22 22 22 1
3 10 11 22 22 11 11 0
run;
data want (drop=_:);
  set have;
  array cum {*} cumbe1-cumbe6;

  do _i=2 to dim(cum)-1 while (valid^=0);
    if cum{_i-1}^=22 and cum{_i}=22 and cum{_i+1}=22 then valid=0.5;
    if valid=0.5 and cum{_i+1}^=22 then valid=0;
  end;
  if valid^=0 then valid=1;
run;

 

The "trick" here is to loop over interior array elements ("do _i=2 to dim(cum-1);").  Then just use index value _i-1 and _i+1 to look to the left and right.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
uyuyuyu
Calcite | Level 5

Haha thank you very very much, it worked perfectly! And it is a simple way to do it Smiley Very Happy

I will remember this logic of looking right and left, I never thought of it before, thanks for the advice Smiley Happy

 

Have a very nice day,

All the best,

Aline.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 997 views
  • 3 likes
  • 4 in conversation