BookmarkSubscribeRSS Feed
EC27556
Quartz | Level 8

Hi,

 

I have a range of variable X0-X100. For simplicity, lets say x is a variable that represents income and the adjacent number corresponds to the income in that week. I.e. X0 is the income in week 0 and X1 is the income in week 1 etc.

 

I would like to see when the income changes to a non missing number and then return the week number associated with this variable.

 

For example, if someone had a change in income in week4 then X1-X3 would be the same value but X4 would be different and a positive integer. I would then like a variable, say Y, that would return the value of 4. Similarly if X1-X2 were the same value but X3=. I would still like the variable Y to return the value 4.

 

 

The only solution I can think of is this:

I could get rid of the missing values by creating a new range of variables X2_0 to X2_100 which were the same as X1-X100 other than the missing values were replaced by the value for X0 and then I could say:

 

if X2_0 ne X2_1 then Y=1;
else if X2_0=X2_1 ne X2_2 then Y=2;
else if X2_0=X2_1=X2_2 ne X2_3 then Y=3;
….. etc

but this would be very arduous for 100 variables!

 

does anyone have any suggestions for some much more simple code?

 

Thanks 🙂

4 REPLIES 4
PhilC
Rhodochrosite | Level 12

You need to use arrays!  SAS Help Center: ARRAY Statement

 

something like this:

data test ;
   array a{0:100} A0-A100;

   A0=1;
   A1=1;
   A2=1;

   do Y=1 by 1to 100 until (a{Y-1} ne a{Y});
   end;

   put "NOTE: " y=;
run;
ballardw
Super User

You really should supply some example data of what you have and what the result for the example should look like.

Some additional points consider:

With "income changes to a non missing number" what if the first (x0) is not missing? It isn't a change so potentially you have a case where NONE of the values are non-missing and there is NO change to report. Does this mean look for a "missing" value in another variable and then start looking for the possible change? What would the results look like for these cases?

Also, what if the values are missing for all of the variables? What would the result be?

Is your "missing" an actual SAS missing value or are you considering a reported value of zero (0) as "missing" for this purpose?

 

When you need to do the same thing with many variables on the same observation then ARRAY processing is usually the basic tool.

One possible approach:

data  example;
  input x0 - x10;
datalines;
. . . 1 2 3 4 5 6 7 8
. 1 2 3 4 5 6 7 8 9 10
;

data want;
   set example;
   array w (*) x0-x10;
   do i= 1 to ( dim(w) - 1);
      if w[i] ne w[i+1] then do;
         y=i;
         leave;
      end;
   end;
run;

Caveat: there is nothing to address the questions I asked above.

The ARRAY statement creates a shortcut to reference a bunch of variables, in this case I call it W and the list of variables afterwards are the value that can be addressed. You use w[<index number value>] to get the specific value. With the way this array is define the first value, index 1, points to X0, 2 points to x1 and so on.

The DO loop using a special function DIM that returns the number of elements, or variables, that the array has.

Because we looking at one value and the "next" one , we have the loop stop with comparing the next to last [i] value so we don't attempt to look at things that aren't there. (SAS will throw errors when that happens).

The test for difference, similar to your proposal compares x0 with x1, x1 with x2, x2 with x3 ....

When they are not equal for the first time the "then do" instructions assign the value to Y, note that offset because of starting with 0 means that Y actually gets the suffix of the second compared variable.

The LEAVE instruction means "leave this do loop now". Which stops the comparisons so we get the first time the values change.

 

I use the square brackets so it is easy for me to know when an array index is meant This is one of the very few places SAS allows the [ ]. Since you could potentially have multiple functions and calculations using the array variable this can be very handy in complex calculations. The index value can be, and often is, the result of some calculation.

w[ max(a,b,c,d) - min(p,d,q)] is legal syntax.

The INDEX VALUE must be an integer within the range defined in the array statement. There are other ways to define an array but in this case it is the number of variables on the array after the (*).

 

Reeza
Super User

Here's a tutorial on using Arrays in SAS
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/

 

If you need more help please consider posting some example data. Your actual data may not be able to be shared and is likely too large so making some fake data is totally fine, say X1-X10 with about 20 observations should be enough and show the expected output as well. If you do need help with the code please make sure that any of the situations below that are applicable are also included in your sample data. 

 

Some things that you need to consider, and would be helpful to know:

 

  • Can you assume each line would be independent from other rows?
  • What happens in the case of all values are missing?
  • What happens in the case of all values are the same?
  • If there are multiple changes (stable, increase, stable, decrease) how would you like that scenario handled
  • Is an increase shown the same ways as a decrease?
  • Do you need just the time point or values?

@EC27556 wrote:

Hi,

 

I have a range of variable X0-X100. For simplicity, lets say x is a variable that represents income and the adjacent number corresponds to the income in that week. I.e. X0 is the income in week 0 and X1 is the income in week 1 etc.

 

I would like to see when the income changes to a non missing number and then return the week number associated with this variable.

 

For example, if someone had a change in income in week4 then X1-X3 would be the same value but X4 would be different and a positive integer. I would then like a variable, say Y, that would return the value of 4. Similarly if X1-X2 were the same value but X3=. I would still like the variable Y to return the value 4.

 

 

The only solution I can think of is this:

I could get rid of the missing values by creating a new range of variables X2_0 to X2_100 which were the same as X1-X100 other than the missing values were replaced by the value for X0 and then I could say:

 

if X2_0 ne X2_1 then Y=1;
else if X2_0=X2_1 ne X2_2 then Y=2;
else if X2_0=X2_1=X2_2 ne X2_3 then Y=3;
….. etc

but this would be very arduous for 100 variables!

 

does anyone have any suggestions for some much more simple code?

 

Thanks 🙂


 

Tom
Super User Tom
Super User

It sounds like this is what you want, but without concrete examples to clarify your meaning it is hard to tell.

data want;
  set have;
  array x x1-x6 ;
  do y=2 to dim(x) while((x[y] = .) or (x[y] = x[1])); end;
  if y>dim(x) then y=0;
run;
Obs    id    x1    x2    x3    x4    x5    x6    y

 1      1    10    10    10     9     8    12    4
 2      2    10    10     .    10    12     .    5
 3      3     1     1     1     1     1     1    0

 

PS: You are free to name your variables starting with X0 instead of X1, but both SAS and I prefer to number things starting with 1 instead of zero.  But there are ways to define an array in SAS to use any range of integers for the indexes.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1076 views
  • 3 likes
  • 5 in conversation