DATA Step, Macro, Functions and more

Referring to other observations when creating a new variable.

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Referring to other observations when creating a new variable.

Hello, I'm a new SAS user and I'm having a bit of difficulty with manipulating data using the data step.  I'm working with almost entirely numeric data, and I frequently want to define a new variable as a function of both other variables and other observations.

For example, I'm currently working with data that was collected over a number of years, up to 2010.  I have three relevant variables: YEAR and RATIO.  To create ADJRATIO, I want to follow this procedure:

For the ith observation AGJRATIO_i:

If YEAR_i=2010, then ADJRATIO_i=RATIO_i.  Else:

Find the value for RATIO that (1) was observed in 2010, (2) is less than RATIO_i, and (3) is maximal, given (1) and (2).  More mathematically:

Find k such that YEAR_k=2010, RATIO_i > RATIO_k, and RATIO_k >= RATIO_j, for all j such that YEAR_j=2010 as well.  Then ADJRATIO_i = RATIO_k

Coming from a background in MatLab and R, my strong instrinct is that this should  be very easy to code.  Instead, I'm finding it fiendishly difficult, and both asking friends more familiar with SAS and searching online have given me the impression that this is a fundamental difficulty in the language, rather than something I missed in the help index.  So far, the best solution I've found is to make several datasets which each keep only one variable, transpose them, concatenate horizontally, and use arrays to do the arithmetic I'm aiming for.  This will get *ridiculously* messy when I need to refer to more than one variable, as well as more than one observation, and I'm hoping there's a better solution.

Does anyone have any thoughts that might be helpful to someone who's used to datasets being treated as MxN arrays, and being able to easily refer to (Observation_m, Variable_n)?  Many thanks in advance.

Attachment

Accepted Solutions
Solution
‎12-08-2011 11:03 AM
PROC Star
Posts: 7,366

Referring to other observations when creating a new variable.

There are a number of ways to accomplish what you want.  Here is one:

/*

  First, use proc sql to create two macro variables that contain your POVRATIO

  andPERPOVRATIO variables, for 2010, in descending order of the POVRATIO

  values

*/

proc sql noprint;

  select POVRATIO,PERPOVRATIO

    into Smiley Tongueovratio separated by "~",

         Smiley Tongueerpovratio separated by "~"

        from input

            where inty =2010

          order by povratio descending

  ;

quit;

/*

  Then, in a datastep, set the x and y values for 2010, otherwise compare the

  POVRATIO values with the 2010 values until one is found that is less than

  or equal to the current POVRATIO.  When found, set the x and y values

   accordingly

*/

data want (drop=i);

  set input;

  if inty eq 2010 then do;

    x=povratio;

    y=perpovratio;

  end;

  else do;

    i=1;

    do while (scan("&povratio.",i,"~") ne "");

      if input(scan("&povratio.",i,"~"),best12.) le povratio then do;

        x=input(scan("&povratio.",i,"~"),best12.);

        y=input(scan("&perpovratio.",i,"~"),best12.);

        leave;

      end;

      i+1;

    end;

  end;

run;

View solution in original post


All Replies
PROC Star
Posts: 7,366

Referring to other observations when creating a new variable.

Can you provide a very simple example, in the form of two datasteps, one that shows an example of the data you have, and another that shows an example of the file you want to produce from that data?

That will make it easier for others to show you how to accomplish the task in SAS.

New Contributor
Posts: 3

Referring to other observations when creating a new variable.

art297 wrote:

Can you provide a very simple example, in the form of two datasteps, one that shows an example of the data you have, and another that shows an example of the file you want to produce from that data?

Of course; I'm sorry, I should've thought of that.  I've attached code to the original post, and here it is as text:

*--------------------------------------------------------------------;

data work.input;

input INTY POVRATIO PERPOVRATIO;

datalines;

1990    4    10

1991    5    20

1992    2    30

1993    5    40

1994    6    50

1994    8    55

1995    3    60

1996    8    70

1997    7    80

1998    5    90

1999    8    95

2010    4.9    110

2010    1.6    120

2010    2.1    130

2010    7.9    140

2010    4.6    150

run;

data work.output;

input INTY POVRATIO PERPOVRATIO x y;

datalines;

1990    4    10    2.1    130

1991    5    20    4.9    110

1992    2    30    1.6    120

1993    5    40    4.9    110

1994    6    50    4.9    110

1994    8    55    7.9    140

1995    3    60    2.1    130

1996    8    70    7.9    140

1997    7    80    4.9    110

1998    5    90    4.9    110

1999    8    95    7.9    140

2010    4.9    110    4.9    110

2010    1.6    120    1.6    120

2010    2.1    130    2.1    130

2010    7.9    140    7.9    140

2010    4.6    150    4.6    150

run;

*--------------------------------------------------------------------;

The idea is that for 2010, x is just POVRATIO.  For previous years, x takes the largest 2010 observation for POVRATIO less than that observation's POVRATIO.  So for the two observations in 1994 with POVRATIO 6 and 8, we assign x to be 4.9 and 7.9 respectively, because 4.9 is the largest 2010 value of POVRATIO less than 6, and 7.9 is the largest 2010 value less than 8.

As a follow up, I'm also trying to use the 2010 assignment to define another new variable, y.  I'd like to assign to y the PERPOVRATIO value of the same observation as was found for x.  That is, for any pre-2010 observation that had 4.9 assigned to x, I'd assign 110 to y.  If 1.6 was assigned to x, then 120 would be assigned to y, and so on.

I've worked a bit previously with lag() and retain, but it doesn't seem that either would be useful in this situation (assuming it's not just that my SASfu weak, which is a distinct possibility).

Solution
‎12-08-2011 11:03 AM
PROC Star
Posts: 7,366

Referring to other observations when creating a new variable.

There are a number of ways to accomplish what you want.  Here is one:

/*

  First, use proc sql to create two macro variables that contain your POVRATIO

  andPERPOVRATIO variables, for 2010, in descending order of the POVRATIO

  values

*/

proc sql noprint;

  select POVRATIO,PERPOVRATIO

    into Smiley Tongueovratio separated by "~",

         Smiley Tongueerpovratio separated by "~"

        from input

            where inty =2010

          order by povratio descending

  ;

quit;

/*

  Then, in a datastep, set the x and y values for 2010, otherwise compare the

  POVRATIO values with the 2010 values until one is found that is less than

  or equal to the current POVRATIO.  When found, set the x and y values

   accordingly

*/

data want (drop=i);

  set input;

  if inty eq 2010 then do;

    x=povratio;

    y=perpovratio;

  end;

  else do;

    i=1;

    do while (scan("&povratio.",i,"~") ne "");

      if input(scan("&povratio.",i,"~"),best12.) le povratio then do;

        x=input(scan("&povratio.",i,"~"),best12.);

        y=input(scan("&perpovratio.",i,"~"),best12.);

        leave;

      end;

      i+1;

    end;

  end;

run;

New Contributor
Posts: 3

Referring to other observations when creating a new variable.

Hmm, fascinating.  I'll need to do some reading on proc SQL and macro (variables), but I see the structure of what you're doing.

Thanks a lot, this is exactly what I wanted to do but couldn't grok.

Super Contributor
Posts: 1,636

Referring to other observations when creating a new variable.

Hi Art,

I thought when work with macro variables we have to use "%scan", can you explain why "scan" also works?

Thank you!

PROC Star
Posts: 7,366

Referring to other observations when creating a new variable.

You are confusing the SAS macro language and macro functions with macro variables.  Using %scan, in this case in open code, will treat i as a character and inform you that it is expecting a numeric variable.

The macro variable, in this case, is simply text substitution.

However, I should point out that I am a psychologist, NOT a programmer.  Possibly, one of our programmers can provide you with a better definition of the differences.

Super Contributor
Posts: 1,636

Referring to other observations when creating a new variable.

Thank you Art for your clarification. I understand now.

Super User
Posts: 9,691

Referring to other observations when creating a new variable.

data work.input;
input INTY POVRATIO PERPOVRATIO;
datalines;
1990    4    10
1991    5    20
1992    2    30
1993    5    40
1994    6    50
1994    8    55
1995    3    60
1996    8    70
1997    7    80
1998    5    90
1999    8    95
2010    4.9    110
2010    1.6    120
2010    2.1    130
2010    7.9    140
2010    4.6    150
;
run;
data temp ;
 set input(rename=(INTY=_INTY POVRATIO=_POVRATIO PERPOVRATIO=_PERPOVRATIO) where=(_INTY=2010));
run;
data want(drop=_: dif);
 set input;
 if inty ne 2010 then do;
  dif=9999;
  do i=1 to _nobs;
  set temp nobs=_nobs point=i;
  if dif gt (POVRATIO-_POVRATIO) and POVRATIO gt _POVRATIO then do;
                              dif=POVRATIO-_POVRATIO;
                              x=_POVRATIO;y=_PERPOVRATIO;
                            end;
  end;
  end;
  else do; 
        x=POVRATIO;y=PERPOVRATIO; 
       end;
run;


Ksharp

Super Contributor
Posts: 1,636

Re: Referring to other observations when creating a new variable.

I combined Art and ksharp's methods:

data work.input;

input INTY POVRATIO PERPOVRATIO;

datalines;

1990    4    10

1991    5    20

1992    2    30

1993    5    40

1994    6    50

1994    8    55

1995    3    60

1996    8    70

1997    7    80

1998    5    90

1999    8    95

2010    4.9    110

2010    1.6    120

2010    2.1    130

2010    7.9    140

2010    4.6    150

;

run;

proc sql;

create table temp as

       select INTY as _INTY,

              POVRATIO as _POVRATIO,

              PERPOVRATIO as _PERPOVRATIO

             from input

where _INTY=2010

order by 2 desc;

quit;

data want(drop=_: );

set input;

if inty ne 2010 then do;

do i=1 to _obs;

set temp nobs=_obs point=i;

if povratio>_povratio then do;

    x=_povratio;

    y=_perpovratio;

                    leave;

     end;

          end;

   end;

else do;

   x=povratio;

   y=perpovratio;

end;

run;

Super User
Posts: 17,960

Referring to other observations when creating a new variable.

Things you can look up that might help, lag function (be careful with this), retain statement and IML (if you have it licensed).

Otherwise, sometimes for things that refer to previous steps or observations (or futures ones) you may need to have multiple datasteps or proc sql.

As Art stated, you need to provide the two sample datasets (not datasteps I think), what you have and what you want.

PROC Star
Posts: 7,366

Referring to other observations when creating a new variable.

Fareeza,  FWIW: The reason I always suggest that people provide the forum with "an example in the form of a datastep" is so that those who respond don't have to take the time to create a sample dataset.

Super User
Posts: 17,960

Referring to other observations when creating a new variable.

Ah...I think I know what you mean now.  Thanks for the clarification.

Super Contributor
Posts: 1,636

Referring to other observations when creating a new variable.

Hi Art,

What does "FWIW" mean? English is not my first language.

Thank you!

Trusted Advisor
Posts: 1,300

Referring to other observations when creating a new variable.

For What It's Worth

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 20 replies
  • 298 views
  • 9 likes
  • 7 in conversation