BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
evandwong
Calcite | Level 5

Hello, I'm a new SAS user and I'm having a bit of difficulty with manipulating data using the data step.  I'm working with almost entirely numeric data, and I frequently want to define a new variable as a function of both other variables and other observations.

For example, I'm currently working with data that was collected over a number of years, up to 2010.  I have three relevant variables: YEAR and RATIO.  To create ADJRATIO, I want to follow this procedure:

For the ith observation AGJRATIO_i:

If YEAR_i=2010, then ADJRATIO_i=RATIO_i.  Else:

Find the value for RATIO that (1) was observed in 2010, (2) is less than RATIO_i, and (3) is maximal, given (1) and (2).  More mathematically:

Find k such that YEAR_k=2010, RATIO_i > RATIO_k, and RATIO_k >= RATIO_j, for all j such that YEAR_j=2010 as well.  Then ADJRATIO_i = RATIO_k

Coming from a background in MatLab and R, my strong instrinct is that this should  be very easy to code.  Instead, I'm finding it fiendishly difficult, and both asking friends more familiar with SAS and searching online have given me the impression that this is a fundamental difficulty in the language, rather than something I missed in the help index.  So far, the best solution I've found is to make several datasets which each keep only one variable, transpose them, concatenate horizontally, and use arrays to do the arithmetic I'm aiming for.  This will get *ridiculously* messy when I need to refer to more than one variable, as well as more than one observation, and I'm hoping there's a better solution.

Does anyone have any thoughts that might be helpful to someone who's used to datasets being treated as MxN arrays, and being able to easily refer to (Observation_m, Variable_n)?  Many thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

There are a number of ways to accomplish what you want.  Here is one:

/*

  First, use proc sql to create two macro variables that contain your POVRATIO

  andPERPOVRATIO variables, for 2010, in descending order of the POVRATIO

  values

*/

proc sql noprint;

  select POVRATIO,PERPOVRATIO

    into :povratio separated by "~",

         :perpovratio separated by "~"

        from input

            where inty =2010

          order by povratio descending

  ;

quit;

/*

  Then, in a datastep, set the x and y values for 2010, otherwise compare the

  POVRATIO values with the 2010 values until one is found that is less than

  or equal to the current POVRATIO.  When found, set the x and y values

   accordingly

*/

data want (drop=i);

  set input;

  if inty eq 2010 then do;

    x=povratio;

    y=perpovratio;

  end;

  else do;

    i=1;

    do while (scan("&povratio.",i,"~") ne "");

      if input(scan("&povratio.",i,"~"),best12.) le povratio then do;

        x=input(scan("&povratio.",i,"~"),best12.);

        y=input(scan("&perpovratio.",i,"~"),best12.);

        leave;

      end;

      i+1;

    end;

  end;

run;

View solution in original post

20 REPLIES 20
art297
Opal | Level 21

Can you provide a very simple example, in the form of two datasteps, one that shows an example of the data you have, and another that shows an example of the file you want to produce from that data?

That will make it easier for others to show you how to accomplish the task in SAS.

evandwong
Calcite | Level 5

art297 wrote:

Can you provide a very simple example, in the form of two datasteps, one that shows an example of the data you have, and another that shows an example of the file you want to produce from that data?

Of course; I'm sorry, I should've thought of that.  I've attached code to the original post, and here it is as text:

*--------------------------------------------------------------------;

data work.input;

input INTY POVRATIO PERPOVRATIO;

datalines;

1990    4    10

1991    5    20

1992    2    30

1993    5    40

1994    6    50

1994    8    55

1995    3    60

1996    8    70

1997    7    80

1998    5    90

1999    8    95

2010    4.9    110

2010    1.6    120

2010    2.1    130

2010    7.9    140

2010    4.6    150

run;

data work.output;

input INTY POVRATIO PERPOVRATIO x y;

datalines;

1990    4    10    2.1    130

1991    5    20    4.9    110

1992    2    30    1.6    120

1993    5    40    4.9    110

1994    6    50    4.9    110

1994    8    55    7.9    140

1995    3    60    2.1    130

1996    8    70    7.9    140

1997    7    80    4.9    110

1998    5    90    4.9    110

1999    8    95    7.9    140

2010    4.9    110    4.9    110

2010    1.6    120    1.6    120

2010    2.1    130    2.1    130

2010    7.9    140    7.9    140

2010    4.6    150    4.6    150

run;

*--------------------------------------------------------------------;

The idea is that for 2010, x is just POVRATIO.  For previous years, x takes the largest 2010 observation for POVRATIO less than that observation's POVRATIO.  So for the two observations in 1994 with POVRATIO 6 and 8, we assign x to be 4.9 and 7.9 respectively, because 4.9 is the largest 2010 value of POVRATIO less than 6, and 7.9 is the largest 2010 value less than 8.

As a follow up, I'm also trying to use the 2010 assignment to define another new variable, y.  I'd like to assign to y the PERPOVRATIO value of the same observation as was found for x.  That is, for any pre-2010 observation that had 4.9 assigned to x, I'd assign 110 to y.  If 1.6 was assigned to x, then 120 would be assigned to y, and so on.

I've worked a bit previously with lag() and retain, but it doesn't seem that either would be useful in this situation (assuming it's not just that my SASfu weak, which is a distinct possibility).

art297
Opal | Level 21

There are a number of ways to accomplish what you want.  Here is one:

/*

  First, use proc sql to create two macro variables that contain your POVRATIO

  andPERPOVRATIO variables, for 2010, in descending order of the POVRATIO

  values

*/

proc sql noprint;

  select POVRATIO,PERPOVRATIO

    into :povratio separated by "~",

         :perpovratio separated by "~"

        from input

            where inty =2010

          order by povratio descending

  ;

quit;

/*

  Then, in a datastep, set the x and y values for 2010, otherwise compare the

  POVRATIO values with the 2010 values until one is found that is less than

  or equal to the current POVRATIO.  When found, set the x and y values

   accordingly

*/

data want (drop=i);

  set input;

  if inty eq 2010 then do;

    x=povratio;

    y=perpovratio;

  end;

  else do;

    i=1;

    do while (scan("&povratio.",i,"~") ne "");

      if input(scan("&povratio.",i,"~"),best12.) le povratio then do;

        x=input(scan("&povratio.",i,"~"),best12.);

        y=input(scan("&perpovratio.",i,"~"),best12.);

        leave;

      end;

      i+1;

    end;

  end;

run;

evandwong
Calcite | Level 5

Hmm, fascinating.  I'll need to do some reading on proc SQL and macro (variables), but I see the structure of what you're doing.

Thanks a lot, this is exactly what I wanted to do but couldn't grok.

Linlin
Lapis Lazuli | Level 10

Hi Art,

I thought when work with macro variables we have to use "%scan", can you explain why "scan" also works?

Thank you!

art297
Opal | Level 21

You are confusing the SAS macro language and macro functions with macro variables.  Using %scan, in this case in open code, will treat i as a character and inform you that it is expecting a numeric variable.

The macro variable, in this case, is simply text substitution.

However, I should point out that I am a psychologist, NOT a programmer.  Possibly, one of our programmers can provide you with a better definition of the differences.

Linlin
Lapis Lazuli | Level 10

Thank you Art for your clarification. I understand now.

Ksharp
Super User
data work.input;
input INTY POVRATIO PERPOVRATIO;
datalines;
1990    4    10
1991    5    20
1992    2    30
1993    5    40
1994    6    50
1994    8    55
1995    3    60
1996    8    70
1997    7    80
1998    5    90
1999    8    95
2010    4.9    110
2010    1.6    120
2010    2.1    130
2010    7.9    140
2010    4.6    150
;
run;
data temp ;
 set input(rename=(INTY=_INTY POVRATIO=_POVRATIO PERPOVRATIO=_PERPOVRATIO) where=(_INTY=2010));
run;
data want(drop=_: dif);
 set input;
 if inty ne 2010 then do;
  dif=9999;
  do i=1 to _nobs;
  set temp nobs=_nobs point=i;
  if dif gt (POVRATIO-_POVRATIO) and POVRATIO gt _POVRATIO then do;
                              dif=POVRATIO-_POVRATIO;
                              x=_POVRATIO;y=_PERPOVRATIO;
                            end;
  end;
  end;
  else do; 
        x=POVRATIO;y=PERPOVRATIO; 
       end;
run;


Ksharp

Linlin
Lapis Lazuli | Level 10

I combined Art and ksharp's methods:

data work.input;

input INTY POVRATIO PERPOVRATIO;

datalines;

1990    4    10

1991    5    20

1992    2    30

1993    5    40

1994    6    50

1994    8    55

1995    3    60

1996    8    70

1997    7    80

1998    5    90

1999    8    95

2010    4.9    110

2010    1.6    120

2010    2.1    130

2010    7.9    140

2010    4.6    150

;

run;

proc sql;

create table temp as

       select INTY as _INTY,

              POVRATIO as _POVRATIO,

              PERPOVRATIO as _PERPOVRATIO

             from input

where _INTY=2010

order by 2 desc;

quit;

data want(drop=_: );

set input;

if inty ne 2010 then do;

do i=1 to _obs;

set temp nobs=_obs point=i;

if povratio>_povratio then do;

    x=_povratio;

    y=_perpovratio;

                    leave;

     end;

          end;

   end;

else do;

   x=povratio;

   y=perpovratio;

end;

run;

Reeza
Super User

Things you can look up that might help, lag function (be careful with this), retain statement and IML (if you have it licensed).

Otherwise, sometimes for things that refer to previous steps or observations (or futures ones) you may need to have multiple datasteps or proc sql.

As Art stated, you need to provide the two sample datasets (not datasteps I think), what you have and what you want.

art297
Opal | Level 21

Fareeza,  FWIW: The reason I always suggest that people provide the forum with "an example in the form of a datastep" is so that those who respond don't have to take the time to create a sample dataset.

Reeza
Super User

Ah...I think I know what you mean now.  Thanks for the clarification.

Linlin
Lapis Lazuli | Level 10

Hi Art,

What does "FWIW" mean? English is not my first language.

Thank you!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 1111 views
  • 9 likes
  • 7 in conversation