Help with recoding various variables in a single data step

Occasional Contributor lis
Posts: 7


I would really appreciate it if someone could give me some feedback on my code.

The variable PAQ605r (Do you engage in vigorous physical activity?) is coded 1 = YES, 2 = NO, . = missing.  From this variable I need to recode the "NO" responses as zero and create a variable WORKVIG.  Then I need to compute

WORKVIG = ((PAD615r *2) * (PAQ610r)), using PAD615r (minutes of vigorous physical activity) and PAQ610r (number of days of vigorous physical activity).  A similar procedure was used to create the variable for vigorous recreational activity, RECVIG.  This time the first variable was PAQ650r (Did you engage in vigorous recreational activity?  1 = yes, 2 = no, . = missing).

The problem is that I am not sure I am assigning zero correctly to those with no activity. After running the whole step I expected to see some people in the NO category (that is, people who responded NO), but I don't.  I am doing something wrong!! PLEASE HELP.

Thank you,

Lis   (I'm new to SAS, so please keep it simple).

DATA PAQ1;

    SET PAQ;

    **CREATE 2 VARIABLES (WORKVIG AND RECVIG) FOR VIGOROUS PHYSICAL ACTIVITY;

    IF PAQ605r =. then WORKVIG = .;

    IF PAQ605r =2 then WORKVIG = 0;

    ELSE WORKVIG=PAQ605r;

    IF PAD615r =. THEN WORKVIG= .;

    IF PAQ610r =. THEN WORKVIG= .;

    **TIME SPENT IN VIGOROUS ACTIVITY WAS WEIGHTED BY A FACTOR OF 2 IN

    ORDER TO CONVERT TO MODERATE PHYSICAL ACTIVITY EQUIVALENTS;

    WORKVIG = ((PAD615r *2) * (PAQ610r));

*RECREATIONAL VIGOROUS VARIABLE;

    IF PAQ650r = . THEN RECVIG =.;

    IF PAQ650r = 2 THEN RECVIG =0;

    ELSE RECVIG =PAQ650r;

    IF PAD660r= . THEN RECVIG=.;

    IF PAQ655r=. THEN RECVIG=.;

    RECVIG = ((PAD660r *2) * (PAQ655r));

RUN;

Super Contributor
Posts: 644

Re: Help with recoding various variables in a single data step

This code is generally OK and has the advantage that it is readily understood by anyone (no tricky stuff here).

I don't think it quite achieves what you want because you have several different definitions of WORKVIG, and the last will overwrite the others.  

My guess is that you want to combine values for WORKVIG,  something like this (notional code):

    

     if PAQ605r = 2 then ANYWORKVIG = 0 ;

          else ANYWORKVIG = PAQ605r ; /* includes the case where PAQ605r is missing */

     WORKVIG = PAD615r * 2 * PAQ610r ; /* includes the case where PAD615r or PAQ610r is missing */

     WORKVIG = ANYWORKVIG * WORKVIG ; /* is this what you intend? */

The same applies to your calculation of RECVIG
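
To make the overwriting concrete, here is a minimal sketch run on three made-up respondents (YES, NO, and missing). The dataset name OVERWRITE_DEMO and the data values are assumptions for illustration only, not taken from the original PAQ data.

     data overwrite_demo;
          input PAQ605r PAQ610r PAD615r;
          /* First definition: recode NO (2) to 0, otherwise copy PAQ605r */
          if PAQ605r = 2 then WORKVIG = 0;
          else WORKVIG = PAQ605r;
          /* Second definition: replaces whatever was stored above */
          WORKVIG = PAD615r * 2 * PAQ610r;
          datalines;
     1 3 60
     2 . .
     . . .
     ;
     run;

     proc print data=overwrite_demo;
     run;

Only the YES respondent ends up with a value (360). The NO respondent has missing days and minutes, so the final multiplication returns missing and the 0 assigned earlier is lost.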

Richard

Super Contributor
Posts: 644

Re: Help with recoding various variables in a single data step

If you want code that is more concise, you can recode the value PAQ605r thus:

     If PAQ605r > 1 then PAQ605r = 0 ;

This code leaves PAQ605r unchanged if its value is 1 or less (SAS treats a missing value as smaller than any number, so missing stays missing) and sets any value greater than 1 (here, 2) to 0.

You can then use PAQ605r directly without having to create ANYWORKVIG.
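
A quick way to confirm how that comparison treats missing values is to try it on the three possible codes. This is a small sketch; the variable RECODED is a hypothetical name used so the original value can be printed alongside the result.

     data _null_;
          do PAQ605r = ., 1, 2;
               recoded = PAQ605r;
               if recoded > 1 then recoded = 0;   /* false for 1 and for missing */
               put PAQ605r= recoded=;             /* writes both values to the log */
          end;
     run;

The log should show missing staying missing, 1 staying 1, and 2 becoming 0, because SAS orders numeric missing values below every number.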

Richard

Occasional Contributor lis
Occasional Contributor
Posts: 7

Re: Help with recoding various variables in a single data step

Thank you Richard. Your explanation is helpful. I ran your code and I get the same numbers.  The problem is that, starting from the variable PAQ605r (Do you engage in vigorous physical activity?  1 = YES, 2 = NO, . = missing), I need to keep the NO's in the subsequent steps and in the new variable WORKVIG.  I am doing something wrong because I lose those NO's, and there are many people who responded NO to that question.

Any other thoughts?

Thanks so much,

Lisbeth

Super Contributor
Posts: 644

Re: Help with recoding various variables in a single data step

Lisbeth

In a case like this it really helps to show some sample data and the coded results you require.  From what I have seen so far it seems to me you want the WORKVIG result to hold these possible values

  • 0 (zero) if the answer is NO
  • the calculated weighted value of vigorous work if the answer is YES
  • a missing value if some key input value is missing

If I read this right you are asking poor old WORKVIG to do too much work.  You want it to preserve both the non dimensional YES/NO value and the quantity of vigorous work if the answer is YES.  Trust me, while it is possible in some cases to combine results into one column it is almost always better in the long run not to.

Others have raised the question of what you want to do if the amount of work can be calculated but an inconsistent reply is coded for the YES/NO.  With good questionnaire design and rational respondents it shouldn't happen, but ...

So I would start off by calculating the weighted work:

     WORKVIG = 2 * PAD615r * PAQ610r ; /* will be missing if either input amount is missing */

then adjust PAQ605r to the required values, as I suggested

     If PAQ605r > 1 then PAQ605r = 0 ;

Keep this YES/NO/UMM value as a separate column.

You might like to update PAQ605r to YES (1) if the weighted work is non zero:

     If WORKVIG > 0 then PAQ605r = 1 ;


If you must combine the values, despite my dire warning

     If PAQ605r = 0 then WORKVIG = 0 ;  /* will replace missing values in the calculation */
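
Put together, the steps above might look like the sketch below. The dataset names PAQ_DEMO and PAQ_DEMO1 and the four sample rows are made up for illustration; the last row (YES with no days recorded) is included to show a case that stays missing.

     data paq_demo;
          input PAQ605r PAQ610r PAD615r;
          datalines;
     1 3 60
     2 . .
     . . .
     1 . 30
     ;
     run;

     data paq_demo1;
          set paq_demo;
          WORKVIG = 2 * PAD615r * PAQ610r;      /* missing if either input is missing */
          if PAQ605r > 1 then PAQ605r = 0;      /* NO (2) becomes 0 */
          if WORKVIG > 0 then PAQ605r = 1;      /* optional: recorded work implies YES */
          if PAQ605r = 0 then WORKVIG = 0;      /* only if you must combine the values */
     run;

     proc freq data=paq_demo1;
          tables PAQ605r WORKVIG / missing;     /* MISSING keeps the missing category in the counts */
     run;

With these rows WORKVIG comes out as 360, 0, missing, and missing, so the NO respondent is retained as a zero.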

Richard



PROC Star
Posts: 7,363

Re: Help with recoding various variables in a single data step

Lisbeth,

You have to explain something about your code, particularly regarding PAQ605r and PAQ650r.

Setting WORKVIG and RECVIG to 0, and keeping them at 0, when PAQ605r and PAQ650r are 2, respectively, is easy. But if they aren't 2, your code first assigns the original values and then immediately discards them.

There must be a reason why you initially set WORKVIG and RECVIG to the values of PAQ605r and PAQ650r but, if there is, your code never uses those values.  Please explain.

Occasional Contributor lis
Occasional Contributor
Posts: 7

Re: Help with recoding various variables in a single data step

Hi Arthur,


I need to create two new variables (workvig and recvig) that multiply the number of minutes of vigorous activity by the number of days per week of vigorous activity for both work and recreation (i.e. workvig = # of minutes of vigorous work * # days of vigorous work, etc).  To do so, I  need the following variables (PAQ605r, PAQ610r, PAD615r, PAQ650r, PAQ655r, PAD660r).  At the same time  vigorous physical activity needs to be adjusted by a factor of 2 to convert these minutes into moderate physical activity equivalents.


Therefore, the first question is whether or not the participant engaged in vigorous physical activity (PAQ605r).  The answer can be 1 = yes, 2 = no, or . = missing.  Those who answer NO should be coded as zero, and those who answer yes are the individuals who also answer the questions on minutes and number of days of vigorous physical activity (PAD615r and PAQ610r).

Throughout all the steps, I need the new variable WORKVIG to keep both those who engaged in vigorous physical activity and those who initially responded in PAQ605r that they did NOT engage in physical activity.

I am doing something wrong but I am not able to figure out what the problem is.  As I said, I am very new to SAS and I am sure I am missing something.

I would really appreciate your help.

-Lisbeth

Variable   Description
PAQ605r    Vigorous work activity
PAQ610r    Days vigorous work
PAD615r    Minutes vigorous-intensity work
PAQ620r    Moderate work activity
PAQ625r    Number of days moderate work
PAD630r    Minutes moderate-intensity work
PAQ650r    Vigorous recreational activities
PAQ655r    Days vigorous recreational activities
PAD660r    Minutes vigorous recreational activities

PROC Star
Posts: 7,363

Re: Help with recoding various variables in a single data step

Then, before anyone tries to show you how to accomplish what you want/need to accomplish, I have a couple of additional questions.

First, you have two sets of fields that conflict with each other, namely PAQ605r, PAQ610r and PAD615r and their counterparts for recreational activities.  If PAQ605r is equal to 2, you want the resulting field to equal 0.

However, you haven't indicated what you want the resulting field to be if the value for that field is missing.  But you have number of days and number of minutes of vigorous work.  If either of those fields contains non-zero values, wouldn't that reflect that they happened regardless of the value of PAQ605r?

Further, I would think that you want your calculation to reflect the number of minutes of vigorous work.  However, "days" would have to be defined.  E.g., if a day equals 8 hours, then it would equal 8*60, and if minutes are on top of that, you may be wanting to achieve days*8*60 plus minutes.

Super User
Posts: 17,837

Re: Help with recoding various variables in a single data step

    IF PAQ605r =. then WORKVIG = .;

    IF PAQ605r =2 then WORKVIG = 0;

    ELSE WORKVIG=PAQ605r;

Those lines are an issue. The first IF handles the missing case, and then that statement ends.

The second IF is completely isolated from the first and says that if PAQ605r=2 then WORKVIG=0, otherwise WORKVIG=PAQ605r. The ELSE branch also runs when PAQ605r is missing, because the IFs aren't linked. I think you need to add another ELSE before the second IF. You have the same issue in the second set of code as well.

    IF PAQ605r =. then WORKVIG = .;

    ELSE IF PAQ605r =2 then WORKVIG = 0;

    ELSE WORKVIG=PAQ605r;
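
One way to check a recode like this is to cross-tabulate the old and new variables. The sketch below uses a hypothetical dataset CHAIN_DEMO with one made-up row per response code.

     data chain_demo;
          input PAQ605r;
          if PAQ605r = . then WORKVIG = .;
          else if PAQ605r = 2 then WORKVIG = 0;
          else WORKVIG = PAQ605r;
          datalines;
     1
     2
     .
     ;
     run;

     proc freq data=chain_demo;
          /* LIST prints one row per combination; MISSING includes missing values */
          tables PAQ605r * WORKVIG / list missing;
     run;

Each response code should map to exactly one WORKVIG value. Note that any later unconditional assignment to WORKVIG in the same step would still replace these values.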

Occasional Contributor lis
Occasional Contributor
Posts: 7

Re: Help with recoding various variables in a single data step

Thank you for your comments.  I just realized that the issue is that the variables have skip patterns.

If you answer yes to PAQ605r then you have information on minutes and number of days doing vigorous physical activity (PAD615r and PAQ610r), but

if you answer NO then you are directed to question PAQ620r (do you engage in moderate physical activity?).

If you answer yes to PAQ620r then you have information on minutes and number of days doing moderate physical activity (PAD630r and PAQ625r).

If you answer NO then you go to PAQ635.

I do not know how to work around these skip patterns.  Could anyone give me an example?

Thank you all for the comments.

Lisbeth

PROC Star
Posts: 7,363

Re: Help with recoding various variables in a single data step

The question is where you are re-routed if you skip 605r or 620r.  Regardless, my suggestion would be to ignore 605r and 620r and, for the relevant days and minutes questions, calculate the number of minutes if one or the other isn't missing.  However, that would require (1) deciding how many minutes a day represents and (2) agreement with the approach from whoever the decision maker is.

If they agree with that approach, then if both days and minutes are either missing or equal to 0, then the variable will get a value of 0 assigned.  Otherwise, it will get a value equal to the total minutes (however defined).

The above would be easy, but I don't want to suggest code until you indicate that the above is what you want to do and you supply the formula that you would actually want to use to calculate minutes.

Super User
Posts: 17,837

Re: Help with recoding various variables in a single data step

Forget the actual SAS code for a minute.

Can you specify the logic using pseudo code?

http://users.csc.calpoly.edu/~jdalbey/SWE/pdl_std.html

Basically it sounds like you need a bunch of if then statements to work through the calculation for total activity.

if paq605r=2 then total_activity = your formula here;

else if paq620r=2 then total_activity = your formula here;

else if next condition ...;

else (none of those conditions match) ...;
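
As one possible reading of that logic for the work questions, here is a hedged sketch. It assumes PAQ620r is coded 1 = yes, 2 = no like PAQ605r, that moderate minutes are not weighted, and that a skipped branch should count as 0. WORKMOD, TOTAL_WORK, and SKIP_DEMO are hypothetical names, and the rows are made up.

     data skip_demo;
          input PAQ605r PAQ610r PAD615r PAQ620r PAQ625r PAD630r;
          /* Vigorous work: days and minutes exist only when the answer is YES */
          if PAQ605r = 1 then WORKVIG = 2 * PAD615r * PAQ610r;
          else if PAQ605r = 2 then WORKVIG = 0;      /* skipped branch counts as 0 */
          else WORKVIG = .;                          /* question not answered */
          /* Moderate work: same pattern, without the factor of 2 (assumption) */
          if PAQ620r = 1 then WORKMOD = PAD630r * PAQ625r;
          else if PAQ620r = 2 then WORKMOD = 0;
          else WORKMOD = .;
          /* Missing if either part is missing */
          TOTAL_WORK = WORKVIG + WORKMOD;
          datalines;
     1 3 60 2 . .
     2 . . 1 5 30
     2 . . 2 . .
     . . . . . .
     ;
     run;

     proc print data=skip_demo;
     run;

With these rows TOTAL_WORK comes out as 360, 150, 0, and missing. If you would rather have a missing part ignored in the total, sum(WORKVIG, WORKMOD) skips missing arguments.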
