DATA Step, Macro, Functions and more

Add constant value to all numeric variables

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 138
Accepted Solution

Add constant value to all numeric variables

[ Edited ]

My survey data variables take 0, 1, 2,3 ... values and I have to change them to 1, 2, 3, 4 values (see attached first 10 obs data). It will help eliminate zero values to facilitate further analyses.

 

Error log from below code is:

ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant,
              a datetime constant, a missing value, iterator, (.
ERROR 76-322: Syntax error, statement will be ignored.

 

data want; set have;
        array inc {*} _numeric_;
            do i = 1 to dim(inc);
                if inc{i} in (_all_) then inc{i} = _all_+1;
            end;
run;

 I apprceiate any hints.

Thanks.


Accepted Solutions
Solution
‎05-30-2017 02:21 PM
Super User
Posts: 19,770

Re: Add constant value to all numeric variables

What are you trying to do with this line? What do you expect _all_ to refer to? It is all listed variables so that's a really weird line. 

 

 if inc{i} in (_all_) then inc{i} = _all_+1;

Wouldn't it just be:

 

inc(i) = inc(i)+1;

View solution in original post


All Replies
Solution
‎05-30-2017 02:21 PM
Super User
Posts: 19,770

Re: Add constant value to all numeric variables

What are you trying to do with this line? What do you expect _all_ to refer to? It is all listed variables so that's a really weird line. 

 

 if inc{i} in (_all_) then inc{i} = _all_+1;

Wouldn't it just be:

 

inc(i) = inc(i)+1;
Trusted Advisor
Posts: 1,911

Re: Add constant value to all numeric variables

My survey data variables take 0, 1, 2,3 ... values and I have to change them to 1, 2, 3, 4 values (see attached first 10 obs data). It will help eliminate zero values to facilitate further analyses.

 

It is hard to imagine that adding a constant will facilitate further analysis, what problems could zeros cause? You haven't changed the distribution of the data other than by shifting it one unit to the right.

Frequent Contributor
Posts: 138

Re: Add constant value to all numeric variables

Posted in reply to PaigeMiller

@PaigeMiller

I'm dealing with survey data coded inconsistently. Shifting values by one unit to the right helped 7 out of 8 items included zero in the response choices. Now all zeros in the data is 1 so that those are not chopped off when following program is run. However, now my problem is that, one of items doesn't take 0 but starts from 1. After shifting, its starting value becomes 2.  I was wondering if it makes sense to shift 7 items to the right except that one item with no zero value anyway? Do you have any suggestions? thoughts?

 

 

ARRAY PFI(10) PF01-PF10;

DO I = 1 TO 10;
   IF PFI(I) < 1 OR PFI(I) > 3 THEN PFI(I) = .;
END;

PFNUM = N(OF PF01-PF10);

PFMEAN = MEAN(OF PF01-PF10);

DO I = 1 TO 10;
  IF PFI(I)= . THEN PFI(I) = PFMEAN;
END;

IF PFNUM GE 5 THEN RAWPF = SUM(OF PF01-PF10);
PF = ((RAWPF - 10)/(30-10)) * 100;

 

Trusted Advisor
Posts: 1,911

Re: Add constant value to all numeric variables

Honestly, I still don't see how zero is a problem or why you are doing this. It could be (but you haven't stated this in any way) that zero indicates a response of "not applicable" or "no answer", in which case you would want to eliminate the zeros and turn them into missings. In that case, and maybe in other cases I haven't thought about, adding one to the data still isn't the solution, this step

 

DO I = 1 TO 10;
   IF PFI(I) < 1 OR PFI(I) > 3 THEN PFI(I) = .;
END;

is still unnecessary (although it works). Instead you could do things in a much more simple way. Don't add the ones to all of the variables; and then

 

 

if pfi(I)=0 then pfi(I)=.;

I don't have a problem with shifting 7 variables to the right and not shifting the other one; as stated I do have a problem with the entire concept of shifting any variables to the right, I just don't see the need for it. Your computation of PF at the end needs to be adjusted, but that's all.

 

 

Also,

PFMEAN = MEAN(OF PF01-PF10);

DO I = 1 TO 10;
  IF PFI(I)= . THEN PFI(I) = PFMEAN;
END;

typically assigning the mean to the missing values is a form of "imputation", but usually you assign the mean of the column, and not the mean of the row to replace the missing.

 

Frequent Contributor
Posts: 138

Re: Add constant value to all numeric variables

Posted in reply to PaigeMiller

@PaigeMiller

Very interesting.

Zeros indicate "no, my physical activity is not limited" so is not missing. Assigning the column mean (mean(PFSmiley Happy) would reflect in all study subjects' average versus imputing one's own responses' MEAN(OF PF01-PF10)?  Otherwise, average of same item(question) mean (PF01) instead taking average of different questions mean(PF01-PF10) makes sense to me. 

 

Trusted Advisor
Posts: 1,911

Re: Add constant value to all numeric variables

Yes, that's what the column mean would do, it's a standard type of way to replace a missing ... I have never seen replacing the missing with the row mean.

Frequent Contributor
Posts: 138

Re: Add constant value to all numeric variables

[ Edited ]
Posted in reply to PaigeMiller

@PaigeMiller

 

Below is what I would do. Any objections? Here I am scoring data to 0-100 scale. So then the next step is to weight them.

 

DATA WANT; set HAVE;
ARRAY PFI(10) PF01-PF10;
DO I = 1 TO 10;
PFMEAN = MEAN(OF PFI(I));
IF PFI(I)= . THEN PFI(I) = PFMEAN;
END;
PFNUM = N(OF PF01-PF10);
IF PFNUM GE 5 THEN RAWPF = SUM(OF PF01-PF10);
PF = ((RAWPF - 10)/(30-10)) * 100;
LABEL  PF = 'PF SCORING (0-100)'
       RAWPF = 'PF ITEM SUMMARY';
RUN;

 

Super User
Posts: 19,770

Re: Add constant value to all numeric variables


SUNY_Maggie wrote:

@PaigeMiller

Very interesting.

Zeros indicate "no, my physical activity is not limited" so is not missing. Assigning the column mean (mean(PFSmiley Happy) would reflect in all study subjects' average versus imputing one's own responses' MEAN(OF PF01-PF10)?  Otherwise, average of same item(question) mean (PF01) instead taking average of different questions mean(PF01-PF10) makes sense to me. 

 


This is incorrect and shows a misunderstanding of how mean() function works. You should test this thoroughly. 

 

I meant 'hostorical' as in previous code used in history, ie code used to analyze results last year. A similar rationale can be applied to using a macro. It's easier to modify the data to fit a macro than to rewrite the macro. 

Super User
Posts: 19,770

Re: Add constant value to all numeric variables

Depends on why you have 0 in the first place and what you're trying to do overall. Why is that one question missing zero? Context is more important here, this isn't a technical problem, it's a logic issue.

 

I work with survey data regularly and don't see how this would help anything further on, except possibly align your values to match historical coding so you can use historical code. 

 

The question you haven't answered - what's wrong with zeroes in the first place.

 

 

Frequent Contributor
Posts: 138

Re: Add constant value to all numeric variables

Hi Reeza, missing imputation and scoring program was provided to me by the author of the methodology. And i was trying to fit my data to the program provided. Now I'm willing to modify the program to the data, not other way around.
Super User
Posts: 19,770

Re: Add constant value to all numeric variables


SUNY_Maggie wrote:
Hi Reeza, missing imputation and scoring program was provided to me by the author of the methodology. And i was trying to fit my data to the program provided. Now I'm willing to modify the program to the data, not other way around.

Modifying your data to fit a standard program is a great reason to shift your data. You just need to explain things sometimes because otherwise it doesn't make sense to us, in terms of best practice. You can also just ignore us Smiley Wink

Contributor
Posts: 25

Re: Add constant value to all numeric variables

"What's wrong with Zeros in the first place?"

 

Well it's not a problem in SAS but in many programming languages, numeric values are by default, Zero.  So you can't tell the difference between missing and zero.

 

Got 10 questions and 9 need shifting?

 

q1=q1+1;

q2=q2+1;

q3=q3+1;

q4=q4+1;

*(skip the one you don't want to shift);

q6=q6+1;

q7=q7+1;

q8=q8+1;

q9=q9+1;

q10=q10+1;

 

keeping it simple is the best way to go.  Try searching for a bug involving a particular variable that doesn't appear in your code because you used some programming short cut unique to some language.

 

 

Frequent Contributor
Posts: 138

Re: Add constant value to all numeric variables

@Reeza

Reeza Thanks a lot.

Trusted Advosor's suggestion made the standard program fit to my data better. Below code is (after suggestion applied) to score the questionnaire responses using 0-100 scale. I need someone to reassure with me on the program to be sure. Would you please look at it especially beyong /*Attention*/ mark? It makes sense to me, however, resulting data has many negative values (see attached csv).

 

 

DATA WANT; set HAVE;
ARRAY PFI(10) PF01-PF10;
DO I = 1 TO 10;
PFMEAN = MEAN(OF PFI(I));
IF PFI(I)= . THEN PFI(I) = PFMEAN;
END;
/*Attention*/ PFNUM = N(OF PF01-PF10); IF PFNUM GE 5 THEN RAWPF = SUM(OF PF01-PF10); PF = ((RAWPF - 10)/(30-10)) * 100; LABEL PF = 'PF SCORING (0-100)' RAWPF = 'PF ITEM SUMMARY'; RUN;

 

Super User
Posts: 19,770

Re: Add constant value to all numeric variables

DO I = 1 TO 10;
PFMEAN = MEAN(OF PFI(I));
IF PFI(I)= . THEN PFI(I) = PFMEAN;
END;

 

That portion doesn't make sense. Mean(of pf(I)) is just going to be that number, the mean of a single value is the value.

Then the reassignment isn't going to work. Also, replacing missing with mean is a naive way to impute missing values and is probably one of the worst. Don't do it. Look at other imputation methods.

 

You may be looking to do something like the following but to be honest, working off code without knowing the actual intention is guesswork. It's better to evaluate the logic process and see if the code matches that.

 

 

 

 

pfmean = mean (of pf(*));

do i=1 to 10; *You shouldn't do this but if you do this is the way to do it;
if pf(i)=. then pf(i)=pfmean;
end;

 

This is also concerning because it calculates the sum of a variable number of values and then treats it the same, wether there's 6 or 10 values. Without understanding context this seems dangerous. 

IF PFNUM GE 5 THEN RAWPF = SUM(OF PF01-PF10);

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 224 views
  • 4 likes
  • 5 in conversation