BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

My survey data variables take 0, 1, 2,3 ... values and I have to change them to 1, 2, 3, 4 values (see attached first 10 obs data). It will help eliminate zero values to facilitate further analyses.

 

Error log from below code is:

ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant,
              a datetime constant, a missing value, iterator, (.
ERROR 76-322: Syntax error, statement will be ignored.

 

data want; set have;
        array inc {*} _numeric_;
            do i = 1 to dim(inc);
                if inc{i} in (_all_) then inc{i} = _all_+1;
            end;
run;

 I apprceiate any hints.

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

What are you trying to do with this line? What do you expect _all_ to refer to? It is all listed variables so that's a really weird line. 

 

 if inc{i} in (_all_) then inc{i} = _all_+1;

Wouldn't it just be:

 

inc(i) = inc(i)+1;

View solution in original post

17 REPLIES 17
Reeza
Super User

What are you trying to do with this line? What do you expect _all_ to refer to? It is all listed variables so that's a really weird line. 

 

 if inc{i} in (_all_) then inc{i} = _all_+1;

Wouldn't it just be:

 

inc(i) = inc(i)+1;
PaigeMiller
Diamond | Level 26

My survey data variables take 0, 1, 2,3 ... values and I have to change them to 1, 2, 3, 4 values (see attached first 10 obs data). It will help eliminate zero values to facilitate further analyses.

 

It is hard to imagine that adding a constant will facilitate further analysis, what problems could zeros cause? You haven't changed the distribution of the data other than by shifting it one unit to the right.

--
Paige Miller
Cruise
Ammonite | Level 13

@PaigeMiller

I'm dealing with survey data coded inconsistently. Shifting values by one unit to the right helped 7 out of 8 items included zero in the response choices. Now all zeros in the data is 1 so that those are not chopped off when following program is run. However, now my problem is that, one of items doesn't take 0 but starts from 1. After shifting, its starting value becomes 2.  I was wondering if it makes sense to shift 7 items to the right except that one item with no zero value anyway? Do you have any suggestions? thoughts?

 

 

ARRAY PFI(10) PF01-PF10;

DO I = 1 TO 10;
   IF PFI(I) < 1 OR PFI(I) > 3 THEN PFI(I) = .;
END;

PFNUM = N(OF PF01-PF10);

PFMEAN = MEAN(OF PF01-PF10);

DO I = 1 TO 10;
  IF PFI(I)= . THEN PFI(I) = PFMEAN;
END;

IF PFNUM GE 5 THEN RAWPF = SUM(OF PF01-PF10);
PF = ((RAWPF - 10)/(30-10)) * 100;

 

PaigeMiller
Diamond | Level 26

Honestly, I still don't see how zero is a problem or why you are doing this. It could be (but you haven't stated this in any way) that zero indicates a response of "not applicable" or "no answer", in which case you would want to eliminate the zeros and turn them into missings. In that case, and maybe in other cases I haven't thought about, adding one to the data still isn't the solution, this step

 

DO I = 1 TO 10;
   IF PFI(I) < 1 OR PFI(I) > 3 THEN PFI(I) = .;
END;

is still unnecessary (although it works). Instead you could do things in a much more simple way. Don't add the ones to all of the variables; and then

 

 

if pfi(I)=0 then pfi(I)=.;

I don't have a problem with shifting 7 variables to the right and not shifting the other one; as stated I do have a problem with the entire concept of shifting any variables to the right, I just don't see the need for it. Your computation of PF at the end needs to be adjusted, but that's all.

 

 

Also,

PFMEAN = MEAN(OF PF01-PF10);

DO I = 1 TO 10;
  IF PFI(I)= . THEN PFI(I) = PFMEAN;
END;

typically assigning the mean to the missing values is a form of "imputation", but usually you assign the mean of the column, and not the mean of the row to replace the missing.

 

--
Paige Miller
Cruise
Ammonite | Level 13

@PaigeMiller

Very interesting.

Zeros indicate "no, my physical activity is not limited" so is not missing. Assigning the column mean (mean(PF:)) would reflect in all study subjects' average versus imputing one's own responses' MEAN(OF PF01-PF10)?  Otherwise, average of same item(question) mean (PF01) instead taking average of different questions mean(PF01-PF10) makes sense to me. 

 

PaigeMiller
Diamond | Level 26

Yes, that's what the column mean would do, it's a standard type of way to replace a missing ... I have never seen replacing the missing with the row mean.

--
Paige Miller
Cruise
Ammonite | Level 13

@PaigeMiller

 

Below is what I would do. Any objections? Here I am scoring data to 0-100 scale. So then the next step is to weight them.

 

DATA WANT; set HAVE;
ARRAY PFI(10) PF01-PF10;
DO I = 1 TO 10;
PFMEAN = MEAN(OF PFI(I));
IF PFI(I)= . THEN PFI(I) = PFMEAN;
END;
PFNUM = N(OF PF01-PF10);
IF PFNUM GE 5 THEN RAWPF = SUM(OF PF01-PF10);
PF = ((RAWPF - 10)/(30-10)) * 100;
LABEL  PF = 'PF SCORING (0-100)'
       RAWPF = 'PF ITEM SUMMARY';
RUN;

 

Reeza
Super User

@SUNY_Maggie wrote:

@PaigeMiller

Very interesting.

Zeros indicate "no, my physical activity is not limited" so is not missing. Assigning the column mean (mean(PF:)) would reflect in all study subjects' average versus imputing one's own responses' MEAN(OF PF01-PF10)?  Otherwise, average of same item(question) mean (PF01) instead taking average of different questions mean(PF01-PF10) makes sense to me. 

 


This is incorrect and shows a misunderstanding of how mean() function works. You should test this thoroughly. 

 

I meant 'hostorical' as in previous code used in history, ie code used to analyze results last year. A similar rationale can be applied to using a macro. It's easier to modify the data to fit a macro than to rewrite the macro. 

Reeza
Super User

Depends on why you have 0 in the first place and what you're trying to do overall. Why is that one question missing zero? Context is more important here, this isn't a technical problem, it's a logic issue.

 

I work with survey data regularly and don't see how this would help anything further on, except possibly align your values to match historical coding so you can use historical code. 

 

The question you haven't answered - what's wrong with zeroes in the first place.

 

 

Cruise
Ammonite | Level 13
Hi Reeza, missing imputation and scoring program was provided to me by the author of the methodology. And i was trying to fit my data to the program provided. Now I'm willing to modify the program to the data, not other way around.
Reeza
Super User

@SUNY_Maggie wrote:
Hi Reeza, missing imputation and scoring program was provided to me by the author of the methodology. And i was trying to fit my data to the program provided. Now I'm willing to modify the program to the data, not other way around.

Modifying your data to fit a standard program is a great reason to shift your data. You just need to explain things sometimes because otherwise it doesn't make sense to us, in terms of best practice. You can also just ignore us 😉

rickpaulos
Obsidian | Level 7

"What's wrong with Zeros in the first place?"

 

Well it's not a problem in SAS but in many programming languages, numeric values are by default, Zero.  So you can't tell the difference between missing and zero.

 

Got 10 questions and 9 need shifting?

 

q1=q1+1;

q2=q2+1;

q3=q3+1;

q4=q4+1;

*(skip the one you don't want to shift);

q6=q6+1;

q7=q7+1;

q8=q8+1;

q9=q9+1;

q10=q10+1;

 

keeping it simple is the best way to go.  Try searching for a bug involving a particular variable that doesn't appear in your code because you used some programming short cut unique to some language.

 

 

Cruise
Ammonite | Level 13

@Reeza

Reeza Thanks a lot.

Trusted Advosor's suggestion made the standard program fit to my data better. Below code is (after suggestion applied) to score the questionnaire responses using 0-100 scale. I need someone to reassure with me on the program to be sure. Would you please look at it especially beyong /*Attention*/ mark? It makes sense to me, however, resulting data has many negative values (see attached csv).

 

 

DATA WANT; set HAVE;
ARRAY PFI(10) PF01-PF10;
DO I = 1 TO 10;
PFMEAN = MEAN(OF PFI(I));
IF PFI(I)= . THEN PFI(I) = PFMEAN;
END;
/*Attention*/ PFNUM = N(OF PF01-PF10); IF PFNUM GE 5 THEN RAWPF = SUM(OF PF01-PF10); PF = ((RAWPF - 10)/(30-10)) * 100; LABEL PF = 'PF SCORING (0-100)' RAWPF = 'PF ITEM SUMMARY'; RUN;

 

Reeza
Super User
DO I = 1 TO 10;
PFMEAN = MEAN(OF PFI(I));
IF PFI(I)= . THEN PFI(I) = PFMEAN;
END;

 

That portion doesn't make sense. Mean(of pf(I)) is just going to be that number, the mean of a single value is the value.

Then the reassignment isn't going to work. Also, replacing missing with mean is a naive way to impute missing values and is probably one of the worst. Don't do it. Look at other imputation methods.

 

You may be looking to do something like the following but to be honest, working off code without knowing the actual intention is guesswork. It's better to evaluate the logic process and see if the code matches that.

 

 

 

 

pfmean = mean (of pf(*));

do i=1 to 10; *You shouldn't do this but if you do this is the way to do it;
if pf(i)=. then pf(i)=pfmean;
end;

 

This is also concerning because it calculates the sum of a variable number of values and then treats it the same, wether there's 6 or 10 values. Without understanding context this seems dangerous. 

IF PFNUM GE 5 THEN RAWPF = SUM(OF PF01-PF10);

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 5860 views
  • 4 likes
  • 5 in conversation