My survey data variables take 0, 1, 2,3 ... values and I have to change them to 1, 2, 3, 4 values (see attached first 10 obs data). It will help eliminate zero values to facilitate further analyses.
Error log from below code is:
ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant,
a datetime constant, a missing value, iterator, (.
ERROR 76-322: Syntax error, statement will be ignored.
data want; set have;
array inc {*} _numeric_;
do i = 1 to dim(inc);
if inc{i} in (_all_) then inc{i} = _all_+1;
end;
run;
I apprceiate any hints.
Thanks.
What are you trying to do with this line? What do you expect _all_ to refer to? It is all listed variables so that's a really weird line.
if inc{i} in (_all_) then inc{i} = _all_+1;
Wouldn't it just be:
inc(i) = inc(i)+1;
What are you trying to do with this line? What do you expect _all_ to refer to? It is all listed variables so that's a really weird line.
if inc{i} in (_all_) then inc{i} = _all_+1;
Wouldn't it just be:
inc(i) = inc(i)+1;
My survey data variables take 0, 1, 2,3 ... values and I have to change them to 1, 2, 3, 4 values (see attached first 10 obs data). It will help eliminate zero values to facilitate further analyses.
It is hard to imagine that adding a constant will facilitate further analysis, what problems could zeros cause? You haven't changed the distribution of the data other than by shifting it one unit to the right.
I'm dealing with survey data coded inconsistently. Shifting values by one unit to the right helped 7 out of 8 items included zero in the response choices. Now all zeros in the data is 1 so that those are not chopped off when following program is run. However, now my problem is that, one of items doesn't take 0 but starts from 1. After shifting, its starting value becomes 2. I was wondering if it makes sense to shift 7 items to the right except that one item with no zero value anyway? Do you have any suggestions? thoughts?
ARRAY PFI(10) PF01-PF10;
DO I = 1 TO 10;
IF PFI(I) < 1 OR PFI(I) > 3 THEN PFI(I) = .;
END;
PFNUM = N(OF PF01-PF10);
PFMEAN = MEAN(OF PF01-PF10);
DO I = 1 TO 10;
IF PFI(I)= . THEN PFI(I) = PFMEAN;
END;
IF PFNUM GE 5 THEN RAWPF = SUM(OF PF01-PF10);
PF = ((RAWPF - 10)/(30-10)) * 100;
Honestly, I still don't see how zero is a problem or why you are doing this. It could be (but you haven't stated this in any way) that zero indicates a response of "not applicable" or "no answer", in which case you would want to eliminate the zeros and turn them into missings. In that case, and maybe in other cases I haven't thought about, adding one to the data still isn't the solution, this step
DO I = 1 TO 10; IF PFI(I) < 1 OR PFI(I) > 3 THEN PFI(I) = .; END;
is still unnecessary (although it works). Instead you could do things in a much more simple way. Don't add the ones to all of the variables; and then
if pfi(I)=0 then pfi(I)=.;
I don't have a problem with shifting 7 variables to the right and not shifting the other one; as stated I do have a problem with the entire concept of shifting any variables to the right, I just don't see the need for it. Your computation of PF at the end needs to be adjusted, but that's all.
Also,
PFMEAN = MEAN(OF PF01-PF10); DO I = 1 TO 10; IF PFI(I)= . THEN PFI(I) = PFMEAN; END;
typically assigning the mean to the missing values is a form of "imputation", but usually you assign the mean of the column, and not the mean of the row to replace the missing.
Very interesting.
Zeros indicate "no, my physical activity is not limited" so is not missing. Assigning the column mean (mean(PF:)) would reflect in all study subjects' average versus imputing one's own responses' MEAN(OF PF01-PF10)? Otherwise, average of same item(question) mean (PF01) instead taking average of different questions mean(PF01-PF10) makes sense to me.
Yes, that's what the column mean would do, it's a standard type of way to replace a missing ... I have never seen replacing the missing with the row mean.
Below is what I would do. Any objections? Here I am scoring data to 0-100 scale. So then the next step is to weight them.
DATA WANT; set HAVE;
ARRAY PFI(10) PF01-PF10;
DO I = 1 TO 10;
PFMEAN = MEAN(OF PFI(I));
IF PFI(I)= . THEN PFI(I) = PFMEAN;
END;
PFNUM = N(OF PF01-PF10);
IF PFNUM GE 5 THEN RAWPF = SUM(OF PF01-PF10);
PF = ((RAWPF - 10)/(30-10)) * 100;
LABEL PF = 'PF SCORING (0-100)'
RAWPF = 'PF ITEM SUMMARY';
RUN;
@SUNY_Maggie wrote:
Very interesting.
Zeros indicate "no, my physical activity is not limited" so is not missing. Assigning the column mean (mean(PF:)) would reflect in all study subjects' average versus imputing one's own responses' MEAN(OF PF01-PF10)? Otherwise, average of same item(question) mean (PF01) instead taking average of different questions mean(PF01-PF10) makes sense to me.
This is incorrect and shows a misunderstanding of how mean() function works. You should test this thoroughly.
I meant 'hostorical' as in previous code used in history, ie code used to analyze results last year. A similar rationale can be applied to using a macro. It's easier to modify the data to fit a macro than to rewrite the macro.
Depends on why you have 0 in the first place and what you're trying to do overall. Why is that one question missing zero? Context is more important here, this isn't a technical problem, it's a logic issue.
I work with survey data regularly and don't see how this would help anything further on, except possibly align your values to match historical coding so you can use historical code.
The question you haven't answered - what's wrong with zeroes in the first place.
@SUNY_Maggie wrote:
Hi Reeza, missing imputation and scoring program was provided to me by the author of the methodology. And i was trying to fit my data to the program provided. Now I'm willing to modify the program to the data, not other way around.
Modifying your data to fit a standard program is a great reason to shift your data. You just need to explain things sometimes because otherwise it doesn't make sense to us, in terms of best practice. You can also just ignore us 😉
"What's wrong with Zeros in the first place?"
Well it's not a problem in SAS but in many programming languages, numeric values are by default, Zero. So you can't tell the difference between missing and zero.
Got 10 questions and 9 need shifting?
q1=q1+1;
q2=q2+1;
q3=q3+1;
q4=q4+1;
*(skip the one you don't want to shift);
q6=q6+1;
q7=q7+1;
q8=q8+1;
q9=q9+1;
q10=q10+1;
keeping it simple is the best way to go. Try searching for a bug involving a particular variable that doesn't appear in your code because you used some programming short cut unique to some language.
Reeza Thanks a lot.
Trusted Advosor's suggestion made the standard program fit to my data better. Below code is (after suggestion applied) to score the questionnaire responses using 0-100 scale. I need someone to reassure with me on the program to be sure. Would you please look at it especially beyong /*Attention*/ mark? It makes sense to me, however, resulting data has many negative values (see attached csv).
DATA WANT; set HAVE;
ARRAY PFI(10) PF01-PF10;
DO I = 1 TO 10;
PFMEAN = MEAN(OF PFI(I));
IF PFI(I)= . THEN PFI(I) = PFMEAN;
END;
/*Attention*/
PFNUM = N(OF PF01-PF10);
IF PFNUM GE 5 THEN RAWPF = SUM(OF PF01-PF10);
PF = ((RAWPF - 10)/(30-10)) * 100;
LABEL PF = 'PF SCORING (0-100)'
RAWPF = 'PF ITEM SUMMARY';
RUN;
DO I = 1 TO 10;
PFMEAN = MEAN(OF PFI(I));
IF PFI(I)= . THEN PFI(I) = PFMEAN;
END;
That portion doesn't make sense. Mean(of pf(I)) is just going to be that number, the mean of a single value is the value.
Then the reassignment isn't going to work. Also, replacing missing with mean is a naive way to impute missing values and is probably one of the worst. Don't do it. Look at other imputation methods.
You may be looking to do something like the following but to be honest, working off code without knowing the actual intention is guesswork. It's better to evaluate the logic process and see if the code matches that.
pfmean = mean (of pf(*));
do i=1 to 10; *You shouldn't do this but if you do this is the way to do it;
if pf(i)=. then pf(i)=pfmean;
end;
This is also concerning because it calculates the sum of a variable number of values and then treats it the same, wether there's 6 or 10 values. Without understanding context this seems dangerous.
IF PFNUM GE 5 THEN RAWPF = SUM(OF PF01-PF10);
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.