Hello!
I'm a grad student working on a thesis. I have data from an online survey we administered through REDCap, in both English and Spanish. However, REDCap administered the survey and collected the data as if the English and Spanish versions were different sections of the same survey. So each participant has a set of English variables and a set of Spanish variables: the English speakers are missing the Spanish data, and the Spanish speakers are missing the English data. (I hope this makes sense.) My question is: how do I pool all participants together, so that if I run a frequency table on something like Education Level, the table includes both English and Spanish speakers?
I was told that I could try using an array, but I got these errors when I tried:
16
17 /* Numeric Arrays */
18
19 Array English (i) q2___1-q2___10 q3 q4 q6 q5 q44 q7 q8 q9 q11-q17e___9 q17f___1-q17f___7
19 ! q18-q28___5 q29 q30___1-q32___12 q33___1-q33___13 q34-q35___9 q36___1-q36___9 q37 q38 q39;
ERROR: Alphabetic prefixes for enumerated variables (q11-q17e___9) are different.
ERROR: Alphabetic prefixes for enumerated variables (q18-q28___5) are different.
ERROR: Alphabetic prefixes for enumerated variables (q30___1-q32___12) are different.
ERROR: Alphabetic prefixes for enumerated variables (q34-q35___9) are different.
20 Array Spanish (i) q2_sp___1-q2_sp___10 q3_sp q4_sp q6_sp q5_sp q44_sp q7_sp q8_sp q9_sp
20 ! q11_sp-q17e_sp___9 q17f_sp___1-q17f_sp___7 q18_sp-q28_sp___5 q29_sp q30_sp___1-q32_sp___13
20 ! q33_sp___1-q33_sp___13 q34_sp-q35_sp___9 q36_sp___1-q36_sp___9 q37_sp q38_sp q39_sp;
ERROR: Missing numeric suffix on a numbered variable list (q11_sp-q17e_sp___9).
ERROR: Missing numeric suffix on a numbered variable list (q18_sp-q28_sp___5).
ERROR: Alphabetic prefixes for enumerated variables (q30_sp___1-q32_sp___13) are different.
ERROR: Missing numeric suffix on a numbered variable list (q34_sp-q35_sp___9).
21 do i= 148;
22 if languages1=2 then Spanish (i) = English (i);
ERROR: Mixing of implicit and explicit array subscripting is not allowed.
ERROR: Mixing of implicit and explicit array subscripting is not allowed.
ERROR: Mixing of implicit and explicit array subscripting is not allowed.
ERROR: Mixing of implicit and explicit array subscripting is not allowed.
23
24 end;
25
26 /* Character Arrays */
27
28 Array EnglishC (i) q1 otherq2 otherq4 otherq6 otherq7 otherq8 q10 otherq17e otherq17f otherq28
28 ! otherq29 otherq32 otherq33 otherq35 otherq36 contact;
29 Array SpanishC (i) q1_sp otherq2_sp otherq4_sp otherq6_sp otherq7_sp otherq8_sp q10_sp
29 ! otherq17e_sp otherq17f_sp otherq28_sp otherq29_sp otherq32_sp otherq33_sp otherq35_sp
29 ! otherq36_sp contact_sp;
30 do i= 148;
31
32 if languages1=2 then Spanish (i) = English (i);
ERROR: Mixing of implicit and explicit array subscripting is not allowed.
ERROR: Mixing of implicit and explicit array subscripting is not allowed.
ERROR: Mixing of implicit and explicit array subscripting is not allowed.
ERROR: Mixing of implicit and explicit array subscripting is not allowed.
33
34 end;
Did you try the bit of code I suggested earlier?
I strongly suggest getting all of the values into one set of variables before attempting any recoding, such as creating your race_eth variable.
Note that if more than one of your variables q2___1-q2___10 has the value 1, then your Race_eth variable, as coded, will only reflect the last variable that has the value 1. Is that the desired behavior?
I am not sure exactly what you were attempting with
do i= 148; if languages1=2 then Spanish (i) = English (i); end;
With a single value of i, it would only process one of the variables in each array, assuming there are at least 148 variables. Were you wanting to place the values of the English array into the Spanish variables?
As for the error about mixing implicit and explicit subscripting: I think the "implicit" part comes from Languages1 and the "explicit" part from the Spanish and English array references. That makes me think there is not actually an existing variable named Languages1 in the data set, so SAS thinks it might refer to another element that might belong in an array.
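A separate, hedged observation: placing (i) after the array name, as in Array English (i) ..., declares an implicitly subscripted array whose index variable is i. Such an array is meant to be referenced by its name alone (for example inside DO OVER), so writing English (i) mixes the two styles, which may also be what SAS is objecting to. Declaring the arrays with (*) makes them explicitly subscripted, and a DO loop from 1 to DIM() then visits every element instead of a single index value. A minimal sketch on made-up data; toy, lang, eng1-eng3 and spa1-spa3 are hypothetical names standing in for languages1 and the real question variables:

```sas
/* Hypothetical toy data: lang=1 English, lang=2 Spanish */
data toy;
  input lang eng1-eng3 spa1-spa3;
  datalines;
1 1 2 3 . . .
2 . . . 4 5 6
;
run;

data toy_pooled;
  set toy;
  /* (*) declares explicitly subscripted arrays; SAS counts the elements */
  array English(*) eng1-eng3;
  array Spanish(*) spa1-spa3;
  /* Loop over every element, not one fixed index value */
  do i = 1 to dim(English);
    if lang = 2 then English(i) = Spanish(i);
  end;
  drop i;
run;
```

After this step both rows carry their answers in eng1-eng3, so a frequency table on those variables covers both language groups.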
Again, I recommend getting all the values into one set of variables, either the English or the Spanish ones. Then your Race_sum would be better coded as:
race_sum = sum(of q2___1 - q2___10);
Better in two senses. First, and more obviously, the code is much shorter and easier to follow. Second, if any of the variables in a series of + operations is missing, the result is missing, whereas the SUM function ignores missing arguments.
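A small sketch of that second point, using a hypothetical one-row data set where one checkbox is missing. The + operator propagates the missing value, while SUM skips it (SUM returns missing only when every argument is missing):

```sas
/* Hypothetical flags: one participant with a missing q2___3 */
data demo;
  input q2___1-q2___3;
  datalines;
1 0 .
;
run;

data demo_sum;
  set demo;
  /* 1 + 0 + . is missing */
  plus_total = q2___1 + q2___2 + q2___3;
  /* SUM ignores the missing argument, giving 1 */
  race_sum = sum(of q2___1-q2___3);
run;
```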
Your error with
70   proc freq label;
               -----
               22
               202
is because LABEL is not a valid PROC FREQ statement option.
This error:
74   proc freq ;
75   table race_sum * q2___1 * q2___2 * q2___3 * q2___4 * q2___5 * q2___6 * q2___7 * q2___8 *
75 ! q2___9 * q2___10/list missing;
ERROR: Variable RACE_SUM not found.
76   run;
is because the Race_sum (and race_eth) assignment code is commented out, between /* and */ in the data step, so those statements do not execute and the variables are never created.
Hint for the long run: even though most procedures default to the most recently created data set, you really want to get into the habit of specifying the exact data set to use with DATA=. When debugging, you may intend to run PROC FREQ or another procedure against a specific data set but forget that it was not the last one created, which is the one SAS uses. You can end up spending a lot of time trying to figure out where 3 records went, or why variable XXX is not recoding correctly, when the set you think PROC FREQ is using is AAA but data set BBB was the last one created.
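Putting both fixes together, a minimal sketch: the assignment is an active statement (not inside /* */), and PROC FREQ names its input with DATA=. The data set names survey and survey_scored and the two-checkbox data are hypothetical:

```sas
/* Hypothetical toy data with two checkbox flags */
data survey;
  input id q2___1 q2___2;
  datalines;
1 1 0
2 1 1
3 0 .
;
run;

data survey_scored;
  set survey;
  /* Active statement, so RACE_SUM is actually created */
  race_sum = sum(of q2___1-q2___2);
run;

/* DATA= names the input explicitly instead of relying on the last data set */
proc freq data=survey_scored;
  tables race_sum * q2___1 * q2___2 / list missing;
run;
```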
Did the data come in one file or two?
How did you read that data into SAS? As in what code was used?
Are all of the question variables of the same type? If not, you need to put the numeric and character variables in separate arrays.
When the variables are not actually numbered sequentially, for example q11-q17e___9, you have to list Q11, Q12 (if any), Q13, Q14, etc. separately, or use a double hyphen, as in q11--q17e___9, to indicate that the columns are adjacent in the data set, not that the variable names are numbered in sequence.
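A quick way to see the difference, sketched with a hypothetical data set whose variables are q11, q12, q17e___1 and q17e___9 in that column order. The single-hyphen list would fail because the prefixes differ, while the double-hyphen list takes every column from q11 through q17e___9:

```sas
/* Hypothetical variables whose names are not one numbered series */
data lists;
  retain q11 1 q12 2 q17e___1 3 q17e___9 4;
run;

data _null_;
  set lists;
  /* q11-q17e___9 would fail: the prefixes "q" and "q17e___" differ */
  /* q11--q17e___9 means q11 through q17e___9 in column order       */
  array block(*) q11--q17e___9;
  n = dim(block);
  put n=;   /* writes n=4 to the log */
run;
```

Because the double hyphen depends on column order, it is worth checking the order with PROC CONTENTS (VARNUM option) before relying on it.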
You can use the COALESCE function, or COALESCEC if the variables are character, to select the first non-missing value into one variable.
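A minimal sketch of both functions on a hypothetical pair of items, q3/q3_sp (numeric) and q1/q1_sp (character), where each row has only one language filled in:

```sas
/* Hypothetical data: comma-delimited so empty fields stay missing */
data both;
  infile datalines dsd truncover;
  input q3 q3_sp q1 :$20. q1_sp :$20.;
  datalines;
2,,Maria,
,3,,Jose
;
run;

data pooled;
  set both;
  q3_all = coalesce(q3, q3_sp);     /* first non-missing numeric value */
  length q1_all $20;
  q1_all = coalescec(q1, q1_sp);    /* first non-blank character value */
run;
```

Setting a LENGTH before the COALESCEC assignment keeps the new character variable from taking a default length.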
If you don't run into problems, code such as the following will move all the Spanish responses into the English variables. If your data are character then, as I said above, use COALESCEC; you may need two sets of arrays, one for numeric and one for character variables.
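data want;
  set have;
  /* Both arrays must list matching variables in the same order and count.   */
  /* Check: the English list ends at q32___12, the Spanish one at q32_sp___13. */
  /* "--" lists use column order, so they work for names that are not numbered. */
  array English (*) q2___1-q2___10 q3 q4 q6 q5 q44 q7 q8 q9 q11--q17e___9
                    q17f___1-q17f___7 q18--q28___5 q29 q30___1--q32___12
                    q33___1-q33___13 q34--q35___9 q36___1-q36___9 q37 q38 q39;
  array Spanish (*) q2_sp___1-q2_sp___10 q3_sp q4_sp q6_sp q5_sp q44_sp q7_sp q8_sp q9_sp
                    q11_sp--q17e_sp___9 q17f_sp___1-q17f_sp___7 q18_sp--q28_sp___5 q29_sp
                    q30_sp___1--q32_sp___13 q33_sp___1-q33_sp___13 q34_sp--q35_sp___9
                    q36_sp___1-q36_sp___9 q37_sp q38_sp q39_sp;
  /* Keep the English value if present, otherwise take the Spanish one (numeric only) */
  do i = 1 to dim(English);
    English(i) = coalesce(English(i), Spanish(i));
  end;
  drop i;
run;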
Danger: Do not reuse the same data set name on the Data and Set statements. Overwriting data sets when recoding data this way may cause loss of values if there is a logic problem.
AFTER you have verified that the data step does what is needed, you could rerun the step and drop all the Spanish variables (or the other way around, if preferred).
If this had been my project, I probably would have written code to make sure the variable names were the same for English and Spanish, did not have all those extra underscore characters, and possibly a few other things.
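One way to make that drop step easy to repeat, sketched under assumptions: keep each language's variable list in a macro variable, so the same list feeds both the ARRAY statement and the DROP statement. The short lists en_vars and sp_vars, the data set names, and the toy data are all hypothetical:

```sas
/* Hypothetical lists: one macro variable per language */
%let en_vars = q3 q4 q5;
%let sp_vars = q3_sp q4_sp q5_sp;

data survey2;
  input lang q3 q4 q5 q3_sp q4_sp q5_sp;
  datalines;
1 1 2 3 . . .
2 . . . 3 2 1
;
run;

/* Step 1: combine, but keep the Spanish variables so the result can be checked */
data check;
  set survey2;
  array English(*) &en_vars;
  array Spanish(*) &sp_vars;
  do i = 1 to dim(English);
    English(i) = coalesce(English(i), Spanish(i));
  end;
  drop i;
run;

proc print data=check;
run;

/* Step 2: only after checking, drop the Spanish variables */
data final;
  set check;
  drop &sp_vars;
run;
```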
Show us the line of code that generates
19 Array English (i) q2___1-q2___10 q3 q4 q6 q5 q44 q7 q8 q9 q11-q17e___9 q17f___1-q17f___7
19 ! q18-q28___5 q29 q30___1-q32___12 q33___1-q33___13 q34-q35___9 q36___1-q36___9 q37 q38 q39;
I suspect it might contain a macro variable?
In any case, the code makes assumptions about the variable names, and these assumptions (that the names end in numbers that increase by one) are not (or are no longer) valid.
q11-q17e___9
is not a valid list of names (as the log message clearly indicates).
Bottom line: fix the list of variables used as array elements.
If they are contiguous in the table, it may be as simple as using a double hyphen:
q11--q17e___9
Also note that
Array English (i)
is invalid syntax. So this log looks very suspicious.
The use of an index variable in the definition of an array is valid SAS syntax.
Try it.
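data _null_;
set sashelp.class (obs=3);
put _n_=;
/* (index) makes VARS an implicitly subscripted array with index variable INDEX */
array vars (index) sex name ;
/* DO OVER steps INDEX through each element; VARS alone refers to the current one */
do over vars;
put index= vars= ;
end;
run;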
They seem to have removed it from the documentation, but the code still works the way it did in 1983 when I learned SAS.