Hi all--
I work at a city level evaluation office for child welfare agencies. For each agency we evaluate, we use a series of surveys. The surveys have questions that ask the evaluator to check “all that apply”.
Here is an example:
Q16. Which of the following controlling interventions did the case planner provide to control the safety factor(s)? (Check all that apply)
Code Responce Option
1= Intensive Home Based Family PreservationServices
2= Family has been provided Emergency Shelter
3= Family has been safely moved to a DV Shelter
4= Caretaker has been moved to a safe environment
5= Authorization of Emergency Food/Cash
6= Judicial Intervention
7= Order of Protection
8= Law Enforcement involvement
9= Emergency Medical Services
10= Crisis Mental Health Services have beenprovided
11= Emergency In-patient Mental Health Serviceshave been provided
12= Immediate Supervision/Monitoring
13= Emergency Alcohol Abuse Services have beenprovided
14= Emergency Drug Abuse Services
When the raw data is generated from the database the “check all that apply” question response option codes are all contained in one cell and separated by a “;” for each case. See below.
Raw Data
Case | Q16 |
---|---|
case1ccc | 1;3;7 |
case2 | 2 |
case3 | 5;7;12;14 |
case4 | 1;2;4 |
case5 | 3 |
case6 | 6;8;10;12;13 |
case7 | 4;8 |
What I need to do is delimitate the column (kind of like the “Text to Columns” option in excel) so for each case all the response option codes have their own variable. The new variables must be in the order the codes appear in the original variable, in this case Q16.
Q16 should now look like this:
Case | Q16_a | Q16_b | Q16_c | Q16_d | Q16_e |
---|---|---|---|---|---|
case1 | 1 | 3 | 7 | ||
case2 | 2 | ||||
case3 | 5 | 7 | 12 | 14 | |
case4 | 1 | 2 | 4 | ||
case5 | 3 | ||||
case6 | 6 | 8 | 10 | 12 | 13 |
case7 | 4 | 8 |
Q16 was separated into 5 variables because in case6 5 response options were checked by the evaluator.
There will be a different number response options for the “check all that apply” questions for each case – one case could just have 2;11 and one case could have all 14 codes . For Q16 there are 14 response options, so there is the potential the evaluator could check all the options for a case. Therefore, for that case, the codes would need to fit into 14 new variables.
And then all the codes need to be decoded, so the final product should look like this:
Case | Q16_a | Q16_b | Q16_c | Q16_d | Q16_e |
---|---|---|---|---|---|
case1 | Intensive Home Based Family PreservationServices | Family has been safely moved to a DV Shelter | Order of Protection | ||
case2 | Family has been provided Emergency Shelter | ||||
case3 | Authorization of Emergency Food/Cash | Order of Protection | Immediate Supervision/Monitoring | Emergency Drug Abuse Services | |
case4 | Intensive Home Based Family PreservationServices | Family has been provided Emergency Shelter | Caretaker has been moved to a safe environment | ||
case5 | Family has been safely moved to a DV Shelter | ||||
case6 | Judicial Intervention | Law Enforcement involvement | Crisis Mental Health Services have beenprovided | Immediate Supervision/Monitoring | Emergency Alcohol Abuse Services have beenprovided |
case7 | Caretaker has been moved to a safe environment | Law Enforcement involvement |
So, for case3 4 controling interventions were used and for case7 2 controling interventions.
In addtion, there will be a different sample size for each agency.
I’ve asked around for help and the brief responses I’ve received include:
but I’m not familiar with any of these procedures.
Any help is greatly appreciated.
Thanks so much.
Thanks Tom and Art.
Tom, your version was perfect for analysis. I can now effectively count the frequencies of all different interventions. This is great!
Art, your version is perfect for manipulating the data to send out to agencies. Please see the version i used below . Thanks so Much!
data Pre.Q60 (keep=Agency_Name Case_No Case_FirstName Case_LastName cra60_);
set Pre.Presheet1;
i=1;
do while (scan(cra60,i,";") ne "");
cra60_=input(scan(cra60,i,";"),12.);
output;
i+1;
end;
Proc sort data= Pre.Q60;
by Case_No;
proc transpose
data=Pre.Q60
out=Pre.Q60 (drop=_:)
prefix= cra60_;
var cra60_ ;
format cra60_ cra60format.;
by Case_No;
run;
The tables were too big. It looks like varible 16_e was cut off. Sorry.
There are a number of ways to do what you want. Below is one such way. I didn't complete the entire task, as I don't know your full coding scheme. What it is missing is a proc format that assigns your definitions to each value:
data have;
input Case $ Q16 $20.;
cards4;
case1 1;3;7
case2 2
case3 5;7;12;14
case4 1;2;4
case5 3
case6 6;8;10;12;13
case7 4;8
;;;;
data need (keep=case q16_);
set have;
i=1;
do while (scan(Q16,i,";") ne "");
Q16_=input(scan(Q16,i,";"),12.);
output;
i+1;
end;
run;
proc transpose data=need
out=want (drop=_:)
prefix=q16_;
var q16_;
by case;
run;
Thanks so much art297!
Question:The sample and response options are different every time and we evaluatemany agencies. We won’t be able to enter in a dataset manually using CARDS everytime.
Art just included your sample data to make it easier for others to run the code he is suggesting.
You can add an INFILE statement to read data from a text file. Or use some other method to read your source data.
Hi Art--
Where do I assign the format definitions for each value? So if I create a format $cra60f do I add it to proc transpose?
proc transpose
data=Pre.Q60
out=Pre.Q60 (drop=_:)
prefix= cra60_;
var cra60_ ;
by Case_No;
run;
For use in analysis it probably makes the most sense to create a new variable for each possible response and assign it a 0/1 value to indicate whether it was included in the list or not. For example for Q16 there are 14 things to choose so you need to make 14 new variables. You can then create an array of these new variables, loop over the values in the original string variable and use the pulled out value as an index into the array to set the flag to indicate that choice was selected.
data want ;
set have ;
array q16_ q16_1 - q16_14 ;
* Set all of the flag variables to false ;
do i=1 to dim(q16_); q16_(i)=0; end;
* Loop over selected items and set to true ;
do j=1 by 1 while (scan(q16,j,';') ne ' ');
i = input(scan(q16,j,';'),best.);
q16_(i)=1;
end;
drop i j ;
run;
To convert it to the format you requested we would need change the logic a little. It will be easier if you first create a format for the question responses. Note that format names cannot end in a number so name the format something like Q16F.
data want ;
set have ;
array q16_ $50 q16_1 - q16_14 ;
* Loop over selected items and output formatted value of selection number ;
do j=1 by 1 while (scan(q16,j,';') ne ' ');
i = input(scan(q16,j,';'),best.);
q16_(j)=put(i,q16f.);
end;
drop i j ;
run;
Thanks so much Tom! So I tried this on the dataset on a question cra60.
data Pre.cra60test ;
set Pre.Sheet1 ;
array cra60_ cra60_1 - cra60_14 ;
* Set all of the flag variables to false ;
do i=1 to dim(cra60_); cra60_ (i)=0; end;
* Loop over selected items and set to true ;
do j=1 by 1 while (scan(cra60,j,';') ne ' ');
i = input(scan(cra60,j,';'),best.);
cra60_(i)=1;
end;
drop i j ;
run;
I get these messages st the begining of the log:
NOTE: Invalid argument to function INPUT at line 819 column 9.
ERROR: Array subscript out of range at line 820 column 5.
Then the log shows all the questions in the dataset with an =and a number =and a *
and then the array which was created
cra60_1=0 cra60_2=0 cra60_3=0 cra60_4=0 cra60_5=0 cra60_6=0 cra60_7=0 cra60_8=0 cra60_9=0 cra60_10=0 cra60_11=0 cra60_12=0 cra60_13=0 cra60_14=0
i= j=1 _ERROR_=1 _N_=1
Then these final messages:
NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 819:9
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set PRE.SHEET1.
It looks like this is being run for all the Questions/Variables in the dataset, not just cra60.
Any help is greatly aperciated.
sorry i forgot to mention. I get 0 observations back in the new dataset. Thanks again.
You can use the line numbers in the error messages to see which statement is generating the error.
Most likely it is because the value returned from the SCAN function call is NOT a number. You should be able to check that by looking at the values for the variables printed in the log. The second error message is then caused by trying to use the missing value in the variable I to index into the array.
You can break it into smaller steps to see more in the log or to add your own error checking. It is possible that the value of variable CRA60 has characters other than digits and semi-colons that are causing this error. You can use the ?? modifier prefix on the format in the input statement to suppress the error messages when the value is not a valid number.
do j=1 by 1 while (scan(cra60,j,';') ne ' ');
next=scan(cra60,j,';');
i = input(next,??best.);
if i ne . then cra60_(i)=1;
else put case= cra60= j= next= next $hex. ;
end;
drop i j next ;
Thanks!
My response isn't in the SAS realm. I have worked with at least two different survey software packages that can exhibit a similar data output/export behavior. Both packages had options that allowed configuration of multiple choice output in two or three different forms. It may well be worth expoloring those options as they may avoid lots of work.
If you are working with a contractor that provides collected data, prod them to look for other export options.
Also, another approach to data you have is to set dichotomous variables such that Q16a is yes=1/no=0 for Intensive Home Based Family PreservationServices.
Depending on the analysis you are doing you can then use SUM in PROC MEANS, REPORT or TABULATE to get the number receiving that service, MEAN ( * 100) would be the percent receiving, and N the count.
Your current approach is going to require looking across multiple Q16 variables to answer questions about specific topics.
Hi Tom. Thank youaginin for your help. I got your code to work but I amstill not sure what these 3 lines of code mean.
i = input(next,??best.);
if i ne . then cra60_(i)=1;
else put case= cra60= j= next= next $hex. ;
end;
The variables Next, i and j mean what? I can't seem to find a description for the format $hex.
Again, thanks for all your help today!
If you look at Tom's code, the line you are asking about is just putting some things in your log so that you can figure out what is wrong.
next is a variable that is created by the statement: next=scan(cra60,j,';');
i is a variable that is created by the statement: i=input(next,??best.);
j is a variable that was created by the loop:
do j=1 by 1 while (scan(cra60,j,';') ne ' ');
Thanks! last question:
Where do I assign the format definitions for each value? So if I create a format cra60f do I add it to proc transpose?
proc transpose
data=Pre.Q60
out=Pre.Q60 (drop=_:)
prefix= cra60_;
var cra60_ ;
by Case_No;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.