Help using Base SAS procedures

Delimiting Codes in a Single Variable into Multiple Variables

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 143
Accepted Solution

Delimiting Codes in a Single Variable into Multiple Variables

Hi all--

I work at a city level evaluation office for child welfare agencies. For each agency we evaluate, we use a series of surveys.  The surveys have questions that ask the evaluator to check “all that apply”.

Here is an example:

Q16. Which of the following controlling interventions did the case planner provide to control the safety factor(s)? (Check all that apply)

Code                                 Responce Option

1=          Intensive Home Based Family PreservationServices

2=          Family has been provided Emergency Shelter

3=          Family has been safely moved to a DV Shelter

4=          Caretaker has been moved to a safe environment

5=          Authorization of Emergency Food/Cash

6=          Judicial Intervention

7=          Order of Protection

8=          Law Enforcement involvement

9=          Emergency Medical Services

10=         Crisis Mental Health Services have beenprovided

11=          Emergency In-patient Mental Health Serviceshave been provided

12=          Immediate Supervision/Monitoring

13=          Emergency Alcohol Abuse Services have beenprovided

14=          Emergency Drug Abuse Services

When the raw data is generated from the database  the “check all that apply” question response option codes are all contained in one cell and separated  by a “;” for each case. See below.

Raw Data

Case Q16
case1ccc
1;3;7
case22
case35;7;12;14
case41;2;4
case53
case66;8;10;12;13
case74;8

What I need to do is delimitate the column (kind of like the “Text to Columns” option in excel) so for each case all the response option codes have their own variable.  The new variables must be in the order the codes appear in the original variable, in this case Q16. 


Q16 should now look like this:

Case Q16_aQ16_bQ16_cQ16_dQ16_e
case1137

case22



case3571214
case4124

case53



case668101213
case748


Q16 was separated into 5 variables because in case6 5  response options were checked by the evaluator.

There will be a different number response options for the “check all that apply” questions  for each case – one case could just have 2;11 and one case could have all 14 codes . For Q16 there are 14 response options, so there is the potential the evaluator could check all the options for a case. Therefore, for that case, the codes would need to fit into 14 new variables.

And then all the codes need to be decoded, so the final product should look like this:

CaseQ16_aQ16_bQ16_cQ16_dQ16_e
case1Intensive Home Based Family PreservationServices Family has been safely moved to a DV ShelterOrder of Protection

case2Family has been provided Emergency Shelter



case3Authorization of Emergency Food/CashOrder of ProtectionImmediate Supervision/Monitoring Emergency Drug Abuse Services
case4Intensive Home Based Family PreservationServices Family has been provided Emergency Shelter Caretaker has been moved to a safe environment

case5Family has been safely moved to a DV Shelter



case6Judicial Intervention Law Enforcement involvementCrisis Mental Health Services have beenprovidedImmediate Supervision/Monitoring Emergency Alcohol Abuse Services have beenprovided
case7Caretaker has been moved to a safe environmentLaw Enforcement involvement


So, for case3 4 controling interventions were used and for case7 2 controling interventions.

In addtion, there will be a different sample size for each agency.

I’ve asked around for help and the brief responses I’ve received include:

  • Use SCAN and/or SUBSTR Functions

  • Create a dynamic Macros using %Do -%TO-%END ect…

but I’m not familiar with any of these procedures.

Any help is greatly appreciated.

Thanks so much. 


Accepted Solutions
Solution
‎12-30-2011 05:55 PM
Frequent Contributor
Posts: 143

Re: Delimiting Codes in a Single Variable into Multiple Variables

Thanks Tom and Art.

Tom, your version was perfect for analysis. I can now effectively count the frequencies of all different interventions. This is great!

Art, your version is perfect for manipulating the data to send out to agencies. Please see the version i used below . Thanks so Much!

data Pre.Q60 (keep=Agency_Name Case_No Case_FirstName Case_LastName cra60_);

  set Pre.Presheet1;

  i=1;

  do while (scan(cra60,i,";") ne "");

    cra60_=input(scan(cra60,i,";"),12.);

    output;

    i+1;

  end;

Proc sort data= Pre.Q60;

by Case_No;

proc transpose

data=Pre.Q60

out=Pre.Q60 (drop=_Smiley Happy

prefix= cra60_;

  var cra60_  ;

  format cra60_ cra60format.;

  by Case_No;

run;

View solution in original post


All Replies
Frequent Contributor
Posts: 143

Delimiting Codes in a Single Variable into Multiple Variables

The tables were too big. It looks like varible 16_e was cut off. Sorry.

PROC Star
Posts: 7,416

Delimiting Codes in a Single Variable into Multiple Variables

There are a number of ways to do what you want.  Below is one such way.  I didn't complete the entire task, as I don't know your full coding scheme.  What it is missing is a proc format that assigns your definitions to each value:

data have;

  input Case $ Q16 $20.;

  cards4;

case1 1;3;7

case2 2

case3 5;7;12;14

case4 1;2;4

case5 3

case6 6;8;10;12;13

case7 4;8

;;;;

data need (keep=case q16_);

  set have;

  i=1;

  do while (scan(Q16,i,";") ne "");

    Q16_=input(scan(Q16,i,";"),12.);

    output;

    i+1;

  end;

run;

proc transpose data=need

               out=want (drop=_Smiley Happy

               prefix=q16_;

  var q16_;

  by case;

run;

Frequent Contributor
Posts: 143

Delimiting Codes in a Single Variable into Multiple Variables

Thanks so much art297!

Question:The sample and response options are different every time and we evaluatemany agencies. We won’t be able to enter in a dataset manually using CARDS everytime.

Super User
Super User
Posts: 6,694

Re: Delimiting Codes in a Single Variable into Multiple Variables

Art just included your sample data to make it easier for others to run the code he is suggesting.

You can add an INFILE statement to read data from a text file.  Or use some other method to read your source data.

Frequent Contributor
Posts: 143

Delimiting Codes in a Single Variable into Multiple Variables

Hi Art--

Where do I assign the format definitions for each value? So if I create a format $cra60f do I add it to proc transpose? 

proc transpose

data=Pre.Q60

out=Pre.Q60  (drop=_Smiley Happy

prefix= cra60_;

  var cra60_ ;

  by Case_No;

run;

Super User
Super User
Posts: 6,694

Re: Delimiting Codes in a Single Variable into Multiple Variables

For use in analysis it probably makes the most sense to create a new variable for each possible response and assign it a 0/1 value to indicate whether it was included in the list or not.  For example for Q16 there are 14 things to choose so you need to make 14 new variables.  You can then create an array of these new variables, loop over the values in the original string variable and use the pulled out value as an index into the array to set the flag to indicate that choice was selected.

data want ;

  set have ;

  array q16_ q16_1 - q16_14 ;

  * Set all of the flag variables to false ;

  do i=1 to dim(q16_);  q16_(i)=0; end;

  * Loop over selected items and set to true ;

  do j=1 by 1 while (scan(q16,j,';') ne ' ');

    i = input(scan(q16,j,';'),best.);

    q16_(i)=1;

  end;

  drop i j ;

run;

To convert it to the format you requested we would need change the logic a little.  It will be easier if you first create a format for the question responses.  Note that format names cannot end in a number so name the format something like Q16F.

data want ;

  set have ;

  array q16_ $50 q16_1 - q16_14 ;

  * Loop over selected items and output formatted value of selection number ;

  do j=1 by 1 while (scan(q16,j,';') ne ' ');

    i = input(scan(q16,j,';'),best.);

    q16_(j)=put(i,q16f.);

  end;

  drop i j ;

run;

Frequent Contributor
Posts: 143

Delimiting Codes in a Single Variable into Multiple Variables

Thanks so much Tom! So I tried this on the dataset on a question cra60.

data Pre.cra60test ;

  set Pre.Sheet1 ;

  array cra60_ cra60_1 - cra60_14 ;

  * Set all of the flag variables to false ;

  do i=1 to dim(cra60_);  cra60_ (i)=0; end;

  * Loop over selected items and set to true ;

  do j=1 by 1 while (scan(cra60,j,';') ne ' ');

    i = input(scan(cra60,j,';'),best.);

    cra60_(i)=1;

  end;

  drop i j ;

run;

I get these messages st the begining of the log:

NOTE: Invalid argument to function INPUT at line 819 column 9.

ERROR: Array subscript out of range at line 820 column 5.

Then the log shows all the questions in the dataset with an =and a number =and a *

and then the array which was created

cra60_1=0 cra60_2=0 cra60_3=0 cra60_4=0 cra60_5=0 cra60_6=0 cra60_7=0 cra60_8=0 cra60_9=0 cra60_10=0 cra60_11=0 cra60_12=0 cra60_13=0 cra60_14=0

i= j=1 _ERROR_=1 _N_=1

Then these final messages:

NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values.

      Each place is given by: (Number of times) at (Line)Smiley SadColumn).

      1 at 819:9

NOTE: The SAS System stopped processing this step because of errors.

NOTE: There were 1 observations read from the data set PRE.SHEET1.

It looks like this is being run for all the Questions/Variables in the dataset, not just cra60.

Any help is greatly aperciated.

Frequent Contributor
Posts: 143

Delimiting Codes in a Single Variable into Multiple Variables

sorry i forgot to mention. I get 0 observations back in the new dataset. Thanks again.

Super User
Super User
Posts: 6,694

Re: Delimiting Codes in a Single Variable into Multiple Variables

You can use the line numbers in the error messages to see which statement is generating the error.

Most likely it is because the value returned from the SCAN function call is NOT a number.  You should be able to check that by looking at the values for the variables printed in the log. The second error message is then caused by trying to use the missing value in the variable I to index into the array.

You can break it into smaller steps to see more in the log or to add your own error checking.  It is possible that the value of variable CRA60 has characters other than digits and semi-colons that are causing this error.  You can use the ?? modifier prefix on the format in the input statement to suppress the error messages when the value is not a valid number.

do j=1 by 1 while (scan(cra60,j,';') ne ' ');

  next=scan(cra60,j,';');

  i = input(next,??best.);

  if i ne . then cra60_(i)=1;

  else put case= cra60= j= next= next $hex. ;

end;

drop i j next ;

Frequent Contributor
Posts: 143

Delimiting Codes in a Single Variable into Multiple Variables

Thanks!

Super User
Posts: 10,865

Delimiting Codes in a Single Variable into Multiple Variables

My response isn't in the SAS realm. I have worked with at least two different survey software packages that can exhibit a similar data output/export behavior. Both packages had options that allowed configuration of multiple choice output in two or three different forms. It may well be worth expoloring those options as they may avoid lots of work.

If you are working with a contractor that provides collected data, prod them to look for other export options.

Also, another approach to data you have is to set dichotomous variables such that Q16a is yes=1/no=0 for Intensive Home Based Family PreservationServices.

Depending on the analysis you are doing you can then use SUM in PROC MEANS, REPORT or TABULATE to get the number receiving that service, MEAN ( * 100) would be the percent receiving, and N the count.

Your current approach is going to require looking across multiple Q16 variables to answer questions about specific topics.

Frequent Contributor
Posts: 143

Delimiting Codes in a Single Variable into Multiple Variables

Hi Tom. Thank youaginin for your help. I got your code to work but I amstill not sure what these 3 lines of code mean.


i = input(next,??best.);
  if i ne . then cra60_(i)=1;
  else put case= cra60= j= next= next $hex. ;
end;

The variables Next, i and j mean what?  I can't seem to find a description for the format $hex. 

Again, thanks for all your help today!

PROC Star
Posts: 7,416

Delimiting Codes in a Single Variable into Multiple Variables

If you look at Tom's code, the line you are asking about is just putting some things in your log so that you can figure out what is wrong.

next is a variable that is created by the statement: next=scan(cra60,j,';');

i is a variable that is created by the statement:  i=input(next,??best.);

j is a variable that was created by the loop:

 

do j=1 by 1 while (scan(cra60,j,';') ne ' ');


Frequent Contributor
Posts: 143

Delimiting Codes in a Single Variable into Multiple Variables

Thanks! last question:

Where do I assign the format definitions for each value? So if I create a format cra60f do I add it to proc transpose?

proc transpose

data=Pre.Q60

out=Pre.Q60  (drop=_Smiley Happy

prefix= cra60_;

  var cra60_ ;

  by Case_No;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 20 replies
  • 244 views
  • 10 likes
  • 5 in conversation