Converting a long string variable into many yes/no variables

Accepted Solution Solved
Reply
Highlighted
New Contributor
Posts: 2
Accepted Solution

Converting a long string variable into many yes/no variables

I know the title doesn't explain much. It's a hard thing to describe. I used to have the code somewhere, so I know it's possible, but I've lost it.

 

I put out a survey, in which people were allowed to select more than one answer. When downloaded as a CSV, it just put all of the choices each individual selected in a semicolon-delimited list. That's kind of hard to analyze, so I'd like to split it up into many variables that are either yes or no. So, if the individual selected one of the answers, then the variable for that option would be yes. I hope that makes sense. This is what the variable looks like, if it helps!

 

Capture.PNG

 

Thank you! It was really useful code, and I hate that I lost it.


Accepted Solutions
Solution
Wednesday
Super User
Posts: 12,994

Re: Converting a long string variable into many yes/no variables

Here is an example of what I think you want.

data want;
   set have;
   Brigitte = (index(longstring,'D.va/Brigitte')>0);
run;

Which will assign 1 (true when the full string is found in the "longstring" variable) or 0 (false).

 

The numeric forms are much nicer than Yes/No coding as you can sum the variables to get counts of yes and means to get a decimal percentage of yes. Also some forms of modeling require numeric values.

 

You would have one line for each of the variables you want to assign. I assumed that the comma in the picture separated the values of interest since there were no semicolons shown. An alternate approach would consist of a temporary array holding the search values and then an array of the variable names and a loop over the values in the temporary array and the whichc function to know which of the array of yes/no variables to set to yes after defaulting to no (0). But I am not going to type that much from a picture.

View solution in original post


All Replies
Solution
Wednesday
Super User
Posts: 12,994

Re: Converting a long string variable into many yes/no variables

Here is an example of what I think you want.

data want;
   set have;
   Brigitte = (index(longstring,'D.va/Brigitte')>0);
run;

Which will assign 1 (true when the full string is found in the "longstring" variable) or 0 (false).

 

The numeric forms are much nicer than Yes/No coding as you can sum the variables to get counts of yes and means to get a decimal percentage of yes. Also some forms of modeling require numeric values.

 

You would have one line for each of the variables you want to assign. I assumed that the comma in the picture separated the values of interest since there were no semicolons shown. An alternate approach would consist of a temporary array holding the search values and then an array of the variable names and a loop over the values in the temporary array and the whichc function to know which of the array of yes/no variables to set to yes after defaulting to no (0). But I am not going to type that much from a picture.

New Contributor
Posts: 2

Re: Converting a long string variable into many yes/no variables

Thank you!! That's exactly the code I lost. 

Super User
Posts: 22,819

Re: Converting a long string variable into many yes/no variables

Check the survey tool you used. There's usually options that will either split or combine them into one field.
PROC Star
Posts: 8,093

Re: Converting a long string variable into many yes/no variables

As long as your CSV file doesn't contain a header record, and none of your variables have a length greater than 32 characters, and don't contain any special characters or embedded spaces, the code like the following might work:

/* import data */
proc import datafile='/folders/myfolders/have.csv'
            out=need dbms=csv replace;
  getnames=no;
run;

/* get a list of all vars */
data unique (keep=variable);
  set need;
  length variable $32;
  array vars(*) $32. var:;
  do i=1 to dim(vars);
    if vars(i) ne '' then do;
      variable=vars(i);
      output;
    end;
    else leave;
  end;
run;

/* Create a macro variable to initialize all variables */
proc sql noprint;
  select distinct catt(variable,'=0;'), variable
    into :vars separated by ' '
      from unique
  ;
quit;

/* create desired output dataset */
data _null_ (drop=i var:);
  set need end=last;
  array vars(*) $32. var:;
  length forexec $255;
  &vars.;
  if _n_ eq 1 then call execute ("data want(drop=var:); set need; &vars.");
  do i=1 to dim(vars);
    if vars(i) ne '' then do;
      forexec=catt(vars(i),'=1;');
      call execute(forexec);
    end;
    else leave;
  end;
  call execute('output;');
  if last then call execute('run;');
run;

Art, CEO, AnalystFinder.com

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 101 views
  • 2 likes
  • 4 in conversation