My aim is to split multiple values from rows to multiple vars (create a new var for each same value)
I have read various similar topics but all inclusive.
Each rows contain between 1 to X information separate by a comma.
For each variable I have to process, I have around 15 items plus the text typed by the users (free fields of a survey).
Currently I get information by column but I didn't succeed to group each same value by column (see eg). I also tried to recode variables but, I have a lost of data...
As I have several hundred lines and several variables with so much information to process, would someone please advise me?
Eg:
What I have
Id Varname
1 Would like to get TV, Would like to view message
2 Would like to speak
3 Would like to speak, Would like to get TV
An idea of what I want
Id Varname TV SPEAK MSG
1 Would like to get TV, Would like to view message 1 0 1
2 Would like to speak 0 1 0
3 Would like to speak, Would like to get TV 1 1 0
Best
Mic
It will be hard to guaranty extraction all required words.
To get the last word of a string you can use scan function:
wanted = scan(<string>,-1);
In other case you want to extract the following word coming after "prefer" ?!
if findw(<string>,'prefer') > 0 then
wanted = scan(substr(<string>,findw(<string>,'prefer')),1);
You need check your data and gather rules to define which words to look for.
The program and its output should be checked often to assure optimal result.
Even after having the rules, you still need:
1) To check whether the extracted words are the expected ones and all required
2) I suggest to start with transposing the data so each observation contains:
ID, STRING (one string only per observation - assigned the maximum expected length).
Alternative way will be to scan the data and create a list of required words, then it is possible to check each word of the string if it is in the list either by array method or by a hashing method.
Seems simple enough.
data have ;
input id varname $60. ;
cards;
1 Would like to get TV, Would like to view message
2 Would like to speak
3 Would like to speak, Would like to get TV
;
data want ;
set have ;
TV = 0 ne findw(varname,'Would like to get TV',',','sit');
SPEAK = 0 ne findw(varname,'Would like to speak',',','sit');
MSG = 0 ne findw(varname,'Would like to view message',',','sit');
run;
proc print;
run;
Obs id varname TV SPEAK MSG 1 1 Would like to get TV, Would like to view message 1 0 1 2 2 Would like to speak 0 1 0 3 3 Would like to speak, Would like to get TV 1 1 0
Yes but how to be sure I do not forget the elements written in free text ? and how can I get them in easy way ?
Do you know ahead the values you want to save as variables or you want to gather them dynamically according to input text?
Are there rules which values to extract into the variables?
In the database I must analyze it is a mixture of known and unknown data.
For the known data Tom's method is perfect (and simple 🙂 )but for the free-text data it's more complicated.
In theory this free text is located at the end of the character string but not checked on all the data.
for example my cell can be of the form
ID varname
1 Would like to get TV, Would like to view message
2 Would like to speak
3 Would like to speak, Would like to get TV, I prefer ... but ....
I prefer ... but... = free text added by the participant
It will be hard to guaranty extraction all required words.
To get the last word of a string you can use scan function:
wanted = scan(<string>,-1);
In other case you want to extract the following word coming after "prefer" ?!
if findw(<string>,'prefer') > 0 then
wanted = scan(substr(<string>,findw(<string>,'prefer')),1);
You need check your data and gather rules to define which words to look for.
The program and its output should be checked often to assure optimal result.
Even after having the rules, you still need:
1) To check whether the extracted words are the expected ones and all required
2) I suggest to start with transposing the data so each observation contains:
ID, STRING (one string only per observation - assigned the maximum expected length).
Alternative way will be to scan the data and create a list of required words, then it is possible to check each word of the string if it is in the list either by array method or by a hashing method.
Thank you for your time and explanations.
By going through this method I seem to obtain the desired result.... remains more than just doing a last random check on all data.
In any case I thank you both for the advices on these SAS functionalities that I almost never use.
The ideal will always remain to participate in the design of the data collection in order to limit this kind of things.
data demo (keep=id varname wanted); set demo;
wanted = scan(varname,-1,","); run;
data demo ; set demo;
if find(wanted, 'Would like to view message',"i") >0 OR
find(wanted, 'Would like to speak',"i") >0 THEN wanted="";
run;
proc freq data=demo;
table varname wanted; run
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.