Hello. I have data sets with a long character variable (Longvar) that I want to split into 3 shorter character variables (Var1 Var2 Var3) . I might have more than 3 (depending on the data set I am working with) but to make it simple I will go with 3. For example:
subj Longvar Var1 Var2 Var3
1 "I need; to create; 3 short variables" I need to create 3 short variables
2 "Must; create; three short variables" Must create three short variables
etc
Note how what is actually written varies, depending on the subject, but I was hoping that the semicolon would be all I needed to be able to split the string - not matter what is written after the semicolon. I know this can be done in SPSS, but I can't seem to find the equivalent in SAS.
What would be the most effective way to do this in SAS?. And, if it can be done, could I create more than 3 variables. Is there a max limit ?
Thank you!
data have;
longvar = "I need; to create; 3 short variables"; output;
longvar="Must; create; three short variables"; output;
run;
data long;
set have;
ID = _n_;
nTerms = count(longVar, ";")+1;
do i=1 to nTerms;
phrase = scan(longVar, i, ";");
output;
end;
run;
proc transpose data=long out=wide;
by ID;
var phrase;
run;
@Mscarboncopy wrote:
Hello. I have data sets with a long character variable (Longvar) that I want to split into 3 shorter character variables (Var1 Var2 Var3) . I might have more than 3 (depending on the data set I am working with) but to make it simple I will go with 3. For example:
subj Longvar Var1 Var2 Var3
1 "I need; to create; 3 short variables" I need to create 3 short variables
2 "Must; create; three short variables" Must create three short variables
etc
Note how what is actually written varies, depending on the subject, but I was hoping that the semicolon would be all I needed to be able to split the string - not matter what is written after the semicolon. I know this can be done in SPSS, but I can't seem to find the equivalent in SAS.
What would be the most effective way to do this in SAS?. And, if it can be done, could I create more than 3 variables. Is there a max limit ?
Thank you!
First, let's confirm whether you really need to do this. PROC REPORT has a "wrap" option that will take long text and print it in as many lines as are needed. So what do you gain by splitting the text yourself?
I need them to be separate variables that will be used in analysis. A lot of these will be numbers.
Thank you. I am not familiar with Scan.
And what I can find is this (please see below). Do I have to write each String? This can't be it though, because it creates only one variable Word.
Data want;
Set have;
keep count word;
length word $30;
string=',leading, trailing,and multiple,,delimiters,,';
delim=',';
modif='mo';
nwords=countw(string, delim, modif);
do count=1 to nwords;
word=scan(string, count, delim, modif);
output;
end;
run;
First really determine need.
The SCAN function can use a specific delimiter to be used in getting the strings separated by that character.
Countw can tell you how many sets of characters you have to process.
An example:
data example; infile datalines missover; input x $35.; array w {3} $ 20; do i=1 to countw(x,';'); w[i]= scan(x,i,';'); end; drop i; datalines4; This; is sometext;separated More text; two substrings ;more interesting; set of words ;;;;
The array is a simple way to define the new variables by name, creating W1, W2 and W3, setting the length for the variables at 20 characters each. The Do loop shows one way to process this.
Thank you.
So I am trying this:
Data Want;
Set have;
array w {3} $ 20;
do i=1 to countw(LONGVAR,';');
w[i]= scan(LONGVAR,i,';');
end;
drop i;
Run;
and I see the W vars being created but it says my LONGVAR is uninitialized, which is odd because I see the variable in the data set have. Due to this, all of the 3 vars have missing values.
It is also saying "Numeric values have been converted to character values"
Which is odd, since LONGVAR is a string var.
I must be doing something wrong.
@Mscarboncopy wrote:
Thank you.
So I am trying this:
Data Want;
Set have;
array w {3} $ 20;
do i=1 to countw(LONGVAR,';');
w[i]= scan(LONGVAR,i,';');
end;
drop i;
Run;
and I see the W vars being created but it says my LONGVAR is uninitialized, which is odd because I see the variable in the data set have. Due to this, all of the 3 vars have missing values.
It is also saying "Numeric values have been converted to character values"
Which is odd, since LONGVAR is a string var.
I must be doing something wrong.
That means you are referencing a variable that doesn't exist in your HAVE data set. So, what is the actual name of the variable that has the long text?
Yes sorry it was a typo in the code. I found it and fixed it.
Now this is the error I am getting for this line:
w[i]= scan(LONGVAR,i,';');
Array subscript out of range at line 6526 column 7.
@Mscarboncopy wrote:
Yes sorry it was a typo in the code. I found it and fixed it.
Now this is the error I am getting for this line:
w[i]= scan(LONGVAR,i,';');
Array subscript out of range at line 6526 column 7.
And just how many ; are in your values. I provide an example for 3 if you do not know how many you actually have then you need a different approach like @Reeza shows, or specify a larger number in the W array to match the maximum number of ; that appear in your variable.
data have;
longvar = "I need; to create; 3 short variables"; output;
longvar="Must; create; three short variables"; output;
run;
data long;
set have;
ID = _n_;
nTerms = count(longVar, ";")+1;
do i=1 to nTerms;
phrase = scan(longVar, i, ";");
output;
end;
run;
proc transpose data=long out=wide;
by ID;
var phrase;
run;
@Mscarboncopy wrote:
Hello. I have data sets with a long character variable (Longvar) that I want to split into 3 shorter character variables (Var1 Var2 Var3) . I might have more than 3 (depending on the data set I am working with) but to make it simple I will go with 3. For example:
subj Longvar Var1 Var2 Var3
1 "I need; to create; 3 short variables" I need to create 3 short variables
2 "Must; create; three short variables" Must create three short variables
etc
Note how what is actually written varies, depending on the subject, but I was hoping that the semicolon would be all I needed to be able to split the string - not matter what is written after the semicolon. I know this can be done in SPSS, but I can't seem to find the equivalent in SAS.
What would be the most effective way to do this in SAS?. And, if it can be done, could I create more than 3 variables. Is there a max limit ?
Thank you!
Thank you. If I have to write this for each string in my data files, I won't be able to do it this way, as I have hundreds of them. Unless, I am misunderstanding.
longvar = "I need; to create; 3 short variables"; output;
longvar="Must; create; three short variables"; output;
run;
Then maybe start your process at the next step and ensure it refers to the correct input data set that's appropriate for you.
Thank you. Yes, that makes sense 🙂
It worked really well. Another basic question: How do I make the Col vars permanent variables?
I will need to rename them as well.
Also, the wide file does not keep the IDs, it transformed them into obs. I would need to keep the actual ID.
I tried this to see if I could carry all the variables I needed (A B C), including the numericID, but it does not work. Is the solution to this, merging the wide data file back with the main data file? I would then rename the COL variables to make them permanent variables.
Thanks again.
data long (keep= A B C numericID LONGVAR phrase);
set have;
STRINGid = _n_;
nTerms = count(LONGVAR, ";")+1;
do i=1 to nTerms;
phrase = scan(LONGVAR, i, ";");
output;
end;
run;
proc transpose data=long out=wide;
by STRINGid;
var phrase;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.