BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mscarboncopy
Pyrite | Level 9

Hello. I have data sets with a long character variable (Longvar) that I want to split into 3 shorter character variables (Var1 Var2 Var3) . I might have more than 3 (depending on the data set I am working with) but to make it simple I will go with 3. For example: 

 

subj         Longvar                                                       Var1              Var2               Var3

1              "I need; to create; 3 short variables"        I need           to create        3 short variables

2              "Must; create; three short variables"        Must             create            three short variables

etc

 

Note how what is actually written varies, depending on the subject, but I was hoping that the semicolon would be all I needed to be able to split the string - not matter what is written after the semicolon. I know this can be done in SPSS, but I can't seem to find the equivalent in SAS.

What would be the most effective way to do this in SAS?. And, if it can be done, could I create more than 3 variables. Is there a max limit ? 

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
data have;
longvar = "I need; to create; 3 short variables"; output;
longvar="Must; create; three short variables"; output;
run;

data long;
set have;
ID = _n_;
nTerms = count(longVar, ";")+1;

do i=1 to nTerms;
phrase = scan(longVar, i, ";");
output;
end;
run;

proc transpose data=long out=wide;
by ID;
var phrase;
run;

@Mscarboncopy wrote:

Hello. I have data sets with a long character variable (Longvar) that I want to split into 3 shorter character variables (Var1 Var2 Var3) . I might have more than 3 (depending on the data set I am working with) but to make it simple I will go with 3. For example: 

 

subj         Longvar                                                       Var1              Var2               Var3

1              "I need; to create; 3 short variables"        I need           to create        3 short variables

2              "Must; create; three short variables"        Must             create            three short variables

etc

 

Note how what is actually written varies, depending on the subject, but I was hoping that the semicolon would be all I needed to be able to split the string - not matter what is written after the semicolon. I know this can be done in SPSS, but I can't seem to find the equivalent in SAS.

What would be the most effective way to do this in SAS?. And, if it can be done, could I create more than 3 variables. Is there a max limit ? 

 

Thank you!


 

View solution in original post

18 REPLIES 18
Astounding
PROC Star

First, let's confirm whether you really need to do this.  PROC REPORT has a "wrap" option that will take long text and print it in as many lines as are needed.  So what do you gain by splitting the text yourself?

Mscarboncopy
Pyrite | Level 9

I need them to be separate variables that will be used in analysis. A lot of these will be numbers.

Reeza
Super User
SCAN() is the function you're looking for. You can create as many variable as you want but you need to know the maximum ahead of time. If you don't, you either do it in multiple steps or you create a row for each value and then transpose it afterwards which is a fully dynamic solution that doesn't require any information ahead of time.

Mscarboncopy
Pyrite | Level 9

Thank you. I am not familiar with Scan. 

And what I can find is this (please see below). Do I have to write each String? This can't be it though, because it creates only one variable Word. 

Data want;

Set have;

keep count word;
length word $30;
string=',leading, trailing,and multiple,,delimiters,,';
delim=',';
modif='mo';
nwords=countw(string, delim, modif);
do count=1 to nwords;
word=scan(string, count, delim, modif);
output;
end;
run;

ballardw
Super User

First really determine need.

 

The SCAN function can use a specific delimiter to be used in getting the strings separated by that character.

Countw can tell you how many sets of characters you have to process.

 

An example:

data example;
   infile datalines missover;
   input x $35.;
   array w {3} $ 20;
   do i=1 to countw(x,';');
      w[i]= scan(x,i,';');
   end;
   drop i;
datalines4;
This; is sometext;separated 
More text; two substrings
;more interesting; set of words
;;;;

The array is a simple way to define the new variables by name, creating W1, W2 and W3, setting the length for the variables at 20 characters each. The Do loop shows one way to process this.

Mscarboncopy
Pyrite | Level 9

Thank you.

So I am trying this:

Data Want;

Set have;

array w {3} $ 20;
do i=1 to countw(LONGVAR,';');
w[i]= scan(LONGVAR,i,';');
end;
drop i;
Run;

 

and I see the W vars being created but it says my LONGVAR is uninitialized, which is odd because I see the variable in the data set have. Due to this, all of the 3 vars have missing values.

It is also saying  "Numeric values have been converted to character values"

Which is odd, since LONGVAR is a string var. 

I must be doing something wrong.

 

 

 

ballardw
Super User

@Mscarboncopy wrote:

Thank you.

So I am trying this:

Data Want;

Set have;

array w {3} $ 20;
do i=1 to countw(LONGVAR,';');
w[i]= scan(LONGVAR,i,';');
end;
drop i;
Run;

 

and I see the W vars being created but it says my LONGVAR is uninitialized, which is odd because I see the variable in the data set have. Due to this, all of the 3 vars have missing values.

It is also saying  "Numeric values have been converted to character values"

Which is odd, since LONGVAR is a string var. 

I must be doing something wrong.

 

 

 


That means you are referencing a variable that doesn't exist in your HAVE data set. So, what is the actual name of the variable that has the long text?

Mscarboncopy
Pyrite | Level 9

Yes sorry it was a typo in the code. I found it and fixed it.

Now this is the error I am getting for this line:

 

w[i]= scan(LONGVAR,i,';');

Array subscript out of range at line 6526 column 7.

 

 

ballardw
Super User

@Mscarboncopy wrote:

Yes sorry it was a typo in the code. I found it and fixed it.

Now this is the error I am getting for this line:

 

w[i]= scan(LONGVAR,i,';');

Array subscript out of range at line 6526 column 7.

 

 


And just how many ; are in your values. I provide an example for 3 if you do not know how many you actually have then you need a different approach like @Reeza shows, or specify a larger number in the W array to match the maximum number of ; that appear in your variable.

Reeza
Super User
data have;
longvar = "I need; to create; 3 short variables"; output;
longvar="Must; create; three short variables"; output;
run;

data long;
set have;
ID = _n_;
nTerms = count(longVar, ";")+1;

do i=1 to nTerms;
phrase = scan(longVar, i, ";");
output;
end;
run;

proc transpose data=long out=wide;
by ID;
var phrase;
run;

@Mscarboncopy wrote:

Hello. I have data sets with a long character variable (Longvar) that I want to split into 3 shorter character variables (Var1 Var2 Var3) . I might have more than 3 (depending on the data set I am working with) but to make it simple I will go with 3. For example: 

 

subj         Longvar                                                       Var1              Var2               Var3

1              "I need; to create; 3 short variables"        I need           to create        3 short variables

2              "Must; create; three short variables"        Must             create            three short variables

etc

 

Note how what is actually written varies, depending on the subject, but I was hoping that the semicolon would be all I needed to be able to split the string - not matter what is written after the semicolon. I know this can be done in SPSS, but I can't seem to find the equivalent in SAS.

What would be the most effective way to do this in SAS?. And, if it can be done, could I create more than 3 variables. Is there a max limit ? 

 

Thank you!


 

Mscarboncopy
Pyrite | Level 9

Thank you. If I have to write this for each string in my data files, I won't be able to do it this way, as I have hundreds of them. Unless, I am misunderstanding.

longvar = "I need; to create; 3 short variables"; output;
longvar="Must; create; three short variables"; output;
run;

 

Reeza
Super User

Then maybe start your process at the next step and ensure it refers to the correct input data set that's appropriate for you.

Mscarboncopy
Pyrite | Level 9

Thank you. Yes, that makes sense 🙂 

It worked really well. Another basic question: How do I make the Col vars permanent variables?

I will need to rename them as well. 

Also, the wide file does not keep the IDs, it transformed them into obs. I would need to keep the actual ID.

Mscarboncopy
Pyrite | Level 9

I tried this to see if I could carry all the variables I needed (A B C), including the numericID, but it does not work. Is the solution to this, merging the wide data file back with the main data file? I would then rename the COL variables to make them permanent variables. 

Thanks again.

data long (keep= A B C numericID LONGVAR phrase);

set have;

STRINGid = _n_;

nTerms = count(LONGVAR, ";")+1;

 

do i=1 to nTerms;

phrase = scan(LONGVAR, i, ";");

output;

end;

run;

 

proc transpose data=long out=wide;

by STRINGid;

var phrase;

run;

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 1474 views
  • 4 likes
  • 5 in conversation