I receive an excel file where one of the variables contains multiple coded entries with many different delimiters. Example: cell contains NV1,NV2 NV5 & NV7;NV4.
How can I account for every delimiter and parse that one cell into many?
I can see two options.
1. Inside Enterprise Guide, create a new query based on your data. Assume your concatenated variable is called InVar. Create a new computed column, with an advanced expression, the expression should be
scan(InVar)
Name the result Out1 (Note there are various parms for scan, you can fiddle with them to pick exactly the delimiters you want).
Create new expressions like this up to the maximum number of strings that are contained within your variable. It's cumbersome, but it works.
2. Using SAS code. Assuming you have a key for your data file (I just used the record number, called it SeqNum), the following code should work.
data inter;
length OutWord $32;
set have;
NumWords = countw(InVar);
do i = 1 to NumWords;
OutWord = scan(InVar, i);
output;
end;
run;
proc transpose data=inter out=want prefix=COL;
var OutWord;
by SeqNum;
id i;
run;
Tom
Use the SCAN function.
However, you'll have to decide on the logic for how you want to deal with AND. My guess is you have survey data and the AND allow for multiple responses, so you need to explicitly decide how to deal with it.
Is there a certain logic that forces you to treat the different delimiters (space, comma, semicolon, ampersand) differently?
No. I'm just a novice here and I don't see how or where I can do this in enterprise guide.
A previous post suggest the SCAN function but again, not sure how. Could use some examples.
I can see two options.
1. Inside Enterprise Guide, create a new query based on your data. Assume your concatenated variable is called InVar. Create a new computed column, with an advanced expression, the expression should be
scan(InVar)
Name the result Out1 (Note there are various parms for scan, you can fiddle with them to pick exactly the delimiters you want).
Create new expressions like this up to the maximum number of strings that are contained within your variable. It's cumbersome, but it works.
2. Using SAS code. Assuming you have a key for your data file (I just used the record number, called it SeqNum), the following code should work.
data inter;
length OutWord $32;
set have;
NumWords = countw(InVar);
do i = 1 to NumWords;
OutWord = scan(InVar, i);
output;
end;
run;
proc transpose data=inter out=want prefix=COL;
var OutWord;
by SeqNum;
id i;
run;
Tom
Thank you.
Used option 2 then just created a summary table.
Many, many, many thanks. 🙂
Glad it worked.
This is part of the SAS "secret sauce". The non-programmer oriented tools like EG will do most of the work, but when you need "that little bit extra", the SAS programming language is completely oriented to data management and transformation, and in my opinion much easier than the alternatives. It makes for a powerful combination!
Tom
Attached is a sample.
Column A is what I get. I would like to convert that to Columns B, C, D, E, ...
It's easy to do in excel (Date - Text to columns) so I'm sure I'm just being blind in SAS EG.
No, you're not being blind. That's simply an operation that they haven't built into the "easy to use" tasks in EG. Easy to do with SAS code.
How are you getting this data? A text file initially?
As a spreadsheet.
Is it something I can do on import?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.