Hi all,
I want to extract certain words (such as "H1" and "N1") from the variable Observation into a new variable, observation1.
Thanks all in advance.
My data is shown below, with the desired result in the observation1 column:
data test2; set test2;
user | Observation | observation1 |
367 | sample type indicated the presence of subtype H1. However, | H1 |
1427 | sample type indicated the presence of subtype H1. However, the | H1 |
3046 | sample type this case and indicated the presence of multiple types, including H1, H3, N1, and N2. | H1, H3 and N1 |
3146 | sample type and indicated the presence of multiple types, including H1, H3, N1, and N2. | H1, H3, N1, and N2 |
8910 | sample type (H,N) fluid indicated the presence of multiple types, including H1, H3 and N2. | H1, H3 and N2 |
9091 | sample type indicated the presence of type N1. | N1 |
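For anyone who wants to reproduce this, here is a minimal sketch that loads the rows above into a data set. The data set name test is an assumption (chosen to match the `set test;` used in the reply), as are the pipe-delimited layout and the $200 length.

```sas
/* Sketch only: rebuilds the sample rows from the table above.        */
/* The data set name TEST and the $200 length are assumptions.        */
data test;
  infile datalines dlm='|' truncover;   /* only '|' separates fields, so commas and blanks stay in the text */
  length user 8 Observation $200;
  input user Observation $;
  datalines;
367|sample type indicated the presence of subtype H1. However,
1427|sample type indicated the presence of subtype H1. However, the
3046|sample type this case and indicated the presence of multiple types, including H1, H3, N1, and N2.
3146|sample type and indicated the presence of multiple types, including H1, H3, N1, and N2.
8910|sample type (H,N) fluid indicated the presence of multiple types, including H1, H3 and N2.
9091|sample type indicated the presence of type N1.
;
run;
```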
Please try this:
data test_1;
  set test;
  length word $32 Observation_1 $72;
  delims = " =;,.()<>/'";                   /* delimiters: space, comma, period, ... */
  numWords = countw(Observation, delims);   /* number of words in the text */
  Observation_1 = '';
  do i = 1 to numWords;                     /* split text into words */
    word = scan(Observation, i, delims);
    /* keep only the subtype codes; CATX puts the comma between items, */
    /* so no trailing comma has to be stripped, even if nothing matches */
    if word in ('H1', 'H2', 'H3', 'N1', 'N2') then
      Observation_1 = catx(',', Observation_1, word);
  end;
  keep user Observation Observation_1;
run;
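As an alternative sketch (not part of the original reply), the same codes could be pulled out with a Perl regular expression instead of splitting on delimiters. This assumes the same input data set test and the same fixed list of codes; the output data set name test_rx and the helper variables are made up for illustration.

```sas
/* Sketch only: regex-based variant of the extraction above.           */
/* TEST_RX, rx, start, stop, pos and len are illustrative names.       */
data test_rx;
  set test;
  length Observation_1 $72;
  retain rx;
  if _n_ = 1 then rx = prxparse('/\b(H[123]|N[12])\b/');  /* same codes as the IN list */
  start = 1;
  stop  = length(Observation);
  call prxnext(rx, start, stop, Observation, pos, len);   /* find the first match */
  do while (pos > 0);
    Observation_1 = catx(',', Observation_1, substr(Observation, pos, len));
    call prxnext(rx, start, stop, Observation, pos, len); /* continue after the previous match */
  end;
  keep user Observation Observation_1;
run;
```

The word boundaries (`\b`) keep a bare "H" or "N", as in "(H,N)", from being picked up.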
Dear ballardw,
My goal for the observation1 variable is, as a first step, to pull the subtype information out of each line.
For example, from "multiple types, including h1, H3 and N1." I would like to get the subtypes that are listed.
If I first get this result by counting, as you mention, I can later transform it into an outcome like "h1, H3 and N1", or "N1" for the other rows.
Thanks in advance again,
Daniel
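A minimal sketch of that later transformation, assuming the extracted codes are already in a comma-separated variable Observation_1 in a data set test_1 (as produced by the reply in this thread). The names test_2, Observation_2, item, n and i are hypothetical.

```sas
/* Sketch only: turns "H1,H3,N1" into "H1, H3 and N1".                 */
/* TEST_2, Observation_2, item, n and i are illustrative names.        */
data test_2;
  set test_1;
  length Observation_2 $100 item $32;
  n = countw(Observation_1, ', ');           /* number of extracted codes */
  do i = 1 to n;
    item = scan(Observation_1, i, ', ');
    if i = 1 then Observation_2 = item;                                /* first code */
    else if i = n then Observation_2 = catx(' ', Observation_2, 'and', item);  /* last code */
    else Observation_2 = catx(', ', Observation_2, item);              /* middle codes */
  end;
  drop n i item;
run;
```

A single code such as "N1" is left unchanged, and a blank Observation_1 gives a blank Observation_2.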
@Moraes86 wrote:
Dear ballardw,
My goal for the observation1 variable is, as a first step, to pull the subtype information out of each line.
For example, from "multiple types, including h1, H3 and N1." I would like to get the subtypes that are listed.
If I first get this result by counting, as you mention, I can later transform it into an outcome like "h1, H3 and N1", or "N1" for the other rows.
Thanks in advance again,
Daniel
And this is exactly why we request that you do not post duplicate questions. I asked a question in the OTHER thread https://communities.sas.com/t5/SAS-Programming/How-to-extract-a-specific-text-from-each-cell-from-th...
and you "answer" it here. Actually, you still haven't answered it, because the question was about how consistent your text actually is and how the *bleep* you are going to use the resulting variable.
Hi Jerrya00,
It worked very well.
Thanks all for the valuable help.
Daniel