Hi all,
I want to extract certain words from the variable Observation.
My goal is to copy these words (such as "H1" and "N1") from Observation into the variable observation1.
Thanks all in advance,
My data look like this:
data test2; set test2;
user | Observation | observation1 |
367 | sample type indicated the presence of subtype H1. However, | H1 |
1427 | sample type indicated the presence of subtype H1. However, the | H1 |
3046 | sample type this case and indicated the presence of multiple types, including H1, H3, N1, and N2. | H1, H3 and N1 |
3146 | sample type and indicated the presence of multiple types, including H1, H3, N1, and N2. | H1, H3, N1, and N2 |
8910 | sample type (H,N) fluid indicated the presence of multiple types, including H1, H3 and N2. | H1, H3 and N2 |
9091 | sample type indicated the presence of type N1. | N1 |
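To run the solution below without the original data, here is a minimal sketch that rebuilds the sample rows above as a data set. It assumes the input data set is named `test`, which is the name the solution reads (the question itself mentions `test2`). It keeps only `user` and `Observation`, and the $200 length for `Observation` is an assumption chosen to fit these rows.

```sas
/* Sketch: rebuild the sample rows from the question as data set TEST. */
/* DLM='|' makes the pipe the only delimiter, so blanks and commas     */
/* inside the text stay part of Observation.                           */
data test;
   infile datalines dlm='|' truncover;
   length user 8 Observation $200;   /* assumed length, long enough for these rows */
   input user Observation;
   datalines;
367|sample type indicated the presence of subtype H1. However,
1427|sample type indicated the presence of subtype H1. However, the
3046|sample type this case and indicated the presence of multiple types, including H1, H3, N1, and N2.
3146|sample type and indicated the presence of multiple types, including H1, H3, N1, and N2.
8910|sample type (H,N) fluid indicated the presence of multiple types, including H1, H3 and N2.
9091|sample type indicated the presence of type N1.
;
run;
```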
Please try this:
data test_1;
   set test;
   length word $32 Observation_1 $72;
   delims = " =;,.()<>/'";                  /* delimiters: space, =, ;, comma, period, brackets, slash, quote */
   numWords = countw(Observation, delims);  /* number of words in the text */
   Observation_1 = '';
   do i = 1 to numWords;                    /* split the text into words */
      word = scan(Observation, i, delims);
      /* keep only the subtype codes; CATX skips the blank starting value, */
      /* so no trailing comma has to be trimmed off, and rows with no      */
      /* match stay blank instead of hitting a SUBSTR length of 0          */
      if word in ('H1', 'H2', 'H3', 'N1', 'N2') then
         Observation_1 = catx(',', Observation_1, word);
   end;
   keep user Observation Observation_1;
run;
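If the comma list this step produces (for example `H1,H3,N1,N2`) later needs to read like "H1, H3 and N1", one possible follow-up step is sketched here. It runs on a few typed-in values so that it stands on its own. On the real data you would replace the INPUT and DATALINES lines with `set test_1;`. The data set name `formatted` and the variable `Observation_2` are hypothetical.

```sas
/* Sketch: turn "H1,H3,N1" into "H1, H3 and N1".                   */
/* Hypothetical names: data set FORMATTED, variable Observation_2. */
data formatted;
   length Observation_1 $72 Observation_2 $100;
   input Observation_1 $;            /* on real data: set test_1; */
   n = countw(Observation_1, ',');   /* number of codes in the list */
   Observation_2 = '';
   do j = 1 to n;
      if j = 1 then Observation_2 = scan(Observation_1, 1, ',');
      /* middle codes are joined with ", ", the last one with " and " */
      else if j < n then Observation_2 = catx(', ', Observation_2, scan(Observation_1, j, ','));
      else Observation_2 = catx(' and ', Observation_2, scan(Observation_1, j, ','));
   end;
   drop n j;
   datalines;
N1
H1,H3
H1,H3,N1,N2
;
run;
```

With these inputs, Observation_2 becomes "N1", "H1 and H3", and "H1, H3, N1 and N2".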
Dear ballardw,
My goal for the observation1 variable is, first, to capture this information from each line.
For example, from "multiple types, including H1, H3 and N1" I would like to get those types.
So if I get the result by counting, as you mention, I can later transform it into an outcome like "H1, H3 and N1", or "N1" for the other rows.
Thanks in advance again,
Daniel
And this is exactly why we ask that you do not post duplicate questions. I asked a question in the OTHER thread https://communities.sas.com/t5/SAS-Programming/How-to-extract-a-specific-text-from-each-cell-from-th...
and you "answer" it here. You still haven't actually answered it, because the question was about how consistent your text actually is, and how the *bleep* you are going to use the resulting variable.
Hi Jerrya00,
It worked very well.
Thanks all for the valuable help.
Daniel