Hi Experts,
In a SAS table, I have a variable (character) that has text/character values in it. I need to create new variables and supply values to them based on an already existing column's available values. For example:
A column named COMMENT that has values( in multiple rows: comment1, comment2, comment3 etc.) everything under the sun like: comment1_value- Strictly follow the guidelines. Refrain from generalization. Oral use only.
comment2_value: Oral use only. Manufacturer not responsible for any mishaps.comment3_value: Available in NC <state> only.comment4-value: Product type= non invasive, hypodermic use only. comment5_value: Use if only confident.
comment6_value: Oral or usp: EVER oral unless otherwise noted.comment7_value: Oral only.Chk brochure, refer to point #9.comment8_value:Usage type: oral only. comment9_value:Invasive, IV use.
So now I need to return columns like these:
1.usage_mode and supply 'oral' as its value excluding other words (source- comment1_value: Strictly follow the guidelines. Refrain from generalization.Oral use only)
2.where_available and supply 'NC' as its value(source- comment3_value: Available in NC <state> only)
3.usage_type and supply 'usp' as its value (source-comment6_value: Oral or usp: EVER oral unless otherwise noted.)
4.product_type and supply 'IV use' as its value (source_ comment9_value:Invasive, IV use.)
and the list goes on
The only common thing is I need to search for relevant words from the available info (in the preceding/following column-in the same row) and extract relevant words/phrases/values and supply values to the adjacent rows under the newly created columns.
Can you please share any helpful links to any character functions(text mining) in SAS (and/or proc sql) other than scan function?
Thanks for your assistance in advance.
In a SAS data set this basically meaningless: "A column named COMMENT that has values( in multiple rows:"
A SAS variable exists on a single observation.
One suspects the source of this data but If you are reading a TEXT file of some sort you might provide that instead of this rambling description. This sort of thing is often better (and sometimes easier) addressed when reading the data.
Basically without an actual example of your data we can't provide much help (yet).
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
Maybe, the sort of stuff you describe may not handle conversion to a data step well.
Thanks for the suggestion.
I am sorry for the ramblings. I am looking for suggestions on what functions could be employed. I have limited knowledge of scan and tranwrd functions, which I don't believe, are sufficient in this case. Or it may be my ignorance.
A better way to search text: Perl regular expressions in SAS
https://support.sas.com/resources/papers/proceedings11/006-2011.pdf
Skim all character functions:
A simple approach to text analysis using SAS functions
https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2557-2018.pdf
@inquistive wrote:
Hi Experts,
In a SAS table, I have a variable (character) that has text/character values in it. I need to create new variables and supply values to them based on an already existing column's available values. For example:
A column named COMMENT that has values( in multiple rows: comment1, comment2, comment3 etc.) everything under the sun like: comment1_value- Strictly follow the guidelines. Refrain from generalization. Oral use only.
comment2_value: Oral use only. Manufacturer not responsible for any mishaps.comment3_value: Available in NC <state> only.comment4-value: Product type= non invasive, hypodermic use only. comment5_value: Use if only confident.
comment6_value: Oral or usp: EVER oral unless otherwise noted.comment7_value: Oral only.Chk brochure, refer to point #9.comment8_value:Usage type: oral only. comment9_value:Invasive, IV use.So now I need to return columns like these:
1.usage_mode and supply 'oral' as its value excluding other words (source- comment1_value: Strictly follow the guidelines. Refrain from generalization.Oral use only)
2.where_available and supply 'NC' as its value(source- comment3_value: Available in NC <state> only)
3.usage_type and supply 'usp' as its value (source-comment6_value: Oral or usp: EVER oral unless otherwise noted.)
4.product_type and supply 'IV use' as its value (source_ comment9_value:Invasive, IV use.)
and the list goes on
The only common thing is I need to search for relevant words from the available info (in the preceding/following column-in the same row) and extract relevant words/phrases/values and supply values to the adjacent rows under the newly created columns.
Can you please share any helpful links to any character functions(text mining) in SAS (and/or proc sql) other than scan function?
Thanks for your assistance in advance.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.