BookmarkSubscribeRSS Feed
inquistive
Quartz | Level 8

Hi Experts,
In a SAS table, I have a variable (character) that has text/character values in it. I need to create new variables and supply values to them based on an already existing column's available values. For example:
A column named COMMENT that has values( in multiple rows: comment1, comment2, comment3 etc.) everything under the sun like: comment1_value- Strictly follow the guidelines. Refrain from generalization. Oral use only.
comment2_value: Oral use only. Manufacturer not responsible for any mishaps.comment3_value: Available in NC <state> only.comment4-value: Product type= non invasive, hypodermic use only. comment5_value: Use if only confident.
comment6_value: Oral or usp: EVER oral unless otherwise noted.comment7_value: Oral only.Chk brochure, refer to point #9.comment8_value:Usage type: oral only. comment9_value:Invasive, IV use.

So now I need to return columns like these:
1.usage_mode and supply 'oral' as its value excluding other words (source- comment1_value: Strictly follow the guidelines. Refrain from generalization.Oral use only)
2.where_available and supply 'NC' as its value(source- comment3_value: Available in NC <state> only)
3.usage_type and supply 'usp' as its value (source-comment6_value: Oral or usp: EVER oral unless otherwise noted.)
4.product_type and supply 'IV use' as its value (source_ comment9_value:Invasive, IV use.)
and the list goes on
The only common thing is I need to search for relevant words from the available info (in the preceding/following column-in the same row) and extract relevant words/phrases/values and supply values to the adjacent rows under the newly created  columns.
Can you please share any helpful links to any character functions(text mining) in SAS (and/or proc sql) other than scan function?
Thanks for your assistance in advance.

3 REPLIES 3
ballardw
Super User

In a SAS data set this basically meaningless: "A column named COMMENT that has values( in multiple rows:"

A SAS variable exists on a single observation.

 

One suspects the source of this data but If you are reading a TEXT file of some sort you might provide that instead of this rambling description. This sort of thing is often better (and sometimes easier) addressed when reading the data.

Basically without an actual example of your data we can't provide much help (yet).

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

Maybe, the sort of stuff you describe may not handle conversion to a data step well.

 

 

inquistive
Quartz | Level 8

@ballardw,

Thanks for the suggestion.

I am sorry for the ramblings. I am looking for suggestions on what functions could be employed. I have limited knowledge of scan and tranwrd  functions, which  I don't believe, are sufficient in this case. Or it may be my  ignorance.

Reeza
Super User

A better way to search text: Perl regular expressions in SAS

https://support.sas.com/resources/papers/proceedings11/006-2011.pdf

 

Skim all character functions:

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/n01f5qrjoh9h4hn1olbdpb5pr2td.h...

 

A simple approach to text analysis using SAS functions

https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2557-2018.pdf

 


@inquistive wrote:

Hi Experts,
In a SAS table, I have a variable (character) that has text/character values in it. I need to create new variables and supply values to them based on an already existing column's available values. For example:
A column named COMMENT that has values( in multiple rows: comment1, comment2, comment3 etc.) everything under the sun like: comment1_value- Strictly follow the guidelines. Refrain from generalization. Oral use only.
comment2_value: Oral use only. Manufacturer not responsible for any mishaps.comment3_value: Available in NC <state> only.comment4-value: Product type= non invasive, hypodermic use only. comment5_value: Use if only confident.
comment6_value: Oral or usp: EVER oral unless otherwise noted.comment7_value: Oral only.Chk brochure, refer to point #9.comment8_value:Usage type: oral only. comment9_value:Invasive, IV use.

So now I need to return columns like these:
1.usage_mode and supply 'oral' as its value excluding other words (source- comment1_value: Strictly follow the guidelines. Refrain from generalization.Oral use only)
2.where_available and supply 'NC' as its value(source- comment3_value: Available in NC <state> only)
3.usage_type and supply 'usp' as its value (source-comment6_value: Oral or usp: EVER oral unless otherwise noted.)
4.product_type and supply 'IV use' as its value (source_ comment9_value:Invasive, IV use.)
and the list goes on
The only common thing is I need to search for relevant words from the available info (in the preceding/following column-in the same row) and extract relevant words/phrases/values and supply values to the adjacent rows under the newly created  columns.
Can you please share any helpful links to any character functions(text mining) in SAS (and/or proc sql) other than scan function?
Thanks for your assistance in advance.


 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 326 views
  • 1 like
  • 3 in conversation