kapow
Calcite | Level 5

If someone can help with this problem, you will make my life!

I have a variable ("paragraph") that holds messy free-text output mixing character and numeric values. For example, one observation looks like this:

CUSIP NO. 90130N 10 3 - --------------------- - -------------------------------------------------------------------------------- (1) Name of Reporting PersonS.S. or I.R.S. Identification No. of Above Person American International Group, Inc. (I.R.S. Identification No. 13-2592361) - --------------------------------------------------------------------------------

I've tried parse macros and explode macros but the data seems to be too messy for both of these. The only information I need is the underlined part (American International Group, Inc.), but I have no idea how to tell SAS to do this. Furthermore, there's no standardization between observations and the information I need from each observation changes. Said differently, I need the name of each reporting person, which changes with each observation.

Any help would be very much appreciated!

Thanks so much!

6 REPLIES
slchen
Lapis Lazuli | Level 10

data have;
  infile cards truncover;
  input;
  /* Greedy match: keep what lies between the last " Person " and the last " (" on the line */
  name=prxchange('s/.* Person (.*) \(.*/$1/io',-1,_infile_);
cards;
CUSIP NO. 90130N 10 3 - -------------- (1) Name of Reporting PersonS.S. or I.R.S. Identification No. of Above Person American International Group, Inc. (I.R.S. Identification No. 13-2592361) - ---
;
run;

Patrick
Opal | Level 21

Using Regular Expressions feels like the way to go. To come up with a RegEx that is realistic for your data, could you please provide some more sample data (as heterogeneous as possible)?

RW9
Diamond | Level 26

Irrespective of the solution (RegEx, index + substr, or something else), you need some indicator of where the data you want starts and ends. If it is the word "Person" (here "Above Person", since "Person" also occurs earlier in the line) and "(I.R.S", then it's pretty straightforward:

start = index(text, "Above Person") + length("Above Person");
name = strip(substr(text, start, index(text, "(I.R.S") - start));
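As a minimal sketch of that index + substr idea in a complete data step: it assumes the start marker is "Above Person" and the end marker is "(I.R.S", both taken from the single sample in the question. The variable names text, start_pos, end_pos and name are illustrative only. Rows that lack either marker are left blank instead of raising an invalid-argument note from substr.

```sas
data want;
  length text $ 400 name $ 200;
  /* Sample row from the question, with the dash rulers left out */
  text = 'CUSIP NO. 90130N 10 3 (1) Name of Reporting PersonS.S. or I.R.S. Identification No. of Above Person American International Group, Inc. (I.R.S. Identification No. 13-2592361)';

  /* Position just past the start marker; stays 0 if the marker is absent */
  start_pos = index(text, 'Above Person');
  if start_pos > 0 then start_pos = start_pos + length('Above Person');

  /* Position of the end marker; 0 if absent */
  end_pos = index(text, '(I.R.S');

  /* Extract only when both markers exist and appear in the expected order */
  if start_pos > 0 and end_pos > start_pos then
    name = strip(substr(text, start_pos, end_pos - start_pos));

  put name=;
run;
```

For this sample row, name should come out as American International Group, Inc. Other layouts will need different markers, which is why more sample data matters.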

Patrick
Opal | Level 21

Yep - that's why I'm asking for more sample data so that we can get an idea if there is a pattern at all which allows us to identify the wanted sub-string.

kapow
Calcite | Level 5

Thank you for your help with this request! Some additional examples are below. I've underlined the part I need. It seems like the text I need either follows "(entities only)" or is sandwiched between "name of reporting persons." and "I.R.S. Identification."

1 NAMES OF REPORTING PERSONS CLARUS CAPITAL GROUP MANAGEMENT LPI.R.S. IDENTIFICATION NO. OF ABOVE PERSON (ENTITIES ONLY)20-8098367

SCHEDULE 13D CUSIP No. 068306109 1) NAMES OF REPORTING PERSONS I.R.S. IDENTIFICATION NOS. OF ABOVE PERSONS (ENTITIES ONLY) Bernard C. Sherman 2) CHECK THE APPROPRIATE BOX IF A MEMBER OF A GROUP (SEE INSTRUCTIONS)

1. Names of Reporting Persons. I.R.S. Identification Nos. of above persons (entities only) Textron Inc.

1 NAMES OF REPORTING PERSONS Lonnie J. Stout II


1. Names of Reporting Persons. P STYLEmargin-top0pxmargin-bottom0pxI.R.S. Identification Nos. of above persons (entities only) P STYLEmargin-top0pxmargin-bottom1pxWal-Mart Stores, Inc.

Ksharp
Super User

data have;
  /* 300 rather than 200: the second sample is longer than 200 characters */
  length a $ 300;
  a='1 NAMES OF REPORTING PERSONS CLARUS CAPITAL GROUP MANAGEMENT LPI.R.S. IDENTIFICATION NO. OF ABOVE PERSON (ENTITIES ONLY)20-8098367 ';output;
  a='SCHEDULE 13D CUSIP No. 068306109 1) NAMES OF REPORTING PERSONS I.R.S. IDENTIFICATION NOS. OF ABOVE PERSONS (ENTITIES ONLY) Bernard C. Sherman 2) CHECK THE APPROPRIATE BOX IF A MEMBER OF A GROUP (SEE INSTRUCTIONS)';output;
run;

data want;
  set have;
  /* First try: text between "NAMES OF REPORTING PERSONS" and "I.R.S." */
  re = prxparse('/NAMES OF REPORTING PERSONS(.+)I\.R\.S\./io');
  if prxmatch(re, a) then first = prxposn(re, 1, a);
  /* Fallback: the run of non-digit text right after "ENTITIES ONLY)" */
  if missing(first) then do;
    re = prxparse('/ENTITIES ONLY\)(\D+)/io');
    if prxmatch(re, a) then first = prxposn(re, 1, a);
  end;
run;

Xia Keshan
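The two-pattern idea above can be stretched to cover all five of the additional examples. What follows is only a sketch under assumptions drawn from those five rows: the markup residue always looks like "P STYLE" followed by margin-top/margin-bottom pixel settings, a name placed after "(ENTITIES ONLY)" ends at the next numbered item such as "2)" or at the end of the text, and a row with neither marker carries the name straight after the header. The names clean, name and re1 to re3 are illustrative. It is not tuned for the original CUSIP sample, where "PersonS.S." runs into the header.

```sas
data have;
  length a $ 300;
  a='1 NAMES OF REPORTING PERSONS CLARUS CAPITAL GROUP MANAGEMENT LPI.R.S. IDENTIFICATION NO. OF ABOVE PERSON (ENTITIES ONLY)20-8098367'; output;
  a='SCHEDULE 13D CUSIP No. 068306109 1) NAMES OF REPORTING PERSONS I.R.S. IDENTIFICATION NOS. OF ABOVE PERSONS (ENTITIES ONLY) Bernard C. Sherman 2) CHECK THE APPROPRIATE BOX IF A MEMBER OF A GROUP (SEE INSTRUCTIONS)'; output;
  a='1. Names of Reporting Persons. I.R.S. Identification Nos. of above persons (entities only) Textron Inc.'; output;
  a='1 NAMES OF REPORTING PERSONS Lonnie J. Stout II'; output;
  a='1. Names of Reporting Persons. P STYLEmargin-top0pxmargin-bottom0pxI.R.S. Identification Nos. of above persons (entities only) P STYLEmargin-top0pxmargin-bottom1pxWal-Mart Stores, Inc.'; output;
run;

data want;
  set have;
  length clean $ 300 name $ 200;

  /* Assumption: replace the observed "P STYLEmargin-...px" residue with a blank */
  clean = prxchange('s/P STYLE(?:margin-(?:top|bottom)\d+px)+/ /o', -1, a);

  /* 1) Name sandwiched between the header and "I.R.S." (may capture nothing) */
  re1 = prxparse('/NAMES? OF REPORTING PERSONS?\.?\s*(.*?)\s*I\.R\.S\./io');
  if prxmatch(re1, clean) then name = prxposn(re1, 1, clean);

  /* 2) Otherwise: text after "(ENTITIES ONLY)" up to the next "n)" item or the end */
  if missing(name) then do;
    re2 = prxparse('/\(ENTITIES ONLY\)\s*(.*?)\s*(?:\d+\)|$)/io');
    if prxmatch(re2, clean) then name = prxposn(re2, 1, clean);
  end;

  /* 3) Last resort: everything after the header */
  if missing(name) then do;
    re3 = prxparse('/NAMES? OF REPORTING PERSONS?\.?\s*(.*?)\s*$/io');
    if prxmatch(re3, clean) then name = prxposn(re3, 1, clean);
  end;

  put name=;
  drop re1 re2 re3 clean;
run;
```

On the five examples this is meant to return CLARUS CAPITAL GROUP MANAGEMENT LP, Bernard C. Sherman, Textron Inc., Lonnie J. Stout II and Wal-Mart Stores, Inc. The lazy `(.*?)` with `\s*` on both sides keeps leading and trailing blanks out of the capture, so no separate strip is needed. Each new filing layout will probably need another fallback pattern, so it is worth checking the rows where name stays blank.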
