If someone can help with this problem, you will make my life so much easier!
I have a character variable ("paragraph") containing messy, unstructured text that mixes letters and numbers. For example, one observation looks like this:
CUSIP NO. 90130N 10 3 - --------------------- - -------------------------------------------------------------------------------- (1) Name of Reporting PersonS.S. or I.R.S. Identification No. of Above Person American International Group, Inc. (I.R.S. Identification No. 13-2592361) - --------------------------------------------------------------------------------
I've tried parse macros and explode macros, but the data seems too messy for either of them. The only information I need is the reporting person's name (here, American International Group, Inc.), but I have no idea how to tell SAS to extract it. Furthermore, the observations are not standardized, and the value I need changes from one observation to the next. Said differently, I need the name of the reporting person in each observation.
Any help would be very much appreciated!
Thanks so much!
data have;
infile cards truncover;
input;
name=prxchange('s/.* Person (.*) \(.*/$1/io',-1,_infile_);
cards;
CUSIP NO. 90130N 10 3 - -------------- (1) Name of Reporting PersonS.S. or I.R.S. Identification No. of Above Person American International Group, Inc. (I.R.S. Identification No. 13-2592361) - ---
run;
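One caveat with the PRXCHANGE line above: when the pattern does not match, PRXCHANGE returns its input unchanged, so NAME would receive the whole line for any observation without " Person ... (". A minimal sketch that checks with PRXMATCH first and otherwise leaves NAME blank is below. The dataset names lines and prx_demo are hypothetical, and the second row is one of the sample lines posted further down this thread.

```sas
/* Hypothetical demo data: one line that matches, one that does not */
data lines;
  length line $ 300;
  line = 'CUSIP NO. 90130N 10 3 - --- (1) Name of Reporting PersonS.S. or I.R.S. Identification No. of Above Person American International Group, Inc. (I.R.S. Identification No. 13-2592361) - ---'; output;
  line = '1 NAMES OF REPORTING PERSONS Lonnie J. Stout II'; output;
run;

data prx_demo;
  set lines;
  length name $ 100;
  /* PRXCHANGE returns the source unchanged when nothing matches, */
  /* so only run it when PRXMATCH finds " Person <name> (".       */
  if prxmatch('/ Person (.*) \(/i', line) then
    name = strip(prxchange('s/.* Person (.*) \(.*/$1/i', -1, line));
run;
```

For these two rows NAME should be "American International Group, Inc." and blank, respectively.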
Regular expressions feel like the way to go. To come up with a regex that works for your real data, could you please provide some more sample data (as heterogeneous as possible)?
Whatever the solution (regex, INDEX plus SUBSTR, or something else), you need some marker for where the wanted text starts and ends. If the markers are "Above Person" and "(I.R.S", then it's pretty straightforward (a bare "Person" would first match inside "PersonS.S."; 12 is the length of "Above Person"):
name = strip(substr(text, index(text,"Above Person")+12, index(text,"(I.R.S")-index(text,"Above Person")-12));
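A slightly more defensive version of the INDEX/SUBSTR idea is sketched below. It assumes the start and end markers are "Above Person" and "(I.R.S", as in the first example; it uses FIND with the 'i' modifier so that case does not matter, and it leaves NAME blank when either marker is missing instead of handing SUBSTR a nonsensical position or length. The dataset names sample and want_substr are hypothetical.

```sas
/* Hypothetical demo data: the first example plus a line without the markers */
data sample;
  length text $ 300;
  text = 'CUSIP NO. 90130N 10 3 - --- (1) Name of Reporting PersonS.S. or I.R.S. Identification No. of Above Person American International Group, Inc. (I.R.S. Identification No. 13-2592361) - ---'; output;
  text = '1 NAMES OF REPORTING PERSONS Lonnie J. Stout II'; output;
run;

data want_substr;
  set sample;
  length name $ 100;
  /* FIND with the 'i' modifier ignores case and returns 0 when not found */
  start_pos = find(text, 'Above Person', 'i');
  end_pos   = find(text, '(I.R.S', 'i');
  /* Only extract when both markers exist and something lies between them */
  if start_pos > 0 and end_pos > start_pos + 12 then do;
    start_pos = start_pos + 12;   /* 12 = length of "Above Person" */
    name = strip(substr(text, start_pos, end_pos - start_pos));
  end;
  drop start_pos end_pos;
run;
```

As the reply below points out, this only helps if the markers are consistent across observations; the second row shows how quickly that assumption breaks.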
Yep, that's why I'm asking for more sample data: to see whether there is any pattern at all that lets us identify the wanted substring.
Thank you for your help with this request! Some additional examples are below; in each one, the part I need is the reporting person's name. It seems like that name either follows "(entities only)" or is sandwiched between "name of reporting persons" and "I.R.S. Identification".
1 NAMES OF REPORTING PERSONS CLARUS CAPITAL GROUP MANAGEMENT LPI.R.S. IDENTIFICATION NO. OF ABOVE PERSON (ENTITIES ONLY)20-8098367
SCHEDULE 13D CUSIP No. 068306109 1) NAMES OF REPORTING PERSONS I.R.S. IDENTIFICATION NOS. OF ABOVE PERSONS (ENTITIES ONLY) Bernard C. Sherman 2) CHECK THE APPROPRIATE BOX IF A MEMBER OF A GROUP (SEE INSTRUCTIONS)
1. Names of Reporting Persons. I.R.S. Identification Nos. of above persons (entities only) Textron Inc.
1 NAMES OF REPORTING PERSONS Lonnie J. Stout II
1. Names of Reporting Persons. P STYLEmargin-top0pxmargin-bottom0pxI.R.S. Identification Nos. of above persons (entities only) P STYLEmargin-top0pxmargin-bottom1pxWal-Mart Stores, Inc.
data have;
  length a $ 300;  /* long enough for the longest sample line */
  a='1 NAMES OF REPORTING PERSONS CLARUS CAPITAL GROUP MANAGEMENT LPI.R.S. IDENTIFICATION NO. OF ABOVE PERSON (ENTITIES ONLY)20-8098367 ';output;
  a='SCHEDULE 13D CUSIP No. 068306109 1) NAMES OF REPORTING PERSONS I.R.S. IDENTIFICATION NOS. OF ABOVE PERSONS (ENTITIES ONLY) Bernard C. Sherman 2) CHECK THE APPROPRIATE BOX IF A MEMBER OF A GROUP (SEE INSTRUCTIONS)';output;
run;
data want;
  set have;
  length first $ 100;
  /* First try: text between "NAMES OF REPORTING PERSONS" and "I.R.S." */
  re = prxparse('/NAMES OF REPORTING PERSONS(.+)I\.R\.S\./io');
  if prxmatch(re, a) then first = strip(prxposn(re, 1, a));
  /* Fallback: non-digit text right after "(ENTITIES ONLY)" */
  if missing(first) then do;
    re = prxparse('/ENTITIES ONLY\)(\D+)/io');
    if prxmatch(re, a) then first = strip(prxposn(re, 1, a));
  end;
run;
Xia Keshan
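The regex approach can be stretched to cover all five samples posted above. The following is a hypothetical sketch, not the code from the reply above, and it rests on three assumptions drawn only from those samples: (1) the HTML residue always looks like "P STYLEmargin-...px"; (2) a name that follows "(ENTITIES ONLY)" contains no digits or parentheses; (3) otherwise the name sits between "NAMES OF REPORTING PERSONS" and "I.R.S.", or runs to the end of the text. The dataset names samples and names are illustrative.

```sas
/* The five sample lines from this thread */
data samples;
  length a $ 300;
  a = '1 NAMES OF REPORTING PERSONS CLARUS CAPITAL GROUP MANAGEMENT LPI.R.S. IDENTIFICATION NO. OF ABOVE PERSON (ENTITIES ONLY)20-8098367'; output;
  a = 'SCHEDULE 13D CUSIP No. 068306109 1) NAMES OF REPORTING PERSONS I.R.S. IDENTIFICATION NOS. OF ABOVE PERSONS (ENTITIES ONLY) Bernard C. Sherman 2) CHECK THE APPROPRIATE BOX IF A MEMBER OF A GROUP (SEE INSTRUCTIONS)'; output;
  a = '1. Names of Reporting Persons. I.R.S. Identification Nos. of above persons (entities only) Textron Inc.'; output;
  a = '1 NAMES OF REPORTING PERSONS Lonnie J. Stout II'; output;
  a = '1. Names of Reporting Persons. P STYLEmargin-top0pxmargin-bottom0pxI.R.S. Identification Nos. of above persons (entities only) P STYLEmargin-top0pxmargin-bottom1pxWal-Mart Stores, Inc.'; output;
run;

data names;
  set samples;
  length clean $ 300 name $ 100;
  retain re_ent re_names;
  if _n_ = 1 then do;
    /* Pattern 1: digit-free text right after "(ENTITIES ONLY)" */
    re_ent   = prxparse('/\(ENTITIES ONLY\)\s*([^\d(]+)/i');
    /* Pattern 2: text after "NAMES OF REPORTING PERSONS", up to "I.R.S." or end of string */
    re_names = prxparse('/NAMES OF REPORTING PERSONS\.?\s*(.+?)\s*(I\.R\.S\.|$)/i');
  end;

  /* Assumption: HTML residue always has the form "P STYLEmargin-top0pxmargin-bottom1px" */
  clean = prxchange('s/P STYLE(margin-(top|bottom)\d+px)+//i', -1, a);

  /* Try pattern 1 first; fall back to pattern 2 */
  if prxmatch(re_ent, clean) then name = strip(prxposn(re_ent, 1, clean));
  else if prxmatch(re_names, clean) then name = strip(prxposn(re_names, 1, clean));

  drop re_ent re_names;
run;
```

With these five rows, NAME should come out as CLARUS CAPITAL GROUP MANAGEMENT LP, Bernard C. Sherman, Textron Inc., Lonnie J. Stout II, and Wal-Mart Stores, Inc. The order of the two patterns matters: for the third and fifth samples, pattern 2 on its own would capture the wrong text. A name that contains digits would be cut short by pattern 1, so check the results against more of the real data.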