Code to extract information from the text

Frequent Contributor
Posts: 96

Hi All,

I need your help extracting information from text. I have a column with executive biographies. From this column, I need to create a variable, UnderGrad, that takes the value 1 if the executive has a bachelor's degree in accounting, finance, or economics; 2 if the degree is in a field other than accounting, finance, or economics; and 3 if the field is missing. I would greatly appreciate it if someone could suggest code to create this variable. A subsample of my data looks as follows:

Name | Biography
William Dorey | Mr. William G. Dorey, Bill, served as the Chief Executive Officer of Granite Construction Inc., from January 1, 2004 to August 31, 2010 and as its President from February 2003 to August 31, 2010. Mr. Dorey joined Granite Construction Inc. in 1968 and served as its Chief Operating Officer from 1998 to February 2003, Senior Vice President from February 2003 to January 8, 2004, Manager of Branch Division from 1987 to 1998, Vice President and Assistant Manager of Branch Division from 1983 to 1987. He started his career with Granite Construction in 1967. He has extensive experience within the infrastructure construction industry. Mr. Dorey serves as a Director of Wilder Construction Company. He has been a Director at 1st Capital Bank since April 17, 2012, Astec Industries Inc., since April 28, 2011 and Granite Construction Inc., since January 22, 2004. He serves as Trustee of the Norman Y. Mineta International Institute for Surface Transportation Policy Studies, Member on the Construction Industry Round Table (CIRT), Director of the California Chamber of Commerce, and Director of the California Business Roundtable. He served as a Director of Carmel Youth Baseball. He served as a Director of TIC Holdings, Inc. from 1997 to 2002. He is Founding Chairman of Construction Industry Ethics Compliance Initiative (CIECI) Steering Committee. He has served on Cal Poly Dean's Advisory Council, and the Caltrans/ AGC Liaison Committee. He serves as the President of White Rock Club. He was honored for his work in the community by the Junior Achievement of Silicon Valley and Monterey Bay in 2009 and is an active supporter of the Boys and Girls Club of the Monterey Peninsula. He holds a B.S. degree in Construction Engineering from Arizona State University in 1967.
William Ellingson | Mr. William Ellingson, Bill is an Investment Professional of Galena Capital LLC. Mr. Ellingson has over 33 years of transactional experience including large leveraged leases and acquisition and financing in the refining, chemical and mining industries. Examples of transactions are the acquisition and leasing of a 250 ton per day methanol plant, the origination and structuring of a Section 29 synfuel plant which generated in excess of $1,000,000,000 in tax credits, and the origination, structuring, and financing of a utility coal supply company, including acquisition of eleven unit car train sets (110 rail cars per set), involving a purchase price and financing in excess of $110,000,000. He served as Principal at Apollo Capital Group and Vice President at LInc. Scientific Leasing. He served as Director, Structured Finance at Republic Financial Corporation. Mr. Ellingson serves as a Managing Member of a special opportunity finance company. Mr. Ellingson has been an Outside Director of 1st NRG Corp since July 2011. He received a B.S. in Finance and Economics from the University of Montana in 1975.
Kevin Norris | Mr. Kevin P. Norris has been Chief Executive Officer of 1st NRG Corp. since December 1, 2009. Mr. Norris is the Founder of e2 Business Services, Inc. and serves as its Chief Executive Officer and Controlling Shareholder. He has 33 years of industry experience with various energy companies including Apache Corporation, Universal Fuels Company, TOP Gas Gathering and BlueCreek Energy. Through his career, Mr. Norris has been involved in the drilling, operating, transportation and marketing of both oil and gas wells and specifically CBM (Coal bed Methane) wells for over 11 years. He has over 28 years of industry experience. He has been a Director of Eco-Trade Corporation since April 22, 2013. Mr. Norris serves as Director of 1st NRG Corp. Mr. Norris serves as Member of Board of Directors at BlueCreek Energy, Inc. and e2 Business Services, Inc. He was the Chairman of the IPAMS Natural Gas Committee. Mr. Norris received a Bachelor of Science degree in Business Administration from Colorado State University in 1979.
William Sharp | Mr. William John Sharp has been President of Global Industrial Consulting since 2001. Mr. Sharp serves as a consultant to various private equity groups. He has actively advised leading global private equity investors in their investments in China. He has more than 40 years of experience in the tire manufacturing industry. He served as President of North American Tire for The Goodyear Tire & Rubber Company from 1999 to 2000. He began his career with Goodyear in 1964. Following various assignments in the United States and abroad, He served as Director of European Tire Production in 1984. From 1964 to 2000, Mr. Sharp served at Goodyear Tire and Rubber Company in various capacities. He served as Vice President of Tire Manufacturing in 1987 and later as Executive Vice President of Product Supply since 1991. In 1992, he became General Manager of Goodyear's European Regional Operations. He served as president of Goodyear Europe, Middle East and Africa from 1992 to 1996. Mr. Sharp served as President of Goodyear Global Support Operations from 1996 to 1999. He has been an Independent Director of Acquity Group LLC since 2008 and China Zenix Auto International Ltd., since May 2011. He has been an Independent Director of Acquity Group Limited since 2010. He has been an Independent Non-Executive Director of Xingda International Holdings Ltd. since August 2005. He served as a Director of Ferro Corp. from 1998 to April 2012; 2020 ChinaCap Acquirco Inc since January 2007 and Exceed Company Ltd from November 2009 to February 4, 2012. Mr. Sharp graduated with a Bachelor's degree of Science, majoring in Industrial Engineering, from The Ohio State University in 1963.
Derek Burke | Mr. Derek C. Burke is Founder of WBQ Design & Engineering Inc. and serves as its President. Mr. Burke is a licensed professional engineer. He served as the Chairman of the City of Orlando Downtown Development Board and Community Redevelopment Agency Advisory Board. He has been a Director of 1st United Bancorp, Inc., since April 1, 2012. He has been a Director of Old Harbor Bank since April 2012. He serves as a Director at Anderen Bank, a subsidiary of Anderen Financial, Inc. Mr. Burke also serves on the Board of the Orlando Neighborhood Development Corporation. Mr. Burke served as a Director of Florida Choice Bankshares, Inc. and Florida Choice Bank from January 2005 to January 2008. He has been the president of the Orlando-based consulting firm WBQ Design & Engineering, Inc. since its founding in 1994. He served as Director of Southern Community Bank until its acquisition by First National Bank of Florida. Mr. Burke is also a noted civic leader who has served on several Committees and Boards. Mr. Burke is highly respected in the engineering community and was nominated for the 2008 Central Florida Engineer's Week Leadership Excellence award. He holds a Master's degree in civil engineering from the University of Central Florida in June 1988.

For the above subsample, the output should be as follows:

Name | UnderGrad
William Dorey | 2
William Ellingson | 1
Kevin Norris | 1
William Sharp | 2
Derek Burke | 3

Thank you for your time.

S


Accepted Solutions
Solution
‎11-04-2013 12:23 AM
PROC Star
Posts: 7,363

Re: Code to extract information from the text

Whether you use the prxmatch function, as Jagadish recommended, or non-regular-expression functions like the ones below, I think you want to restrict the search to the last sentence of each biography.  It makes no difference for your 5 examples, but searching the whole text could easily produce false matches on other records.

Also, Jagadish added a criterion that you didn't specify, namely 'admin'. I didn't, so my code gives the third record a value of 2 for undergrad:

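data want (keep=name undergrad);
  informat last $150.;
  informat name $30.;
  informat biography $upcase32767.;   /* read the biography in upper case */
  infile "C:\txtinput.txt" lrecl=32767 dlm="09"x firstobs=2;   /* tab-delimited, skip the header row */
  input name biography;
  /* treat "B.S." as BACHELOR; this also removes its periods before the sentence search */
  biography=tranwrd(biography,"B.S.","BACHELOR");
  /* search right to left, skipping the final period, for the period that ends the next-to-last sentence */
  x=find(biography,'.',-1*(length(strip(biography))-2));
  last=substr(biography,max(x,1));   /* use the whole text if no earlier period is found */
  bachelor=index(last,"BACHELOR");
  accounting=index(last,"ACCOUNTING");
  finance=index(last,"FINANCE");
  economics=index(last,"ECONOMICS");
  if bachelor then do;
    if accounting or finance or economics then undergrad=1;
    else undergrad=2;
  end;
  else undergrad=3;
run;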
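
To illustrate how the last sentence gets isolated, here is a minimal sketch on a made-up biography (the text is hypothetical, not taken from your data). A negative start position makes FIND search from right to left, and starting two characters before the end skips the final period:

```sas
/* Sketch only: the biography text below is invented for illustration. */
data _null_;
  length biography last $200;
  biography = "HE SERVES AS A DIRECTOR. HE HOLDS A BACHELOR DEGREE IN FINANCE FROM STATE UNIVERSITY IN 1975.";
  /* negative start: search leftward, beginning two characters before the end */
  x = find(biography, '.', -1*(length(strip(biography))-2));
  /* keep everything from that period onward, i.e. the last sentence */
  last = substr(biography, max(x,1));
  put x= last=;
run;
```

Keep in mind that this is only an approximation of sentence splitting: an abbreviation such as "Inc." inside the final sentence would cut it short.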


All Replies
Trusted Advisor
Posts: 1,131

Re: Code to extract information from the text

Please try

data want;
    set have;
    if prxmatch('m/b.s|bachelor/oi',Biography)>0 and prxmatch('m/finance|economic|account|admin/oi',Biography)>0 then UnderGrad=1;
    if prxmatch('m/b.s|bachelor/oi',Biography)>0 and prxmatch('m/finance|economic|account|admin/oi',Biography)=0 then UnderGrad=2;
    if prxmatch('m/b.s|bachelor/oi',Biography)=0 and prxmatch('m/finance|economic|account|admin/oi',Biography)=0 then UnderGrad=3;
run;

Thanks,

Jagadish

Frequent Contributor
Posts: 96

Re: Code to extract information from the text

Thanks Jagadish for the code. For some reason I am not able to get UnderGrad=3 for the last observation (Derek Burke) of my subsample. I will work on it and see what the problem is. Thanks again for your prompt reply.

PROC Star
Posts: 7,363

Re: Code to extract information from the text

The reason (I think) that you were unsuccessful with Jagadish's code is twofold.  First, it was looking at the entire biography section, not just the last sentence.

Second, and the greater of the problems, was that the period (i.e., in b.s) was serving as a meta-character rather than a literal period.  However, my knowledge of regular expressions is still too limited to be of much help there.
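
A small sketch of that second point, using a phrase from the Derek Burke biography (the variable names are only for illustration):

```sas
/* Sketch: an unescaped period versus an escaped period in a Perl regular expression. */
data _null_;
  length text $60;
  text = "the Orlando-based consulting firm";   /* mentions no degree at all */
  unescaped = prxmatch('/b.s/i',  text);   /* "." matches any character, so "bas" in "based" matches */
  escaped   = prxmatch('/b\.s/i', text);   /* "\." matches only a literal period, so nothing matches */
  put unescaped= escaped=;
run;
```

The unescaped pattern returns a nonzero position even though the text contains no "B.S.", which is how a record without a bachelor's degree can end up with UnderGrad=2 instead of 3.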

Trusted Advisor
Posts: 1,131

Re: Code to extract information from the text

Hi Sir, thank you for your suggestion.

@shalmali, I modified my code a bit by adding \b (a word boundary) so that the desired word is matched exactly. Please try the version below:

data want;
    set have;
    if prxmatch('m/b.s\b|bachelor\b/oi',Biography)>0 and prxmatch('m/finance|economic|account|admin/oi',Biography)>0 then UnderGrad=1;
    else if prxmatch('m/b.s\b|bachelor\b/oi',Biography)>0 and prxmatch('m/finance|economic|account|admin/oi',Biography)=0 then UnderGrad=2;
    else UnderGrad=3;
run;

Thanks,

Jagadish

Frequent Contributor
Posts: 96

Re: Code to extract information from the text

Thanks Jagadish for the code. If I need to create another variable, GRAD, that takes the value 1 if the executive has a master's degree in accounting, finance, or economics; 2 if the degree is in a field other than accounting, finance, or economics; and 3 if the field is missing, how should I modify the above code?

Also, from this subset of data, I need to create a new variable, CPA, that takes the value 1 if the executive is a CPA or certified public accountant. The subsample is as follows:

Name | Biography
Derek Burke | Mr. Derek C. Burke is Founder of WBQ Design & Engineering Inc. and serves as its President. Mr. Burke is a licensed professional engineer. He served as the Chairman of the City of Orlando Downtown Development Board and Community Redevelopment Agency Advisory Board. He has been a Director of 1st United Bancorp, Inc., since April 1, 2012. He has been a Director of Old Harbor Bank since April 2012. He serves as a Director at Anderen Bank, a subsidiary of Anderen Financial, Inc. Mr. Burke also serves on the Board of the Orlando Neighborhood Development Corporation. Mr. Burke served as a Director of Florida Choice Bankshares, Inc. and Florida Choice Bank from January 2005 to January 2008. He has been the president of the Orlando-based consulting firm WBQ Design & Engineering, Inc. since its founding in 1994. He served as Director of Southern Community Bank until its acquisition by First National Bank of Florida. Mr. Burke is also a noted civic leader who has served on several Committees and Boards. Mr. Burke is highly respected in the engineering community and was nominated for the 2008 Central Florida Engineer's Week Leadership Excellence award. He holds a Master's degree in civil engineering from the University of Central Florida in June 1988.
william dean | Mr. William Dean Karrash, CPA serves as an Executive Vice President of Burke, Lawton, Brewer & Burke, LLC. Mr. Karrash serves as Chief Compliance Officer at Venture Securities Corp. Mr. Karrash was Advisory Director of Abraxas Petroleum Corp. from November 2011 to May 2012. He serves as Portfolio Manager of BLB & B Advisors, LLC. He has over 20 years of experience in the financial services industry. Mr. Karrash served as the President and Chief Executive Officer of Rutherford, Brown & Catherwood, LLC. He served as Executive Vice President and Chief Financial Officer of Walnut Asset Management, LLC. He also served as Vice President of Finance for Lincoln Investment Planning, Inc. and was a Senior Manager with Pricewaterhouse Coopers. He served as Chairman of FINRA's District No. 9 Business Conduct and Nominating Committees. He has been a Director of Abraxas Petroleum Corp. since May 2012. Mr. Karrash is a member of FINRA's Small Firm Advisory Board (SFAB) and the Financial Responsibility committee. He has also served as a member of First Clearing, LLC's Correspondent Advisory Board. Mr. Karrash is a CPA, CPF, and is registered with FINRA as a General Securities Registered Representative (Series 7), a General Securities Principal (Series 24), a Municipal Securities Principal (Series 53), and an Investment Advisor Representative (series 65). Mr. Karrash is a 1996 graduate of the Temple University Executive M.B.A. program and a 1983 graduate of Pennsylvania State University where he obtained a B.S. in Accounting.
james andersen | Mr. R. James Andersen, Jim, CA, CPA (Illinois), CFP has been the Chief Financial Officer and Vice President of Finance at Avalon Rare Metals Inc. since June 2001. Mr. Andersen has been a Partner of Forbes Andersen LLP for nine years. He is a Partner of the Toronto accounting firm Andersen and Company PC. He founded Andersen & Company, PC and served as its President from January 2007 to October 2011. He served as Chief Financial Officer of Highvista Gold Inc. until October 6, 2011. He served as the Chief Financial Officer of Pele Mountain Resources Inc. from May 9, 2007 to September 30, 2011 and served as its Vice President of Finance. He served as the Chief Financial Officer and Secretary of Triumph Ventures Corp. until October 2011. He has more than 15 years of experience in public practice, and a wide variety of experience in small-cap companies. He served as the Chief Financial Officer of Baymount Incorp. from September 2005 to June 2010 and served as its Vice President of Finance since September 2005. He served as an Interim Secretary and Chief Financial Officer of Phantom Fiber Corp. since April 20, 2005. He served as the Chief Financial Officer and Vice President of Finance at Academy Capital Corp. since September 12, 2005. He served as the Chief Financial Officer of Kilo Goldmines, Inc. Mr. Andersen served as the Chief Financial Officer and Vice President of Finance of Macusani Yellowcake Inc. from May 29, 2007 to September 11, 2008. He served as a Director of Triumph Ventures Corp. from January 2010 to October 2011. He is experienced in mining accounting as well as corporate governance rules and regulations. He served as a Director of Macusani Yellowcake Inc. He was a part-time professor in the MBA program at the Schulich School of Business at York University. Mr. Andersen is also a Certified Public Accountant. He holds generic degree with high distinction from Trinity College at the University of Toronto in 1991 and placed 20th in Ontario on the CICA's Uniform Final Exam in 1992.
Wade D. Miquelon | Mr. Wade D. Miquelon has been the Chief Financial Officer of Walgreen Co. since June 16, 2008 and has been its Executive Vice President since July 2009. Mr. Miquelon has also been President of International at Walgreen Co. since September 17, 2012. He joined Walgreen Co. on June 16, 2008 as Senior Vice President and was responsible for its accounting, tax and treasury functions, including investor relations. Prior to Walgreens, he served as the Chief Financial Officer and Executive Vice President of Tyson Foods Inc. from June 29, 2006 to May 12, 2008 and was responsible for its worldwide finance and accounting functions. From 1989 to 2006, Mr. Miquelon served at Procter & Gamble as Vice President of Finance, Western Europe. He served as the senior most financial officer of P&G and was responsible for the 17-country Western Europe operation. He also served as the Chief Financial Officer and Senior Director for P&G's 42 country ASEAN, Australia and India region based in Singapore. He served as the Head of Finance and Accounting for the ASEAN, Australasia, and India region, and served as a Director and Investment Partner of I Venturesd. Mr. Miquelon co-founded Emmperative Marketing, Inc. and served as its Chief Financial Officer and Senior Vice President of Business Development and Human Resources. He has been a Director of Alliance Boots GMBH since August 2, 2012. Mr. Miquelon has been a Director of Acadia Healthcare Company, Inc. since January 19, 2011. He serves as a Member of The Board of The Lyric Opera of Chicago and the Shedd Aquarium in Chicago. He also serves as a Member of the Dean s Advisory Board for the Sam M. Walton College of Business, University of Arkansas. Mr. Miquelon holds a Bachelor of Science Degree in Civil Engineering from Purdue University in 1987 and an M.B.A. in Finance and Marketing from Washington University in 1989.
Arthur S. Locke, III | Mr. Arthur S. Locke, Art, III served as the Chief Financial Officer and Senior Vice President of Websense, Inc. from July 31, 2009 to August 2011 and also served as its Principal Accounting Officer until August 2011. Mr. Locke served as the Chief Financial Officer of Microstrategy Inc. since January 2005 until March 2009 and served as its Executive Vice President of Finance until March 25, 2009. Mr. Locke served as Vice President of Finance of Microstrategy Inc. since ... January 2005 and served as its Principal Accounting Officer. Prior to joining MicroStrategy Inc., Mr. Locke served as Chief Financial Officer of Metropolitan Area Networks, a start-up wireless broadband company, from February 2000 to January 2001, and as corporate controller of EIS International, Inc., from March 1997 to February 2000. Mr. Locke is a certified public accountant and holds a Bachelor of Science in Business Administration (BSBA) in Accounting and Computer Systems from American University.

From this the output should look like

Name | Biography | cpa
Derek Burke | (biography as above) | 0
william dean | (biography as above) | 1
james andersen | (biography as above) | 1
Wade D. Miquelon | (biography as above) | 0
Arthur S. Locke, III | (biography as above) | 1

Thank you for patiently answering all my questions.

Shalmali

Trusted Advisor
Posts: 1,131

Re: Code to extract information from the text

Please try

data want;
    set have;
    if prxmatch('m/b.s\b|bachelor\b/oi',Biography)>0 and prxmatch('m/finance|economic|account|admin/oi',Biography)>0 then UnderGrad=1;
    else if prxmatch('m/b.s\b|bachelor\b/oi',Biography)>0 and prxmatch('m/finance|economic|account|admin/oi',Biography)=0 then UnderGrad=2;
    else UnderGrad=3;

    if prxmatch('m/m.b.a\b|master\b/oi',Biography)>0 and prxmatch('m/finance|economic|account|admin/oi',Biography)>0 then Grad=1;
    else if prxmatch('m/m.b.a\b|master\b/oi',Biography)>0 and prxmatch('m/finance|economic|account|admin/oi',Biography)=0 then Grad=2;
    else Grad=3;

    /* 'accounting' alone is not matched: it would flag Wade D. Miquelon, whose expected cpa is 0 */
    if prxmatch('m/cpa\b|accountant\b/oi',Biography)>0  then cpa=1;
    else cpa=0;
run;

Thanks,

jagadish

Frequent Contributor
Posts: 96

Re: Code to extract information from the text

Thanks a lot Jagadish for all the help.

PROC Star
Posts: 7,363

Re: Code to extract information from the text

Methinks your code has to be a bit more complex.  Since you now need to parse two separate degrees, I think that you have to split the last sentence by degree.  Since your examples were all males, I didn't extend the code to accommodate biographies of females, but that should be easy to add. You also didn't include CA as an accounting designation but, since your data appear to reflect Canadian degrees, I presume that will also be necessary.

data want (keep=name undergrad grad cpa);
  informat last btext mtext $500.;
  informat name $30.;
  informat biography $upcase32767.;   /* read the biography in upper case */
  infile "C:\txtinput.txt" lrecl=32767 dlm="09"x firstobs=2;
  input name biography;
  /* normalize the degree abbreviations (this also removes their periods) */
  biography=tranwrd(biography,"B.S.","BACHELOR");
  biography=tranwrd(biography,"M.B.A.","MASTER");
  /* the last sentence starts at the right-most ". MR" or ". HE" */
  x=max(find(biography,'. MR',-1*(length(strip(biography))-2)),
        find(biography,'. HE',-1*(length(strip(biography))-2)));
  last=substr(biography,max(x,1));   /* use the whole text if neither marker is found */
  /* positions of the bachelor's and master's degrees within the last sentence */
  x=prxmatch('m/b.s\b|bachelor\b/oi',last);
  y=prxmatch('m/m.b.a\b|master\b/oi',last);
  /* split the sentence so that each degree is paired with the text that follows it */
  if x gt y then do;
    btext=substr(last,x);
    if y then mtext=substr(last,1,x-1);
  end;
  else if y gt x then do;
    mtext=substr(last,y);
    if x then btext=substr(last,1,y-1);
  end;
  if x then do;
    if prxmatch('m/finance|economic|account|admin/oi',btext)>0 then UnderGrad=1;
    else UnderGrad=2;
  end;
  else UnderGrad=3;
  if y then do;
    if prxmatch('m/finance|economic|account|admin/oi',mtext)>0 then Grad=1;
    else Grad=2;
  end;
  else Grad=3;
  /* the CPA designation can appear anywhere, so search the whole biography */
  if prxmatch('m/cpa\b|accountant\b/oi',biography)>0  then cpa=1;
  else cpa=0;
run;
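
As a rough idea of how the last-sentence search might be extended for female executives, the sketch below adds ". MS" and ". SHE" as sentence starters (". MR" already covers "MRS."). The biography text and the variable startpos are made up for illustration, and this is untested against real data:

```sas
/* Sketch only: the biography below is invented; startpos is a helper name used here for readability. */
data _null_;
  length biography last $200;
  biography = "SHE IS A DIRECTOR OF A REGIONAL BANK. MS. DOE HOLDS A BACHELOR DEGREE IN ECONOMICS FROM STATE UNIVERSITY.";
  startpos = -1*(length(strip(biography))-2);   /* search leftward, skipping the final period */
  x = max(find(biography,'. MR', startpos),
          find(biography,'. HE', startpos),
          find(biography,'. MS', startpos),
          find(biography,'. SHE',startpos));
  last = substr(biography, max(x,1));   /* use the whole text if no marker is found */
  put last=;
run;
```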

Frequent Contributor
Posts: 96

Re: Code to extract information from the text

Thanks a lot, Arthur, for the code. I would not have been able to write it by myself. One last question: my file is an Excel file, but your code imports a text file. I would really appreciate it if you could tell me how to modify the above code so that I can use an Excel file. Thanks.

PROC Star
Posts: 7,363

Re: Code to extract information from the text

If you don't want to change anything, just make sure that the first row is either blank or has column labels, and save the file (in Excel) as a tab-delimited file called c:\txtinput.txt.
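
If you would rather read the workbook directly, something along these lines may work. It is only a sketch: the path C:\input.xlsx and the data set names HAVE and HAVE_UC are hypothetical, and DBMS=XLSX assumes that SAS/ACCESS Interface to PC Files is licensed at your site.

```sas
/* Sketch only: file path and data set names are placeholders, not from the thread. */
proc import datafile="C:\input.xlsx"
            out=have
            dbms=xlsx
            replace;
  getnames=yes;   /* first row holds the column names (Name, Biography) */
run;

data have_uc;
  length biography $32767;          /* leave room for the longer TRANWRD replacements */
  set have;
  biography = upcase(biography);    /* stands in for the $UPCASE informat used with INFILE */
run;
```

The statements from the first TRANWRD onward could then follow a SET HAVE_UC statement in place of the INFILE and INPUT statements.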

Frequent Contributor
Posts: 96

Re: Code to extract information from the text

Thanks a lot for patiently answering all my questions.

PROC Star
Posts: 7,363

Re: Code to extract information from the text

The more I've thought about it, while I still think you only want to search the last sentence, you could use an escape character (i.e., a \) to get the period to be treated as a literal character:

data want (keep=name undergrad);
  informat name $30.;
  informat biography $32767.;
  infile "C:\txtinput.txt" lrecl=32767 dlm="09"x firstobs=2;
  input name biography;
  if prxmatch('m/b\.s|bachelor/oi',Biography)>0 and prxmatch('m/finance|economic|account|admin/oi',Biography)>0 then UnderGrad=1;
  else if prxmatch('m/b\.s|bachelor/oi',Biography)>0 and prxmatch('m/finance|economic|account|admin/oi',Biography)=0 then UnderGrad=2;
  else UnderGrad=3;
run;
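
For what it's worth, combining the escaped period with word boundaries seems to tighten the match further. The sketch below uses a phrase from the James Andersen biography and one from the William Ellingson biography; the variable names are only for illustration:

```sas
/* Sketch: word boundaries alone versus word boundaries plus an escaped period. */
data _null_;
  length text $60;
  text = "Partner of Forbes Andersen LLP";
  loose  = prxmatch('/b.s\b/i',     text);   /* "bes" at the end of "Forbes" still matches */
  strict = prxmatch('/\bb\.s\b/i',  text);   /* needs a literal "B.S" as its own token: no match */
  text = "He received a B.S. in Finance";
  degree = prxmatch('/\bb\.s\b/i',  text);   /* matches the real abbreviation */
  put loose= strict= degree=;
run;
```

Here loose is nonzero although no degree is mentioned, strict is 0, and degree is nonzero.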

Frequent Contributor
Posts: 96

Re: Code to extract information from the text

Thanks Arthur.
