BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
elbarto
Obsidian | Level 7

I have an example dataset as follows:

 

DATA have;
  input InvestorID Type & $char1000.;
  length Type $1000;
DATALINES;
1001 Accelerator/Incubator 
1002 Accelerator/Incubator, Angel (individual), Angel Group, Asset Manager, Business Development Company, Corporate Development, Corporate Venture Capital, Corporation, Family Office, Fund of Funds, Fundless Sponsor, Government, Growth/Expansion, Hedge Fund, Holding Company, Impact Investing, Infrastructure, Investment Bank, Leasing, Lender/Debt Provider, Limited Partner, Merchant Banking Firm, Mezzanine, Mutual Fund, Not-For-Profit Venture Capital, Other Private Equity, PE-Backed Company, PE/Buyout, Real Estate, SBIC, Secondary Buyer, Sovereign Wealth Fund, University, VC-Backed Company, Venture Capital 
1003 Accelerator/Incubator, Angel (individual), Business Development Company, Venture Capital 
1004 Corporate Venture Capital, Fund of Funds 
;
RUN;

What I want to do is produce the following dataset:

DATA want;
  input InvestorID Type & $char1000. CVC VC;
  length Type $1000;
DATALINES;
1001 Accelerator/Incubator  0 0
1002 Accelerator/Incubator, Angel (individual), Angel Group, Asset Manager, Business Development Company, Corporate Development, Corporate Venture Capital, Corporation, Family Office, Fund of Funds, Fundless Sponsor, Government, Growth/Expansion, Hedge Fund, Holding Company, Impact Investing, Infrastructure, Investment Bank, Leasing, Lender/Debt Provider, Limited Partner, Merchant Banking Firm, Mezzanine, Mutual Fund, Not-For-Profit Venture Capital, Other Private Equity, PE-Backed Company, PE/Buyout, Real Estate, SBIC, Secondary Buyer, Sovereign Wealth Fund, University, VC-Backed Company, Venture Capital  1 1
1003 Accelerator/Incubator, Angel (individual), Business Development Company, Venture Capital  0 1
1004 Corporate Venture Capital, Fund of Funds  1 0
;
RUN;

The CVC variable takes on a value of 1 if in Type, the word Corporate Venture Capital appears, otherwise it is 0. Likewise the VC variable takes on a value of 1 if in Type, the word Venture Capital appears, otherwise it is 0. These words Corporate Venture Capital and Venture Capital are always spelt with the same capitalization.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You have to tell INDEXW() what characters constituent end of a word.  Or just use INDEX() instead.

DATA have;
  input InvestorID Type & $1000.;
  CVC = 0 < indexw(Type,'Corporate Venture Capital',' ,.()');
  VC= 0 < indexw(Type,'Venture Capital',' ,.()');
DATALINES;
1001 Accelerator/Incubator 
1002 Accelerator/Incubator, Angel (individual), Angel Group, Asset Manager, Business Development Company, Corporate Development, Corporate Venture Capital, Corporation, Family Office, Fund of Funds, Fundless Sponsor, Government, Growth/Expansion, Hedge Fund, Holding Company, Impact Investing, Infrastructure, Investment Bank, Leasing, Lender/Debt Provider, Limited Partner, Merchant Banking Firm, Mezzanine, Mutual Fund, Not-For-Profit Venture Capital, Other Private Equity, PE-Backed Company, PE/Buyout, Real Estate, SBIC, Secondary Buyer, Sovereign Wealth Fund, University, VC-Backed Company, Venture Capital 
1003 Accelerator/Incubator, Angel (individual), Business Development Company, Venture Capital 
1004 Corporate Venture Capital, Fund of Funds 
;

proc print;
run;

Tom_0-1695730132923.png

 

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

I ask for clarification. In several places in your text, you indicate you are looking for a "word", but then the search strings are made up of several words, such as "Corporate Venture Capital". Are we searching for one of these words, or do we have to find all of "Corporate Venture Capital" to get a match?

--
Paige Miller
elbarto
Obsidian | Level 7

Sorry, I mean the entire string "Corporate Venture Capital" to get a match. Likewise for the entire string "Venture Capital".

Tom
Super User Tom
Super User

If you just want to find the string, even if it is part of a longer word, then use INDEX() or FIND().  If you want to only find it when it is a stand only word (or phrase in this case) then use INDEXW() or FINDW().

 

All of those functions return the location the value was found. Which you could treat as TRUE or FALSE since SAS considers any number other than zero or missing as TRUE.

 

Or you could just add a simple comparison operator to force it to be a true 0/1 boolean value.

 

Another advantage of the FIND() and FINDW() function is that they accept modifiers to allow you to ignore case and/or trim leading/trailing delimiters.

CVC = 0 < indexw(Type,'Corporate Venture Capital');
CV= 0 < indexw(Type,'Venture Capital');

 

elbarto
Obsidian | Level 7

Thank you. However, when I try the following:

data want;
set have;
CVC = 0 < indexw(Type,'Corporate Venture Capital');
VC= 0 < indexw(Type,'Venture Capital');
run;

It does not seem to be picking up the "Corporate Venture Capital", for example, CVC is 0 for both investorID=1002 and 1004, when it should be 1.

Patrick
Opal | Level 21

Does below work for you?

DATA have;
  input InvestorID Type & $char1000. CVC_want VC_want;
  length Type $1000;
DATALINES;
1001 Accelerator/Incubator  0 0
1002 Accelerator/Incubator, Angel (individual), Angel Group, Asset Manager, Business Development Company, Corporate Development, Corporate Venture Capital, Corporation, Family Office, Fund of Funds, Fundless Sponsor, Government, Growth/Expansion, Hedge Fund, Holding Company, Impact Investing, Infrastructure, Investment Bank, Leasing, Lender/Debt Provider, Limited Partner, Merchant Banking Firm, Mezzanine, Mutual Fund, Not-For-Profit Venture Capital, Other Private Equity, PE-Backed Company, PE/Buyout, Real Estate, SBIC, Secondary Buyer, Sovereign Wealth Fund, University, VC-Backed Company, Venture Capital  1 1
1003 Accelerator/Incubator, Angel (individual), Business Development Company, Venture Capital  0 1
1004 Corporate Venture Capital, Fund of Funds  1 0
;

data want;
  set have;
  cvc= find(type,'Corporate Venture Capital','i')>0;
  cv= prxmatch('/(?<!Corporate )Venture Capital/i',strip(type))>0;
run;

proc print data=want(drop=type);
run;

Patrick_0-1695707325766.png

 

elbarto
Obsidian | Level 7

Thanks that works! I also tried the following, which works as well:

 

type_nc = compress(type);
CVC = 0 < indexw(type_nc,'CorporateVentureCapital',", ");
VC = 0 < indexw(type_nc,'VentureCapital',", ");
Tom
Super User Tom
Super User

You have to tell INDEXW() what characters constituent end of a word.  Or just use INDEX() instead.

DATA have;
  input InvestorID Type & $1000.;
  CVC = 0 < indexw(Type,'Corporate Venture Capital',' ,.()');
  VC= 0 < indexw(Type,'Venture Capital',' ,.()');
DATALINES;
1001 Accelerator/Incubator 
1002 Accelerator/Incubator, Angel (individual), Angel Group, Asset Manager, Business Development Company, Corporate Development, Corporate Venture Capital, Corporation, Family Office, Fund of Funds, Fundless Sponsor, Government, Growth/Expansion, Hedge Fund, Holding Company, Impact Investing, Infrastructure, Investment Bank, Leasing, Lender/Debt Provider, Limited Partner, Merchant Banking Firm, Mezzanine, Mutual Fund, Not-For-Profit Venture Capital, Other Private Equity, PE-Backed Company, PE/Buyout, Real Estate, SBIC, Secondary Buyer, Sovereign Wealth Fund, University, VC-Backed Company, Venture Capital 
1003 Accelerator/Incubator, Angel (individual), Business Development Company, Venture Capital 
1004 Corporate Venture Capital, Fund of Funds 
;

proc print;
run;

Tom_0-1695730132923.png

 

Astounding
PROC Star

@elbarto,

Whichever method you use, searching through a 1000-character string can be time-consuming.  If speed is an issue, you can probably speed things up by taking advantage of the fact that "Venture Capital" is a subset of "Corporate Venture Capital".  Consider this structure:

 

VC = (whatever method you choose);
if VC then CVC = (whatever method you choose);
else CVC = 0;

This avoids searching for "Corporate Venture Capital" in cases where it definitely won't exist.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 3538 views
  • 1 like
  • 5 in conversation