BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Alexxxxxxx
Pyrite | Level 9

Dear all,

 

how can I delete the strings beginning at 'DEAD'

 

for example,

data have ;
  infile datalines truncover;
  input name $100.;
  datalines;
ABB 'A' (LON) DEAD - DELIST 06/04/98
ABB 'B' (LON) DEAD - DELISTED
ABB AG (XSQ) DEAD - 19/06/99
ABB BF (LON) DEAD - DEAD
ABBEY NATIONAL DEAD - T/O SEE 702853
;
run;

I expect to get

ABB 'A' (LON) 
ABB 'B' (LON) 
ABB AG (XSQ) 
ABB BF (LON) 
ABBEY NATIONAL

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

@Alexxxxxxx   May be some of the records do not have substr 'DEAD' in it.

 

So perhaps a conditional construct like this??

 

if find(name,'DEAD')>0 then want=substr(name,1,find(name,'DEAD')-1);

View solution in original post

8 REPLIES 8
novinosrin
Tourmaline | Level 20
data have ;
  infile datalines truncover;
  input name $100.;
  datalines;
ABB 'A' (LON) DEAD - DELIST 06/04/98
ABB 'B' (LON) DEAD - DELISTED
ABB AG (XSQ) DEAD - 19/06/99
ABB BF (LON) DEAD - DEAD
ABBEY NATIONAL DEAD - T/O SEE 702853
;
run;

data want;
set have;
want=substr(name,1,findw(name,'DEAD')-1);
run;

 

Alexxxxxxx
Pyrite | Level 9

Dear @novinosrin ,

 

thanks, the code works on the sample,

 

however, I get some 'NOTE' when I process the main dataset by following codes,

Data want;
Set Step9.Datastream_gb_Step23;
NAME_Step23_1=substr(NAME_Step23,1,findw(NAME_Step23,'DEAD')-1);
Run;

and one of them is 

 

NOTE: Invalid third argument to function SUBSTR at line 2591 column 15.
Type=13283U NAME=INTERCEDE GROUP COMPANY_NAME=INTERCEDE GROUP PLC COMPANY_NAME_1=INTERCEDE GROUP PLC.
COMPANY_NAME___SHORT=INTERCEDE GROUP PLC PREVIOUS_NAME=  GEOGRAPHIC_DESCR_=GB MNEMONIC=IGP
EQUITIES_STATUS=ACT. ISIN_CODE=GB0003287249 ISIN_NUMBER=GB0003287249 ISIN_ISSUER_CTRY=GB
ISIN_CODE_THAI_FB=  QUOTE_INDICATOR=P ISIN_NUMBER_1=GB0003287249 INACTIVE_DATE=.
NAME_Step1=INTERCEDE GROUP COMPANY_NAME_Step1=INTERCEDE GROUP PLC
COMPANY_NAME_1_Step1=INTERCEDE GROUP PLC. COMPANY_NAME___SHORT_Step1=INTERCEDE GROUP PLC
PREVIOUS_NAME_Step1=  NAME_Step2=INTERCEDE GROUP chflagNAME_2v1=0
COMPANY_NAME_Step2=INTERCEDE GROUP PLC chflagCOMPANY_NAME_2v1=0
COMPANY_NAME_1_Step2=INTERCEDE GROUP PLC. chflagCOMPANY_NAME_1_2v1=0
COMPANY_NAME___SHORT_Step2=INTERCEDE GROUP PLC chflagCOMPANY_NAME___SHORT_2v1=0 PREVIOUS_NAME_Step2=
chflagPREVIOUS_NAME_2v1=0 NAME_Step21=INTERCEDE GROUP chflagNAME_21v2=0
COMPANY_NAME_Step21=INTERCEDE GROUP PLC chflagCOMPANY_NAME_21v2=0
COMPANY_NAME_1_Step21=INTERCEDE GROUP PLC. chflagCOMPANY_NAME_1_21v2=0
COMPANY_NAME___SHORT_Step21=INTERCEDE GROUP PLC chflagCOMPANY_NAME___SHORT_21v2=0
PREVIOUS_NAME_Step21=  chflagPREVIOUS_NAME_21v2=0 NAME_Step22=INTERCEDE GROUP chflagNAME_22v21=0
COMPANY_NAME_Step22=INTERCEDE GROUP PLC chflagCOMPANY_NAME_22v21=0
COMPANY_NAME_1_Step22=INTERCEDE GROUP PLC. chflagCOMPANY_NAME_1_22v21=0
COMPANY_NAME___SHORT_Step22=INTERCEDE GROUP PLC chflagCOMPANY_NAME___SHORT_22v21=0
PREVIOUS_NAME_Step22=  chflagPREVIOUS_NAME_22v21=0 NAME_Step23=INTERCEDE GROUP cflagNAME_22vs23=0
COMPANY_NAME_Step23=INTERCEDE GROUP PLC cflagCOMPANY_NAME_22vs23=0
COMPANY_NAME_1_Step23=INTERCEDE GROUP PLC. cflagCOMPANY_NAME_1_22vs23=0
COMPANY_NAME___SHORT_Step23=INTERCEDE GROUP PLC cflagCOMPANY_NAME___SHORT_22vs23=0
PREVIOUS_NAME_Step23=  cflagPREVIOUS_NAME_22vs23=0 NAME_Step23_1=INTERCEDE GROUP _ERROR_=1 _N_=7

How can I fix this?

novinosrin
Tourmaline | Level 20

Hi @Alexxxxxxx   It's difficult to guess what your real is. The assumption i went with is that your sample is good representative of your real.

Try FIND instead of FINDW and experiment

 

data want;
set have;
want=substr(name,1,find(name,'DEAD')-1);
run;

 

 

 

novinosrin
Tourmaline | Level 20

@Alexxxxxxx   May be some of the records do not have substr 'DEAD' in it.

 

So perhaps a conditional construct like this??

 

if find(name,'DEAD')>0 then want=substr(name,1,find(name,'DEAD')-1);
ErikLund_Jensen
Rhodochrosite | Level 12

@Alexxxxxxx 

 

The note is caused by an illegal third parameter to substr. If then string "DEAD" is not found, find returns 0, and then your substring vill be substr(name,1,0), which is illegal. This removes the notes:

 

data want(drop=found); set have;
 	found = find(name,'DEAD');
	if found > 1 then name=substr(name,1,found-1);
run;

If found = 0 then name remains unchanged.

 

 

Tom
Super User Tom
Super User

You need to include examples that do NOT contain the text DEAD so that solutions can be made that handle that.

data have ;
  infile datalines truncover;
  input name $100.;
  datalines;
ABB 'A' (LON) DEAD - DELIST 06/04/98
ABB 'B' (LON) DEAD - DELISTED
ABB AG (XSQ) DEAD - 19/06/99
No Change
ABB BF (LON) DEAD - DEAD
ABBEY NATIONAL DEAD - T/O SEE 702853
DEADHEAD Alive
;

data want;
  set have;
  if index(name,' DEAD ') then
    name=substrn(name,1,index(name,' DEAD '))
  ;
run;

proc print;
run;
Obs    name

 1     ABB 'A' (LON)
 2     ABB 'B' (LON)
 3     ABB AG (XSQ)
 4     No Change
 5     ABB BF (LON)
 6     ABBEY NATIONAL
 7     DEADHEAD Alive
mkeintz
PROC Star

You can take advantage of a unique property of the SUBSTR function - namely that it can be on the left of an assignment statement.  For instance

      substr(name,15)=' ';

replaces all characters starting at position 15 in NAME by blanks.

 

In your case

 

data have ;
  infile datalines truncover;
  input name $100.;
  datalines;
ABB 'A' (LON) DEAD - DELIST 06/04/98
ABB 'B' (LON) DEAD - DELISTED
ABB AG (XSQ) DEAD - 19/06/99
ABB BF (LON) DEAD - DEAD
ABBEY NATIONAL DEAD - T/O SEE 702853
;
run;

data want;
  set have;
  substr(name,indexw(name,'DEAD'))=' ';
run;

 

Edited additional comment:

 

By using the INDEXW function to find the character position, instead of the INDEX function, you won't get false positives from words like 'UNDEAD' or 'DEADLY'.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ErikLund_Jensen
Rhodochrosite | Level 12
This will also give the "Invalid argument to substr" note, if the string "DEAD" is not found.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1776 views
  • 4 likes
  • 5 in conversation