Dear all,
how can I delete the strings beginning at 'DEAD'
for example,
data have ;
infile datalines truncover;
input name $100.;
datalines;
ABB 'A' (LON) DEAD - DELIST 06/04/98
ABB 'B' (LON) DEAD - DELISTED
ABB AG (XSQ) DEAD - 19/06/99
ABB BF (LON) DEAD - DEAD
ABBEY NATIONAL DEAD - T/O SEE 702853
;
run;
I expect to get
ABB 'A' (LON)
ABB 'B' (LON)
ABB AG (XSQ)
ABB BF (LON)
ABBEY NATIONAL
@Alexxxxxxx May be some of the records do not have substr 'DEAD' in it.
So perhaps a conditional construct like this??
if find(name,'DEAD')>0 then want=substr(name,1,find(name,'DEAD')-1);
data have ;
infile datalines truncover;
input name $100.;
datalines;
ABB 'A' (LON) DEAD - DELIST 06/04/98
ABB 'B' (LON) DEAD - DELISTED
ABB AG (XSQ) DEAD - 19/06/99
ABB BF (LON) DEAD - DEAD
ABBEY NATIONAL DEAD - T/O SEE 702853
;
run;
data want;
set have;
want=substr(name,1,findw(name,'DEAD')-1);
run;
Dear @novinosrin ,
thanks, the code works on the sample,
however, I get some 'NOTE' when I process the main dataset by following codes,
Data want;
Set Step9.Datastream_gb_Step23;
NAME_Step23_1=substr(NAME_Step23,1,findw(NAME_Step23,'DEAD')-1);
Run;
and one of them is
NOTE: Invalid third argument to function SUBSTR at line 2591 column 15.
Type=13283U NAME=INTERCEDE GROUP COMPANY_NAME=INTERCEDE GROUP PLC COMPANY_NAME_1=INTERCEDE GROUP PLC.
COMPANY_NAME___SHORT=INTERCEDE GROUP PLC PREVIOUS_NAME= GEOGRAPHIC_DESCR_=GB MNEMONIC=IGP
EQUITIES_STATUS=ACT. ISIN_CODE=GB0003287249 ISIN_NUMBER=GB0003287249 ISIN_ISSUER_CTRY=GB
ISIN_CODE_THAI_FB= QUOTE_INDICATOR=P ISIN_NUMBER_1=GB0003287249 INACTIVE_DATE=.
NAME_Step1=INTERCEDE GROUP COMPANY_NAME_Step1=INTERCEDE GROUP PLC
COMPANY_NAME_1_Step1=INTERCEDE GROUP PLC. COMPANY_NAME___SHORT_Step1=INTERCEDE GROUP PLC
PREVIOUS_NAME_Step1= NAME_Step2=INTERCEDE GROUP chflagNAME_2v1=0
COMPANY_NAME_Step2=INTERCEDE GROUP PLC chflagCOMPANY_NAME_2v1=0
COMPANY_NAME_1_Step2=INTERCEDE GROUP PLC. chflagCOMPANY_NAME_1_2v1=0
COMPANY_NAME___SHORT_Step2=INTERCEDE GROUP PLC chflagCOMPANY_NAME___SHORT_2v1=0 PREVIOUS_NAME_Step2=
chflagPREVIOUS_NAME_2v1=0 NAME_Step21=INTERCEDE GROUP chflagNAME_21v2=0
COMPANY_NAME_Step21=INTERCEDE GROUP PLC chflagCOMPANY_NAME_21v2=0
COMPANY_NAME_1_Step21=INTERCEDE GROUP PLC. chflagCOMPANY_NAME_1_21v2=0
COMPANY_NAME___SHORT_Step21=INTERCEDE GROUP PLC chflagCOMPANY_NAME___SHORT_21v2=0
PREVIOUS_NAME_Step21= chflagPREVIOUS_NAME_21v2=0 NAME_Step22=INTERCEDE GROUP chflagNAME_22v21=0
COMPANY_NAME_Step22=INTERCEDE GROUP PLC chflagCOMPANY_NAME_22v21=0
COMPANY_NAME_1_Step22=INTERCEDE GROUP PLC. chflagCOMPANY_NAME_1_22v21=0
COMPANY_NAME___SHORT_Step22=INTERCEDE GROUP PLC chflagCOMPANY_NAME___SHORT_22v21=0
PREVIOUS_NAME_Step22= chflagPREVIOUS_NAME_22v21=0 NAME_Step23=INTERCEDE GROUP cflagNAME_22vs23=0
COMPANY_NAME_Step23=INTERCEDE GROUP PLC cflagCOMPANY_NAME_22vs23=0
COMPANY_NAME_1_Step23=INTERCEDE GROUP PLC. cflagCOMPANY_NAME_1_22vs23=0
COMPANY_NAME___SHORT_Step23=INTERCEDE GROUP PLC cflagCOMPANY_NAME___SHORT_22vs23=0
PREVIOUS_NAME_Step23= cflagPREVIOUS_NAME_22vs23=0 NAME_Step23_1=INTERCEDE GROUP _ERROR_=1 _N_=7
How can I fix this?
Hi @Alexxxxxxx It's difficult to guess what your real is. The assumption i went with is that your sample is good representative of your real.
Try FIND instead of FINDW and experiment
data want;
set have;
want=substr(name,1,find(name,'DEAD')-1);
run;
@Alexxxxxxx May be some of the records do not have substr 'DEAD' in it.
So perhaps a conditional construct like this??
if find(name,'DEAD')>0 then want=substr(name,1,find(name,'DEAD')-1);
The note is caused by an illegal third parameter to substr. If then string "DEAD" is not found, find returns 0, and then your substring vill be substr(name,1,0), which is illegal. This removes the notes:
data want(drop=found); set have;
found = find(name,'DEAD');
if found > 1 then name=substr(name,1,found-1);
run;
If found = 0 then name remains unchanged.
You need to include examples that do NOT contain the text DEAD so that solutions can be made that handle that.
data have ;
infile datalines truncover;
input name $100.;
datalines;
ABB 'A' (LON) DEAD - DELIST 06/04/98
ABB 'B' (LON) DEAD - DELISTED
ABB AG (XSQ) DEAD - 19/06/99
No Change
ABB BF (LON) DEAD - DEAD
ABBEY NATIONAL DEAD - T/O SEE 702853
DEADHEAD Alive
;
data want;
set have;
if index(name,' DEAD ') then
name=substrn(name,1,index(name,' DEAD '))
;
run;
proc print;
run;
Obs name 1 ABB 'A' (LON) 2 ABB 'B' (LON) 3 ABB AG (XSQ) 4 No Change 5 ABB BF (LON) 6 ABBEY NATIONAL 7 DEADHEAD Alive
You can take advantage of a unique property of the SUBSTR function - namely that it can be on the left of an assignment statement. For instance
substr(name,15)=' ';
replaces all characters starting at position 15 in NAME by blanks.
In your case
data have ;
infile datalines truncover;
input name $100.;
datalines;
ABB 'A' (LON) DEAD - DELIST 06/04/98
ABB 'B' (LON) DEAD - DELISTED
ABB AG (XSQ) DEAD - 19/06/99
ABB BF (LON) DEAD - DEAD
ABBEY NATIONAL DEAD - T/O SEE 702853
;
run;
data want;
set have;
substr(name,indexw(name,'DEAD'))=' ';
run;
Edited additional comment:
By using the INDEXW function to find the character position, instead of the INDEX function, you won't get false positives from words like 'UNDEAD' or 'DEADLY'.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.