have to look from the last by space delimiter and remove the last three strings WHICH IS COUNTRY , ZIPCODE,STATE CODE
I HAVE
2001 P LARRY MN MINNESOTA MN 606123714 US
345 R TEENA TR FRANCE ST TN 60612345 UK
WHAT I WANT:
2001 P LARRY MN MINNESOTA
345 R TEENA TR FRANCE ST
Brute force attack:
do i = 1 to countw(charvar) - 3;
tempvar = catx(' ',tempvar,scan(charvar,i));
end;
charvar = tempvar;
drop i tempvar;
For tested code, provide example data in a data step with datalines.
Brute force attack:
do i = 1 to countw(charvar) - 3;
tempvar = catx(' ',tempvar,scan(charvar,i));
end;
charvar = tempvar;
drop i tempvar;
For tested code, provide example data in a data step with datalines.
hi,
I tried your approach, but for countw function it gives me warning .
data A;
INPUT ADDRESS $50.;
DATALINES;
2001 P LARRY MN MINNESOTA MN 606123714 US
345 R TEENA TR FRANCE ST TN 60612345 UK
RUN;
DATA B;
SET A;
do i = 1 to countw(ADDRESS) - 3;
tempvar = catx(' ',tempvar,scan(ADDRESS,i));
end;
charvar = tempvar;
drop i tempvar;
RUN;
Add a statement that forces tempvar to be created as character:
data b;
set a;
length tempvar $50;
do i = 1 to countw(address) - 3;
tempvar = catx(' ',tempvar,scan(address,i));
end;
charvar = tempvar;
drop i tempvar;
run;
I have attached input excel
PROC IMPORT OUT= HAVE
DATAFILE= "\LOCATION\INPUT.XLSX"
DBMS=EXCEL REPLACE;
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
Need to remove city state , zip from address field . If I go by position of space placed , there would be a problem when city is having space in between. (example : new York)
So I have tried below method to derive only address
data want(DROP=I ADDRESS COMB);
set have ;
length hcp_address_1 $50.;
comb = cat(strip(City)," ",strip(STATE)," ",strip(ZIP));
do i = 1 to countw(address) - countw(comb);
hcp_address_1 = catx(' ', address_1,scan(address,i-1));
end;
run;
POSION=prxmatch('m/(\S+\s+\S+\s+\S+)$/',trim(text));
if POSION>1 then result=substr(text,1,POSION-1);
That second example row does not look right. We do not have "zipcode's" in the UK, nor "states". What I would suggest, as its obviously US data you are using, is to find the first numeric, then substr to that -3, something like:
data want; set have; want=substr(have,1,findc(have,"","d")-3); run;
data have;
input x $60.;
call scan(x,-3,p,l,' ');
want=substr(x,1,p-1);
drop p l;
cards;
2001 P LARRY MN MINNESOTA MN 606123714 US
345 R TEENA TR FRANCE ST TN 60612345 UK
;
run;
proc print;run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.