Solved: Re: How to remove digits from the observations?

AbuChowdhury · Posted 03-16-2015 02:33 PM

Hi Experts,

The following is the sample of my dataset. I need to remove the first two and last two digits of "firms".

date firms

22/10/2008	AN8068571086
26/02/2010	AN8068571086
05/08/2011	AN8068571086
27/09/2011	AN8068571086
25/10/2011	AN8068571086
23/11/2011	AN8068571086
26/06/2012	AN8068571086

It means, now "firms" will be like 80685710.

Regards,

Abu

Ksharp · Posted 03-17-2015 09:02 AM

data have;

input date ddmmyy10. firms :$15.;

datalines;

22/10/2008 AN8068571086

26/02/2010 AN8068571086

05/08/2011 AN8068571086

27/09/2011 AN8068571086

25/10/2011 AN8068571086

23/11/2011 AN8068571086

26/06/2012 AN8068571086

;

data want;

set have;

length want $ 20;

want=substr(firms,3,length(firms)-4);

run;

View solution in original post

Community_Help · Posted 03-16-2015 03:14 PM

Hi Abu, I was out on the communities today and noticed your question. I'm sure that a SAS expert will answer you soon. I happened to see this, just in case it helps:

naveen_srini · Posted 03-16-2015 03:21 PM

Hi Abu,

data have;

input date ddmmyy10. firms :$15.;

datalines;

22/10/2008 AN8068571086

26/02/2010 AN8068571086

05/08/2011 AN8068571086

27/09/2011 AN8068571086

25/10/2011 AN8068571086

23/11/2011 AN8068571086

26/06/2012 AN8068571086

;

data want;

set have;

do i= 1 to length(firms);

new_firms=substr(firms,i,1);

if i in (1,2) or i=length(firms)-1 or i=length(firms) then continue;

group+1;

output;

end;

call missing(group);

run;

data final;

array a(15) $1 _temporary_;

call missing (of a{*});

do until(last.date);

set want;

by date;

a(group)=new_firms;

firm=cats(of a{*});

end;

keep date firm;

format date ddmmyy10.;

run;

HTH,

Naveen Srinivasan

L&T Infotech

AbuChowdhury · Posted 03-17-2015 08:48 AM

Hi,

After running your codes, more observations are shown in the "Final" dataset than the original dataset. How to address this issue?

naveen_srini · Posted 03-17-2015 09:47 AM

I tested mine using the sample data you provided. It appeared to me the dataset is sorted by date. So, in the 1st datastep, the dataset gets flipped as each date duplicates forms a group. So here unless your real matches with your sample, i can't tell until you let us know what it really looks like. I just reran the same code that I gave you and that seems ok.

Either way, prx is perhaps the most ideal for character manipulation, so I'd suggest implement Haikuo's code.

AbuChowdhury · Posted 03-17-2015 10:20 AM

I include the SAS file.I cannot attach the SAS file, but I have attached the CSV file.

Astounding · Posted 03-16-2015 03:46 PM

It sounds like you are looking for a simple "chop off some characters":

firms = substr(firms, 3, 8);

If it is possible that number of characters changes from record to record, you could use:

firms = substr(firms, 3, length(firms)-4);

Good luck.

AbuChowdhury · Posted 03-17-2015 08:34 AM

Hi,

It is not working and shows the following messages:

NOTE: Numeric values have been converted to character values at the places given by:

(Line):(Column).

127:16

NOTE: Character values have been converted to numeric values at the places given by:

(Line):(Column).

127:9

Ksharp · Posted 03-17-2015 09:02 AM

data have;

input date ddmmyy10. firms :$15.;

datalines;

22/10/2008 AN8068571086

26/02/2010 AN8068571086

05/08/2011 AN8068571086

27/09/2011 AN8068571086

25/10/2011 AN8068571086

23/11/2011 AN8068571086

26/06/2012 AN8068571086

;

data want;

set have;

length want $ 20;

want=substr(firms,3,length(firms)-4);

run;

Tom · Posted 03-17-2015 10:29 AM

You did not include the key information from the log that would show what statement was on line 127 and what variable was being referenced at columns 9 and 16.

Are you trying to remove digits from the middle of a NUMBER? If so then those are not errors just notes showing that SAS had to convert the number to a character string to perform the substr() and then convert the string result back to a number.

AbuChowdhury · Posted 03-17-2015 10:32 AM

No, I want to remove the first two digits and last two digits from "cusip" observations. Those notes were shown in log file. And the output dataset shows blank observations i.e. no observation at all.

Haikuo · Posted 03-16-2015 09:40 PM

data have;

input date ddmmyy10. firms :$15.;

datalines;

22/10/2008 AN8068571086

26/02/2010 AN8068571086

05/08/2011 AN8068571086

27/09/2011 AN8068571086

25/10/2011 AN8068571086

23/11/2011 AN8068571086

26/06/2012 AN8068571086

;

data want;

set have;

new_firms=prxchange('s/^\w\w|\w\w$//io',-1, strip(firms));

run;

art297 · Posted 03-17-2015 09:04 AM

Abu,

Both Astounding's and Haikuo's code should have worked. I think you will have to post an example of your data (in the form of a data step), and the code you ran that produced the notes, in order for anyone here to be able to help you.

AbuChowdhury · Posted 03-17-2015 10:25 AM

Hi,

I cannot attach the SAS file. So I have attached the CSV file.

naveen_srini · Posted 03-17-2015 11:41 AM

Thanks Abu for making it more clear. Thumb rule is always to explain the question well by helping us to help you. I have made the necessary changes in the following:

data want;

set have;

do i= 1 to length(cusip);

new_cusip=substr(cusip,i,1);

if i in (1,2) or i=length(cusip)-1 or i=length(cusip) then continue;

group=_n_;

output;

end;

run;

data final;

array a(15) $1 _temporary_;

call missing (of a{*});

do _n_=1 by 1 until(last.group);

set want;

by group;

a(_n_)=new_cusip;

trunc_cusip=cats(of a{*});

end;

keep tradedate trunc_cusip;

format tradedate ddmmyy10.;

run;

Have a nice day,

Naveen Srinivasan

L&T Infotech

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away