- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hey all,
I am trying for a few days already to cut the last 8 digits from a string into a new variable and delete theses digits from the old one. It needs to be done over 100 variables, so it has to be within an array.
The original variables look like that:
1234b8:r:20060607
627::20071012
382027:l:20071212
182b::20071615
17293b:r:20081281
There are 100 of these kind.
I tried scan and substr, that works for a single variable, but somehow within the array, the values with a character between the :: get lost.
The result should look like that:
Var Date
1234b8:r 20060607
382027:l 20071212
182b 20071615
17293b:r 20081281
Probably there is an easy way to do that, but I can't find it.
Thank you in advance
All the best!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The error shows you the problem:
dat_ops95= dat_ops96= dat_ops97= dat_ops98= dat_ops99= dat_ops100= i=101 _ERROR_=1 _N_=1
There is not data in dat_ops95, or 96 etc. Therefore any substr() evaluation will fail as lengthn(string)=0 then -7 = -7, invalid for substr().
Again, post test data exactly as you have it, as a datastep. Follow this post:
It is impossible for me to cover data that you have if I don't know about it (i.e. you post on one column, there is 100 in the log).
Put an if around the calculation;
data test1; set test; array ops_ko ops_ko1-ops_ko100; array dat_ops dat_ops1-dat_ops100; do i=1 to 100; if lengthn(ops_ko{i}) > 0 then do; dat_ops{i}=substr(ops_ko{i},lengthn(ops_ko{i})-7,10); ops_ko{i}=tranwrd(ops_ko{i},strip(dat_ops{i}),""); end; end; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Something like:
data want; length thevar var date $50; thevar="1234b8:r:20060607"; date=substr(thevar,lengthn(thevar)-7,10); var=tranwrd(thevar,strip(date),""); run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try this:
data have;
input string : $20.;
cards;
1234b8:r:20060607
627::20071012
382027:l:20071212
182b::20071615
17293b:r:20081281
;
run;
data want;
set have;
length datestr $8;
format date yymmdd10.;
datestr = scan(string,-1,':','m');
date = input(datestr,yymmdd8.);
string = catx(':',scan(string,1,':','m'),scan(string,2,':','m'));
run;
proc print data=want noobs;
run;
The result:
string datestr date 1234b8:r 20060607 2006-06-07 627 20071012 2007-10-12 382027:l 20071212 2007-12-12 182b 20071615 . 17293b:r 20081281 .
Note that you have some invalid dates.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@RW9's code will result in a missing value for var if the first and second strings are the same.
The following is a slight modification of his code that corrects for that:
data have; length thevar $50; input; thevar=_infile_; cards; 1234b8:r:20060607 627::20071012 382027:l:20071212 182b::20071615 17293b:r:20081281 2008128120081281 ; data want; set have; length var date $50; date=substr(thevar,lengthn(thevar)-7,10); var=substr(thevar,1,lengthn(thevar)-8); run;
HTH,
Art, CEO, AnalystFinder.com
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you can do it for one variable accurately, using arrays shouldn't change the outcome. One example:
data want;
set have;
array original {100} original_1 - original_100;
array dates {100} $ 8 date_1 - date_100;
do _n_=1 to 100;
dates{_n_} = scan(original{_n_}, -1, ':');
original{_n_} = substr(original{_n_}, 1, length(original{_n_})-9);
if substr(original{_n_}, length(original{_n_}), 1) = ':' then substr(original{_n_}, length(original{_n_}), 1) = ' ';
end;
run;
It's untested, but I think I got the parentheses in the proper place.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi.
Another variation.
data want;
S1='1234b8:r:20060607';
S2=substr(S1,lengthn(S1)-7);
substr(S1,lengthn(S1)-7)='';
run;
Using the substr function on left of =
More on this here: https://v8doc.sas.com/sashtml/lgref/z0212267.htm
Daniel Santos @ www.cgd.pt
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Good morning,
I propose this solution
data test;
input var $50.;
cards;
1234b8:r:20060607
627::20071012
382027:l:20071212
182b::20070615
17293b:r:20081231
;run;
DATA test1 (keep=var: date);
format date yymmdd10.;
set test end=eof;
length code $10.;
IF _N_=1 THEN RE = PRXPARSE("/\:\d{8}/");
retain RE;
CALL PRXSUBSTR(RE,var,START,LENGTH);
IF START GT 0 THEN DO;
code = SUBSTRN(var,START + 1,LENGTH - 1);
date =input (code,yymmdd10.);
var2=SUBSTRN(var,1,LENGTH(var) - LENGTH);
output;
end;
run ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you all!
But all of the codes make my log file running full when they are used within an array.
Why is that and is there a solution for it?
Somehow it works fine for the first variable, but when used in an array, in the second variable the date sometimes isn't cut off.
Do you know why?
ops_ko{i} is the variable.
DATA testila;
set test;
array ops_ko ops_ko1-ops_ko100;
array dat_ops dat_ops1-dat_ops100;
do i=1 to 100;
if _N_=1 then RE = PRXPARSE("/\:\d{8}/");
retain RE;
call PRXSUBSTR(RE,ops_ko{i},START,LENGTH);
if START GT 0 then do;
dat_ops{i} = SUBSTRN(ops_ko{i},START + 1,LENGTH - 1);
ops_ko{i} = SUBSTRN(ops_ko{i},1,LENGTH(ops_ko{i}) - 9);
output;
end;
end;
drop i Start Length RE;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Then your doing something other than the suggestions posted here. My datastep above should be printed to the log, followed by three other text rows with note, and two rows of time taken to run. Nothing more. Without seeing your log/code we can't tell. Is it your setup, are you running other things, are you printing something to log etc?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the fast reply!
I am only running this code:
data test1;
set test;
array ops_ko ops_ko1-ops_ko100;
array dat_ops dat_ops1-dat_ops100;
do i=1 to 100;
dat_ops{i}=substr(ops_ko{i},lengthn(ops_ko{i})-7,10);
ops_ko{i}=tranwrd(ops_ko{i},strip(dat_ops{i}),"");
end;
run;
Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data test;
input @1 var $17. @18 var1 $20.;
cards;
1234b8:r:20060607 1234b8:r:20060607
152::20071012 627::20091012
382027:l:20071212 382027:l:20051212
182b::20070615 182b::20010615
17293b:r:20081231 17293b:r:20001231
;run;
DATA test1;
format date date1 yymmdd10.;
set test;
array vr (*) $ var var1;
array cd (*) $ code code1;
array dt(*) $ date date1;
array pr (*) $ prt1 prt2;
do i=1 to dim(vr);
if _N_=1 then RE = PRXPARSE("/\:\d{8}/");
retain RE;
call PRXSUBSTR(RE,vr(i),START,LENGTH);
if START GT 0 then do;
cd{i} = SUBSTRN(vr{i},START + 1,LENGTH - 1);
dt(i) =input (cd(i),yymmdd10.);
pr(i)=SUBSTRN(vr(i),1,LENGTH(vr(i)) - LENGTH);
end;
end;
drop i Start Length RE;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can you post the log? Are you getting lots of notes or something?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can you post the log? Are you getting lots of notes or something?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi RW9,
this is the code:
data test2;
set test;
array ops_ko ops_ko1-ops_ko100;
array dat_ops dat_ops1-dat_ops100;
do i=1 to 100;
*length ops_ko date $50;
dat_ops{i}=substr(ops_ko{i},lengthn(ops_ko{i})-7,10);
ops_ko{i}=substr(ops_ko{i},1,lengthn(ops_ko{i})-8);
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry, you have actually posted the bit of the log right between important bits. I am specifically looking at notes/errors/warnings. The reason being is this kind of output occurs when you substr invalid data, or input invalid numeric data or something along those lines. What SAS does is print out all the data for that observation where that error occurs, hence as all your data seems to have the same problem, i.e. a logic problem, then you are getting large amounts of output in the log.
It is likely down to your array statements:
array ops_ko ops_ko1-ops_ko100;
These create numeric arrays, and you are trying to put substring information into them. Try chaning to something like:
array ops_ko $200. ops_ko1-ops_ko100;
Also note, this is a very good example of why it is recommended to post test data - in the form of a datastep - so that we can see exactly what you are working with. I do not see what you do, nor can I run what you are running!