How to split a string at less than 200 character in order to derive SDTM variable like DVTERMxx? My variable value has a length of 650 character so my assumption is it ends up in 4 variables. I am not sure. Any help will be appreciated.
An example of what you have and what the output should be would help. Differe types of string values may be manipulated easiest with different approaches.
Here is an example:
Var1= I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.
I want to split this var at 200 character without chopping any word but get close to 200 and make the first 200 as my first var like xxTERM and the rest of the string will go to the next variable xxTERM01 , xxTERM02 and so on. I hope this make sense?
data have;
var1="I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.";
run;
proc sql;
select cats(ceil(max(length(var1))/200)) into :dim from have;
quit;
data want;
set have;
array split(&dim) $200. xxTERM1-xxTERM&dim;j=1;
do i=1 to &dim WHILE (not missing(scan(var1,j)));
len=0;
do j=j by 1 while (not missing(scan(var1,j)));
if length(scan(var1,j))+len>200 then leave;
split(i)=catx(' ',split(i),scan(var1,j));
len=length(split(i));
end;
end;
keep xx:;
run;
Haikuo
If you don't want to keep var1 in the output dataset then try this. The proc sql setting the variable dim by Haikuo needs further testing. As long as your text contains short words, using ceil will most likely provide a number large enough to create the required number of substring-variables.
proc sql noprint;
select ceil(max(length(var1)) / 200)
into :dim separated by '' /* strips leading blanks */
from have
;
quit;
data want;
set have;
length
Part01-Part&dim $ 200
i 8
;
array parts Part:;
i = 1;
do while (length(Var1) > 200);
blankPos = find(var1, ' ', -200);
parts = substr(Var1, 1, blankPos -1);
i = i + 1;
Var1 = substr(Var1, blankPos);
end;
if length(var1) > 0 then do;
parts = var1;
end;
drop var1 i blankPos /*Buffer*/;
run;
Or transpose it into vertical.
data have; length var1 $ 4000 ; var1='I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.'; output; var1='I am going to market to get some groceries.I will not return till the evening. If there is anyway possible for you to cook the dinner.I will not return till the evening. If there is anyway possible for you to cook the dinner.I will not return till the evening. If there is anyway possible for you to cook the dinner.I will not return till the evening. If there is anyway possible for you to cook the dinner.I will not return till the evening. If there is anyway possible for you to cook the dinner.'; output; run; data temp(keep=id x); set have; id=_n_; do i=1 to countw(var1,' '); x=scan(var1,i,' '); output; end; run; data temp; set temp; length xx $ 400; retain xx; retain n 1; xx=catx(' ',xx,x); if id ne lag(id) then do;n=1; xx=x;end; if length(xx) gt 200 then do;n+1;xx=x;end; run; data x; set temp; by n notsorted; if last.n; run; proc transpose data=x out=want prefix=temp; by id; id n; var xx; run;
Ksharp
Hi,
I have a couple of questions in the code.
data temp;
set temp;
length xx $ 400; <<<<<----length of var1 was 40000 and why are we giving a length of only 400 to xx??????
retain xx;
retain n 1; <<<<<<<<<<<<<<<<----------------------------does this mean n variable to have a length of 1 while it retains???
xx=catx(' ',xx,x);
if id ne lag(id) then do;n=1; xx=x;end;
if length(xx) gt 200 then do;n+1;xx=x;end; /*i also have difficulty in understanding what these 2 lines of code does???*/
run;
Please help
Thanks
"
length xx $ 400; | <<<<<----length of var1 was 40000 and why are we giving a length of only 400 to xx?????? |
"
It is used to hold a variable which has $200. you need . I make it as $400. is to make sure greater than $200, and we can judge if its length is greater than $200 or not. (if length(xx) gt 200 )
"retain n 1; <<<<<<<<<<<<<<<<----------------------------does this mean n variable to have a length of 1 while it retains???"
no, it means n is numeric variable and its initiative value is 1 .
"
if id ne lag(id) then do;n=1; xx=x;end;"
at the beginning of every group, I initiative xx as x (xx has the same value with x).and reinstate group id (variable n) as 1.
"if length(xx) gt 200 then do;n+1;xx=x;end; /*i also have difficulty in understanding what these 2 lines of code does???*/"
I make several groups (variable n), each group 's length ( variable xx 's length) is not greater than 200 which you need to split.
Actually here xx variable is what we need, then proc transpose to make it what we finally want.
Ksharp
Message was edited by: xia keshan
Hi,
I ran through the code again.
I was wondering about the following statement:
if id ne lag(id) then do;n=1; xx=x;end;"
There are 151 words in the first statement
there are 94 words in the second statement
lag(id) is the same for the first 151 records except for the 1st record so n should not equls 1.But to my surprise I find n=1 after running this code
Could some one explain to me please...Same with the other 94 records
Thanks
Actually it equals if first.id then .............
when it is the first obs of group id (first.id) then assign the first word of 151 words to variable xx and n=1;
at this time, if length(xx) gt 200 will evaluate the first word 's length ,it should not be great than 200 so n will not +1 .
Ksharp
If you just want to chop into 200 character pieces then just do it.
Let's say you have variable that is 650. So that will take 4 pieces, 1-200, 201-400, 401-600 and 601 on.
data want ;
set have ;
array _x $200 DVTERM01 - DVTERM04 ;
do i=1 to 650 by 200 ;
_x(i) = substr(dvterm,i);
end;
run;
I like better the solution given by andreas_Ids than the first one presented by Ksharp. It's more clear, more simple and did not lose the dots that var1 contais to separate the sentences.
@CTorres: But that solution will fail if a sufficiently long word exists in the last split that would cause 5 parts rather than 4 even though the string length is 800. Below is such a case where the code would fail, along with code that wouldn't fail given a string like that:
data have (drop=i);
var1="I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast ofhaving the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to those open allnight grocery stores.";
do i=1 to 1000;output;end;
run;
data need (keep=_line _partnum part);
set have;
length part $200;
_line+1;
_partnum=1;
j=1;
do WHILE (not missing(scan(var1,j,' ')));
if length(catx(' ',part,scan(var1,j,' '))) gt 200 or
missing(scan(var1,j+1,' ')) then do;
output;
_partnum+1;
part=scan(var1,j,' ');
end;
else part=catx(' ',part,scan(var1,j,' '));
j+1;
end;
run;
proc transpose data=need out=want (drop=_:) prefix=part;
var part;
id _partnum;
by _line;
run;
BTW. my code will still keep the dots variable contains .
I find the following quite useful.
%LET DIM = ;
%LET DSID = %SYSFUNC(OPEN(HAVE,I));
%LET DIM = %SYSFUNC(CEIL(%EVAL(%SYSFUNC(VARLEN(&DSID,%SYSFUNC(VARNUM(&DSID,STDM))))/200)));
%LET RC = %SYSFUNC(CLOSE(&DSID));
DATA WANT ;
SET HAVE;
ARRAY SPLITIT $200 DVTERM1-DVTERM&DIM. _CHARACTER_;
DO I = 1 TO &DIM.;
SPLITIT{I} = SUBSTR(STDM,(I-1)*200+1);
END;
RUN;
%PUT &DIM.;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.