- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi I have dataset as follows:
obs col1
1. ar/t; br/t
2. k
3. m-p, i
I need to create another dataset as follows:
obs col1
1. a
2 b
3 k
4 m
5 n
6 o
7 p
8 i
can you help me with that. Thanks!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, it was dinner time 🙂
data have;
original="ar/t; br/t";
output;
original="k";
output;
original="m-p, i";
output;
run;
data want;
set have;
*number of terms;
n_terms=countc(original, ";,")+1;
*remove r/t;
p1=tranwrd(original, 'r/t', '');
do i=1 to n_terms;
*separate into items;
term=scan(p1, i, ';, ');
*check if fields contains hyphen;
if find(term, '-') then
do;
*find start and end of loop by converting to ascii and back;
start_letter=rank(scan(term, 1, '-'));
end_letter=rank(scan(term, 2, '-'));
*loop through and output for each letter;
do j=start_letter to end_letter;
term=byte(j);
output;
end;
end;
*if no hyphen output;
else output;
end;
keep term original n_terms;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is a start, don't know your exact rules.
data want;
length colout1 $1 _coltemp $ 11;
set have;
do _i = 1 to countw(col1,":,"); *loop all the words;
_coltemp = strip(scan(col1,_i,":,"));
*Three different cases, a list, a span or a single value.;
if index(_coltemp,"-")>0 then do;
*Span, Might be problematic if the casing is different from start to end. ;
_start = scan(_coltemp,1,"-");
_end = scan(_coltemp,2,"-");
*Rank gets the ASCII value for us to loop from start to end.;
do _j = rank(_start) to rank(_end);
*Use byte function to go from ASCII to string. ;
colout1 = byte(_j);
output;
end;
end;
else if index(_coltemp,":")>0 then do;
*A list of values separated (in this case with : ). ;
do _j =1 to countw(_coltemp,":");
*Loop all of the values. ;
colout1 = scan(_coltemp,_j);
output;
end;
end;
else do;
*Single value version;
colout1 = _coltemp;
output;
end;
end;
drop _:;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not sure why this isn't working for the last i, but need to be done. Perhaps someone else can fix it, it's pretty close.
data have;
original="ar/t; br/t";
output;
original="k";
output;
original="m-p, i";
output;
run;
data want;
set have;
*number of terms;
n_terms=countc(original, ";,")+1;
p1=compress(original, 'r/t');
do i=1 to n_terms;
term=scan(p1, i, ';, ');
if find(term, '-') then
do;
start_letter=rank(scan(term, 1, '-'));
end_letter=rank(scan(term, 2, '-'));
do i=start_letter to end_letter;
term=byte(i);
output;
end;
end;
else output;
end;
keep term original;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You are reusing the counter variable (i) in your inner loop. So, it will be more than two the second time it tries to do the outer loop. 🙂
Also, I would use
p1=tranwrd(original, 'r/t','');
instead of the compress. Compress removes all chars, not just that exact string. So, if you have "t-u", then the code will remove "t" as well.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, it was dinner time 🙂
data have;
original="ar/t; br/t";
output;
original="k";
output;
original="m-p, i";
output;
run;
data want;
set have;
*number of terms;
n_terms=countc(original, ";,")+1;
*remove r/t;
p1=tranwrd(original, 'r/t', '');
do i=1 to n_terms;
*separate into items;
term=scan(p1, i, ';, ');
*check if fields contains hyphen;
if find(term, '-') then
do;
*find start and end of loop by converting to ascii and back;
start_letter=rank(scan(term, 1, '-'));
end_letter=rank(scan(term, 2, '-'));
*loop through and output for each letter;
do j=start_letter to end_letter;
term=byte(j);
output;
end;
end;
*if no hyphen output;
else output;
end;
keep term original n_terms;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much! that works.