BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kothasaikrishna
Calcite | Level 5

Hi I have dataset as follows:

obs  col1

 1.      ar/t; br/t

 2.      k

 3.      m-p, i

 

I need to create another dataset as follows:

obs  col1

1.     a

2      b

3      k

4     m

5     n

6      o

7      p

8     i

 

can you help me with that. Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Thanks, it was dinner time 🙂

 

data have;
	original="ar/t; br/t";
	output;
	original="k";
	output;
	original="m-p, i";
	output;
run;

data want;
	set have;
	*number of terms;
	n_terms=countc(original, ";,")+1;
	
	*remove r/t;
    p1=tranwrd(original, 'r/t', '');


	do i=1 to n_terms;
	    *separate into items;
		term=scan(p1, i, ';, ');

        *check if fields contains hyphen;
		if find(term, '-') then
			do;
			    *find start and end of loop by converting to ascii and back;
				start_letter=rank(scan(term, 1, '-'));
				end_letter=rank(scan(term, 2, '-'));

                *loop through and output for each letter;
				do j=start_letter to end_letter;
					term=byte(j);
					output;
				end;
			end;
	     *if no hyphen output;
	     else output;

	end;
	keep term original n_terms;
run;

View solution in original post

9 REPLIES 9
Reeza
Super User
Are those all the rules? How do you know which to expand and which to truncate?
kothasaikrishna
Calcite | Level 5
I have series alphabets from m to p which were mentioned as m-p, I need to expand those and put them in multiple rows
Reeza
Super User
So we can ignore the first two observations and everything from them? Only worry about that last record then with the hyphen? It's always a hyphen and no other fields have hyphens? If we don't understand the rules, we can't code a solution.
kothasaikrishna
Calcite | Level 5
I have the instructions that i need to take out r/t in the first observation and put the two alphabets a, b in two different observations, for the third observation i need to expand from m to p and put them in different observations and alphabet i comes as the last observation. Thanks!
heffo
Pyrite | Level 9

This is a start, don't know your exact rules. 

 

data want;
	length colout1 $1 _coltemp $ 11;
	set have;
	do _i = 1 to countw(col1,":,"); *loop all the words;
		_coltemp = strip(scan(col1,_i,":,")); 
		*Three different cases, a list, a span or a single value.;
		if index(_coltemp,"-")>0 then do;
			*Span, Might be problematic if the casing is different from start to end. ;
			_start = scan(_coltemp,1,"-");
			_end = scan(_coltemp,2,"-");
			*Rank gets the ASCII value for us to loop from start to end.;
			do _j = rank(_start) to rank(_end);
				*Use byte function to go from ASCII to string. ;
				colout1 = byte(_j);
				output;
			end;
		end;
		else if index(_coltemp,":")>0 then do;
			*A list of values separated (in this case with : ). ;
			do _j =1 to countw(_coltemp,":");
				*Loop all of the values. ;
				colout1 = scan(_coltemp,_j);
				output;
			end;
		end;
		else do;
			*Single value version;
			colout1 = _coltemp;
			output;
		end;
	end;
	drop _:;
run;
Reeza
Super User

Not sure why this isn't working for the last i, but need to be done. Perhaps someone else can fix it, it's pretty close. 

 

data have;
	original="ar/t; br/t";
	output;
	original="k";
	output;
	original="m-p, i";
	output;
run;

data want;
	set have;
	*number of terms;
	n_terms=countc(original, ";,")+1;
    p1=compress(original, 'r/t');

	do i=1 to n_terms;
		term=scan(p1, i, ';, ');

		if find(term, '-') then
			do;
				start_letter=rank(scan(term, 1, '-'));
				end_letter=rank(scan(term, 2, '-'));

				do i=start_letter to end_letter;
					term=byte(i);
					output;
				end;
			end;
	     else output;

	end;
	keep term original;
run;
heffo
Pyrite | Level 9

You are reusing the counter variable (i) in your inner loop. So, it will be more than two the second time it tries to do the outer loop. 🙂

 

Also, I would use 

p1=tranwrd(original, 'r/t','');

instead of the compress. Compress removes all chars, not just that exact string. So, if you have "t-u", then the code will remove "t" as well.

 

Reeza
Super User

Thanks, it was dinner time 🙂

 

data have;
	original="ar/t; br/t";
	output;
	original="k";
	output;
	original="m-p, i";
	output;
run;

data want;
	set have;
	*number of terms;
	n_terms=countc(original, ";,")+1;
	
	*remove r/t;
    p1=tranwrd(original, 'r/t', '');


	do i=1 to n_terms;
	    *separate into items;
		term=scan(p1, i, ';, ');

        *check if fields contains hyphen;
		if find(term, '-') then
			do;
			    *find start and end of loop by converting to ascii and back;
				start_letter=rank(scan(term, 1, '-'));
				end_letter=rank(scan(term, 2, '-'));

                *loop through and output for each letter;
				do j=start_letter to end_letter;
					term=byte(j);
					output;
				end;
			end;
	     *if no hyphen output;
	     else output;

	end;
	keep term original n_terms;
run;
kothasaikrishna
Calcite | Level 5

Thank you so much! that works. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 3399 views
  • 4 likes
  • 3 in conversation