Solved: Re: Creating multiple records from single record

kothasaikrishna · Posted 07-16-2019 06:18 PM

Hi I have dataset as follows:

obs col1

1. ar/t; br/t

2. k

3. m-p, i

I need to create another dataset as follows:

obs col1

1. a

2 b

3 k

4 m

5 n

6 o

7 p

8 i

can you help me with that. Thanks!

Reeza · Posted 07-16-2019 10:32 PM

Thanks, it was dinner time 🙂

data have;
	original="ar/t; br/t";
	output;
	original="k";
	output;
	original="m-p, i";
	output;
run;

data want;
	set have;
	*number of terms;
	n_terms=countc(original, ";,")+1;
	
	*remove r/t;
    p1=tranwrd(original, 'r/t', '');


	do i=1 to n_terms;
	    *separate into items;
		term=scan(p1, i, ';, ');

        *check if fields contains hyphen;
		if find(term, '-') then
			do;
			    *find start and end of loop by converting to ascii and back;
				start_letter=rank(scan(term, 1, '-'));
				end_letter=rank(scan(term, 2, '-'));

                *loop through and output for each letter;
				do j=start_letter to end_letter;
					term=byte(j);
					output;
				end;
			end;
	     *if no hyphen output;
	     else output;

	end;
	keep term original n_terms;
run;

View solution in original post

Reeza · Posted 07-16-2019 06:31 PM

Are those all the rules? How do you know which to expand and which to truncate?

kothasaikrishna · Posted 07-16-2019 06:37 PM

I have series alphabets from m to p which were mentioned as m-p, I need to expand those and put them in multiple rows

Reeza · Posted 07-16-2019 06:40 PM

So we can ignore the first two observations and everything from them? Only worry about that last record then with the hyphen? It's always a hyphen and no other fields have hyphens? If we don't understand the rules, we can't code a solution.

kothasaikrishna · Posted 07-16-2019 06:55 PM

I have the instructions that i need to take out r/t in the first observation and put the two alphabets a, b in two different observations, for the third observation i need to expand from m to p and put them in different observations and alphabet i comes as the last observation. Thanks!

heffo · Posted 07-16-2019 07:35 PM

This is a start, don't know your exact rules.

data want;
	length colout1 $1 _coltemp $ 11;
	set have;
	do _i = 1 to countw(col1,":,"); *loop all the words;
		_coltemp = strip(scan(col1,_i,":,")); 
		*Three different cases, a list, a span or a single value.;
		if index(_coltemp,"-")>0 then do;
			*Span, Might be problematic if the casing is different from start to end. ;
			_start = scan(_coltemp,1,"-");
			_end = scan(_coltemp,2,"-");
			*Rank gets the ASCII value for us to loop from start to end.;
			do _j = rank(_start) to rank(_end);
				*Use byte function to go from ASCII to string. ;
				colout1 = byte(_j);
				output;
			end;
		end;
		else if index(_coltemp,":")>0 then do;
			*A list of values separated (in this case with : ). ;
			do _j =1 to countw(_coltemp,":");
				*Loop all of the values. ;
				colout1 = scan(_coltemp,_j);
				output;
			end;
		end;
		else do;
			*Single value version;
			colout1 = _coltemp;
			output;
		end;
	end;
	drop _:;
run;

Reeza · Posted 07-16-2019 09:38 PM

Not sure why this isn't working for the last i, but need to be done. Perhaps someone else can fix it, it's pretty close.

data have;
	original="ar/t; br/t";
	output;
	original="k";
	output;
	original="m-p, i";
	output;
run;

data want;
	set have;
	*number of terms;
	n_terms=countc(original, ";,")+1;
    p1=compress(original, 'r/t');

	do i=1 to n_terms;
		term=scan(p1, i, ';, ');

		if find(term, '-') then
			do;
				start_letter=rank(scan(term, 1, '-'));
				end_letter=rank(scan(term, 2, '-'));

				do i=start_letter to end_letter;
					term=byte(i);
					output;
				end;
			end;
	     else output;

	end;
	keep term original;
run;

heffo · Posted 07-16-2019 10:13 PM

You are reusing the counter variable (i) in your inner loop. So, it will be more than two the second time it tries to do the outer loop. 🙂

Also, I would use

p1=tranwrd(original, 'r/t','');

instead of the compress. Compress removes all chars, not just that exact string. So, if you have "t-u", then the code will remove "t" as well.

Reeza · Posted 07-16-2019 10:32 PM

Thanks, it was dinner time 🙂

data have;
	original="ar/t; br/t";
	output;
	original="k";
	output;
	original="m-p, i";
	output;
run;

data want;
	set have;
	*number of terms;
	n_terms=countc(original, ";,")+1;
	
	*remove r/t;
    p1=tranwrd(original, 'r/t', '');


	do i=1 to n_terms;
	    *separate into items;
		term=scan(p1, i, ';, ');

        *check if fields contains hyphen;
		if find(term, '-') then
			do;
			    *find start and end of loop by converting to ascii and back;
				start_letter=rank(scan(term, 1, '-'));
				end_letter=rank(scan(term, 2, '-'));

                *loop through and output for each letter;
				do j=start_letter to end_letter;
					term=byte(j);
					output;
				end;
			end;
	     *if no hyphen output;
	     else output;

	end;
	keep term original n_terms;
run;

kothasaikrishna · Posted 07-17-2019 01:08 PM

Thank you so much! that works.