Fluorite | Level 6

Split multiple variables into multiple rows

Hello,

The basic goal of the program is to split any values that has range of values separated by dash or delimiter to be split into rows. The file we get is in excel format and we do not know which rows will need the split, the program should be able to detect it. So, my program does check the individual columns, I also know how to do the splitting for just one variable, but I am not sure of how to perform multiple splits on the same data. So, only the first column to be split works correctly, but the next column does not get split correctly.

The data looks like this:

 var1 var2 var3 999997-999999 1 999997-999999 7,8,9 999997-999999 2 999997-999999 7,8,9 1 999997-999999 1 1

This is what the expected data set should look like:

 var1 var2 var3 999997 1 999998 1 999999 1 999997 7 999998 7 999999 7 999997 8 999998 8 999999 8 999997 9 999998 9 999999 9 999997 2 999998 2 999999 2 999997 7 999998 7 999999 7 999997 8 999998 8 999999 8 999997 9 999998 9 999999 9 999997 1 1 999998 1 1 999999 1 1

Here is the code I am using:

```data test;
array vars {*} _character_;

do i = 1 to dim(vars);
if findc(vars[i],'-') > 0 then do;
first=scan(vars[i],1,'-');
last=scan(vars[i],2,'-');
pos = indexc(vars[i], '-');
do range = first to last;
len = length(compress(put(range,32.)));
len_new = pos - 1 - len;
if len_new ne 0 then vars[i] = cats(repeat('0', len_new-1), range);
else vars[i] = range;
output;
end;
end;
if findc(vars[i],',') > 0 then do;
do j=1 by 1 while(scan(vars[i],j,',') ^=' ');
new=scan(vars[i],j,',');
output;
end;
end;
end;

*drop i j;
run;```

Please, help me fix this. Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Ammonite | Level 13

Re: Split multiple variables into multiple rows

After all of your explanations, it appears that you want to cross all numbers in the VAR1 range with all VAR2 values and then with all VAR3 values and output every combination. In principle, your code looks like a diligent stab at this logic. I'd suggest a bit simpler scheme:

``````data have ;
input (var1 var2 var3) (:\$13.) ;
cards ;
999997-999999  .      1
999997-999999  .      7,8,9
999997-999999  .      2
999997-999999  7,8,9  .
999997-999999  1      .
999995-999997  1,2    3,4
999994         1,3    5,7,8
;
run ;

data want (drop = _:) ;
set have (rename=var1=_v1) ;
_v2 = var2 ;
_v3 = var3 ;
if cmiss (_v2) then _v2 = "#" ;
if cmiss (_v3) then _v3 = "#" ;
do _x2 = 1 to countw (_v2) ;
var2 = scan (_v2, _x2, ",") ;
var2 = ifc (var2 = "#", "", var2) ;
do _x3 = 1 to countw (_v3) ;
var3 = scan (_v3, _x3, ",") ;
var3 = ifc (var3 = "#", "", var3) ;
var1 = input (scan (_v1, 1), 8.) ;
do var1 = var1 to var1 max input (scan (_v1, 2), 8.) ;
output ;
end ;
end ;
end ;
run ;
``````

The hash sign serves as a dummy character to simplify the scanning by eschewing more numerous IFs.

Kind regards

Paul D.

7 REPLIES 7
Super User

Re: Split multiple variables into multiple rows

Can you explain why the row

 999997-999999 7,8,9 1

has no values for var3 when expanded

but

 999997-999999 1 1

does.

And you really need to show the expected result for something like

 999997-999999 3,5,7 1,2

or explicitly state that there are absolutely never going to be cases where var2 and var3 both have more than one value at the same time.

Also, when there are values in var1 are they always  sequential or could you have something like: 999995, 999997, 999999?

If you, or some one in your organization, took existing data and combined it into that form from another you might be better off going to an earlier data set.

Fluorite | Level 6

Re: Split multiple variables into multiple rows

The value for var3 should be 1 when expanding
999997-999999  7,8,9 1
sorry I made a mistake in the expected data.
There can be cases were all three variables have a range of values like this:
999997-999999 3,5,7 1,2
var 1 can have sometimes values not showing range, it could have something like: 999995, 999997, 999999
This is how we receive data
Super User

Re: Split multiple variables into multiple rows

@sson2019 wrote:
The value for var3 should be 1 when expanding
999997-999999  7,8,9 1
sorry I made a mistake in the expected data.
There can be cases were all three variables have a range of values like this:
999997-999999 3,5,7 1,2
var 1 can have sometimes values not showing range, it could have something like: 999995, 999997, 999999
This is how we receive data

So, what do the results for data like this look like in the output?

999997-999999 3,5,7 1,2

Is the order important?

Fluorite | Level 6

Re: Split multiple variables into multiple rows

This is what it should look like, order is not important at all
999997 3 1
999998 3 1
999999 3 1
999997 3 2
999998 3 2
999999 3 2
999997 5 1
999998 5 1
999999 5 1
999997 5 2
999998 5 2
999999 5 2
999997 7 1
999998 7 1
999999 7 1
999997 7 2
999998 7 2
999999 7 2
Super User

Re: Split multiple variables into multiple rows

It might be a lot easier to just convert your data into code instead.  The values of your variables look like values you could use in a DO loop.  For the ranges just replace the hyphen with TO. And make sure to replace the blanks with a missing value character.

``````filename code temp;
data _null_;
file code;
set have ;
array vars var1-var3 ;
put 'row=' _n_ ';' ;
do i=1 to dim(vars);
if missing(vars[i]) then vars[i]='.';
vars[i]=tranwrd(vars[i],'-',' to ');
put 'do ' vars[i]= ';' ;
end;
put 'output;';
do i=1 to dim(vars);
put 'end;';
end;
run;

data want ;
%include code ;
stop;
run;``````
Ammonite | Level 13

Re: Split multiple variables into multiple rows

After all of your explanations, it appears that you want to cross all numbers in the VAR1 range with all VAR2 values and then with all VAR3 values and output every combination. In principle, your code looks like a diligent stab at this logic. I'd suggest a bit simpler scheme:

``````data have ;
input (var1 var2 var3) (:\$13.) ;
cards ;
999997-999999  .      1
999997-999999  .      7,8,9
999997-999999  .      2
999997-999999  7,8,9  .
999997-999999  1      .
999995-999997  1,2    3,4
999994         1,3    5,7,8
;
run ;

data want (drop = _:) ;
set have (rename=var1=_v1) ;
_v2 = var2 ;
_v3 = var3 ;
if cmiss (_v2) then _v2 = "#" ;
if cmiss (_v3) then _v3 = "#" ;
do _x2 = 1 to countw (_v2) ;
var2 = scan (_v2, _x2, ",") ;
var2 = ifc (var2 = "#", "", var2) ;
do _x3 = 1 to countw (_v3) ;
var3 = scan (_v3, _x3, ",") ;
var3 = ifc (var3 = "#", "", var3) ;
var1 = input (scan (_v1, 1), 8.) ;
do var1 = var1 to var1 max input (scan (_v1, 2), 8.) ;
output ;
end ;
end ;
end ;
run ;
``````

The hash sign serves as a dummy character to simplify the scanning by eschewing more numerous IFs.

Kind regards

Paul D.

Fluorite | Level 6

Re: Split multiple variables into multiple rows

Thanks Paul, that worked perfectly!
Discussion stats
• 7 replies
• 2447 views
• 1 like
• 4 in conversation