BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sson2019
Fluorite | Level 6

Hello,

The basic goal of the program is to split any values that has range of values separated by dash or delimiter to be split into rows. The file we get is in excel format and we do not know which rows will need the split, the program should be able to detect it. So, my program does check the individual columns, I also know how to do the splitting for just one variable, but I am not sure of how to perform multiple splits on the same data. So, only the first column to be split works correctly, but the next column does not get split correctly.

The data looks like this:

var1var2var3
999997-999999 1
999997-999999 7,8,9
999997-999999 2
999997-9999997,8,91
999997-99999911

 

This is what the expected data set should look like:

var1var2var3
999997 1
999998 1
999999 1
999997 7
999998 7
999999 7
999997 8
999998 8
999999 8
999997 9
999998 9
999999 9
999997 2
999998 2
999999 2
9999977 
9999987 
9999997 
9999978 
9999988 
9999998 
9999979 
9999989 
9999999 
99999711
99999811
99999911

Here is the code I am using:

data test;
	set adult;
   	array vars {*} _character_;
	
	do i = 1 to dim(vars);
   		if findc(vars[i],'-') > 0 then do;
			first=scan(vars[i],1,'-');
			last=scan(vars[i],2,'-');
			pos = indexc(vars[i], '-');
			do range = first to last;
				len = length(compress(put(range,32.)));
				len_new = pos - 1 - len;
				if len_new ne 0 then vars[i] = cats(repeat('0', len_new-1), range);
				else vars[i] = range;
				output;
			end;
		end;
		if findc(vars[i],',') > 0 then do;
			do j=1 by 1 while(scan(vars[i],j,',') ^=' ');
				new=scan(vars[i],j,',');
				output;
			end;	
		end;
   	end;
	
   	*drop i j;
run;

Please, help me fix this. Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
hashman
Ammonite | Level 13

@sson2019:

After all of your explanations, it appears that you want to cross all numbers in the VAR1 range with all VAR2 values and then with all VAR3 values and output every combination. In principle, your code looks like a diligent stab at this logic. I'd suggest a bit simpler scheme:

data have ;                                                                                                                             
  input (var1 var2 var3) (:$13.) ;                                                                                                      
  cards ;                                                                                                                               
999997-999999  .      1                                                                                                                 
999997-999999  .      7,8,9                                                                                                             
999997-999999  .      2                                                                                                                 
999997-999999  7,8,9  .                                                                                                                 
999997-999999  1      .                                                                                                                 
999995-999997  1,2    3,4                                                                                                               
999994         1,3    5,7,8                                                                                                             
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data want (drop = _:) ;                                                                                                                 
  set have (rename=var1=_v1) ;                                                                                                          
  _v2 = var2 ;                                                                                                                          
  _v3 = var3 ;                                                                                                                          
  if cmiss (_v2) then _v2 = "#" ;                                                                                                       
  if cmiss (_v3) then _v3 = "#" ;                                                                                                       
  do _x2 = 1 to countw (_v2) ;                                                                                                          
    var2 = scan (_v2, _x2, ",") ;                                                                                                       
    var2 = ifc (var2 = "#", "", var2) ;                                                                                                 
    do _x3 = 1 to countw (_v3) ;                                                                                                        
      var3 = scan (_v3, _x3, ",") ;                                                                                                     
      var3 = ifc (var3 = "#", "", var3) ;                                                                                               
      var1 = input (scan (_v1, 1), 8.) ;                                                                                                
      do var1 = var1 to var1 max input (scan (_v1, 2), 8.) ;                                                                            
        output ;                                                                                                                        
      end ;                                                                                                                             
    end ;                                                                                                                               
  end ;                                                                                                                                 
run ;                                     

The hash sign serves as a dummy character to simplify the scanning by eschewing more numerous IFs.

 

Kind regards

Paul D.  

View solution in original post

7 REPLIES 7
ballardw
Super User

Can you explain why the row

999997-999999 7,8,9 1

 

has no values for var3 when expanded

but

999997-999999 1 1

does.

 

And you really need to show the expected result for something like

999997-999999 3,5,7 1,2

or explicitly state that there are absolutely never going to be cases where var2 and var3 both have more than one value at the same time.

Also, when there are values in var1 are they always  sequential or could you have something like: 999995, 999997, 999999?

 

If you, or some one in your organization, took existing data and combined it into that form from another you might be better off going to an earlier data set.

 

sson2019
Fluorite | Level 6
The value for var3 should be 1 when expanding
999997-999999  7,8,9 1
sorry I made a mistake in the expected data.
There can be cases were all three variables have a range of values like this:
999997-999999 3,5,7 1,2
var 1 can have sometimes values not showing range, it could have something like: 999995, 999997, 999999
This is how we receive data
ballardw
Super User

@sson2019 wrote:
The value for var3 should be 1 when expanding
999997-999999  7,8,9 1
sorry I made a mistake in the expected data.
There can be cases were all three variables have a range of values like this:
999997-999999 3,5,7 1,2
var 1 can have sometimes values not showing range, it could have something like: 999995, 999997, 999999
This is how we receive data

So, what do the results for data like this look like in the output?

999997-999999 3,5,7 1,2

Is the order important?

sson2019
Fluorite | Level 6
This is what it should look like, order is not important at all
999997 3 1
999998 3 1
999999 3 1
999997 3 2
999998 3 2
999999 3 2
999997 5 1
999998 5 1
999999 5 1
999997 5 2
999998 5 2
999999 5 2
999997 7 1
999998 7 1
999999 7 1
999997 7 2
999998 7 2
999999 7 2
Tom
Super User Tom
Super User

It might be a lot easier to just convert your data into code instead.  The values of your variables look like values you could use in a DO loop.  For the ranges just replace the hyphen with TO. And make sure to replace the blanks with a missing value character.

filename code temp;
data _null_;
  file code;
  set have ;
  array vars var1-var3 ;
  put 'row=' _n_ ';' ;
  do i=1 to dim(vars);
    if missing(vars[i]) then vars[i]='.';
    vars[i]=tranwrd(vars[i],'-',' to ');
    put 'do ' vars[i]= ';' ;
  end;
  put 'output;';
  do i=1 to dim(vars);
    put 'end;';
  end;
run;

data want ;
  %include code ;
  stop;
run;
hashman
Ammonite | Level 13

@sson2019:

After all of your explanations, it appears that you want to cross all numbers in the VAR1 range with all VAR2 values and then with all VAR3 values and output every combination. In principle, your code looks like a diligent stab at this logic. I'd suggest a bit simpler scheme:

data have ;                                                                                                                             
  input (var1 var2 var3) (:$13.) ;                                                                                                      
  cards ;                                                                                                                               
999997-999999  .      1                                                                                                                 
999997-999999  .      7,8,9                                                                                                             
999997-999999  .      2                                                                                                                 
999997-999999  7,8,9  .                                                                                                                 
999997-999999  1      .                                                                                                                 
999995-999997  1,2    3,4                                                                                                               
999994         1,3    5,7,8                                                                                                             
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data want (drop = _:) ;                                                                                                                 
  set have (rename=var1=_v1) ;                                                                                                          
  _v2 = var2 ;                                                                                                                          
  _v3 = var3 ;                                                                                                                          
  if cmiss (_v2) then _v2 = "#" ;                                                                                                       
  if cmiss (_v3) then _v3 = "#" ;                                                                                                       
  do _x2 = 1 to countw (_v2) ;                                                                                                          
    var2 = scan (_v2, _x2, ",") ;                                                                                                       
    var2 = ifc (var2 = "#", "", var2) ;                                                                                                 
    do _x3 = 1 to countw (_v3) ;                                                                                                        
      var3 = scan (_v3, _x3, ",") ;                                                                                                     
      var3 = ifc (var3 = "#", "", var3) ;                                                                                               
      var1 = input (scan (_v1, 1), 8.) ;                                                                                                
      do var1 = var1 to var1 max input (scan (_v1, 2), 8.) ;                                                                            
        output ;                                                                                                                        
      end ;                                                                                                                             
    end ;                                                                                                                               
  end ;                                                                                                                                 
run ;                                     

The hash sign serves as a dummy character to simplify the scanning by eschewing more numerous IFs.

 

Kind regards

Paul D.  

sson2019
Fluorite | Level 6
Thanks Paul, that worked perfectly!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2392 views
  • 1 like
  • 4 in conversation