Dear friends;
I need to split a value from a row into equal width intervals. I have this:
data HAVE; infile datalines delimiter=',';
input Name $ A; datalines;
Well1 100
Well2 200
Well3 300
;
What I want as a result is this:
| Name | A | 
| Well1 | 100 | 
| 120 | |
| 140 | |
| 160 | |
| 180 | |
| Well2 | 200 | 
| 220 | |
| 240 | |
| 260 | |
| 280 | |
| Well3 | 300 | 
| 320 | |
| 340 | |
| 360 | |
| 380 | 
Can you please help me with this  problem?
I need the general methodology.
Thank you very much
Best Regards
Farshid
I assume that in your sample output, you meant to place 23 in line with Well3; otherwise, I don't catch the pattern.
If I'm right, you just have to add B to the CALL MISSING variable list:
data have ;                                                
  input Name $ A split B ;                                 
  cards ;                                                  
Well1 100   5  18                                          
Well2 120  20  21                                          
Well3 280  15  23                                          
Well4 325  50  28                                          
Well5 400  40  42                                          
;                                                          
run ;                                                      
                                                           
data want (drop = _:) ;                                    
  merge have have (firstobs=2 keep=A rename=A=_A) ;        
  output ;                                                 
  call missing (name, B) ;                                 
  do A = A + split to max (A, _A) by split while (A < _A) ;
    output ;                                               
  end ;                                                    
run ;                                                      
Kind regards
Paul D.
Try this:
data have ;            
  input Name $ A ;     
  cards ;              
Well1 100              
Well2 200              
Well3 300              
;                      
run ;                  
                       
data want ;            
  set have ;           
  output ;             
  call missing (name) ;
  do _n_ = 1 to 4 ;    
    A + 20 ;           
    output ;           
  end ;                
run ;                  
Kind regards
Paul D.
Dear Paul ,
I have a new challenge now (I have several, but take one thing at a time).
Now, I have the following input:
data have ; 
   infile datalines delimiter=',';           
  input Name $ A Split;     
  cards ;              
Well1, 100,  10              
Well2, 200,  20            
Well3, 300,  50            
;                      
run ;The "Split" will decide how many times the values between the two rows will be divided. I like to have the following result:
| Name | A | 
| Well1 | 100 | 
| 110 | |
| 120 | |
| 130 | |
| 140 | |
| 150 | |
| 160 | |
| 170 | |
| 180 | |
| 190 | |
| Well2 | 200 | 
| 220 | |
| 240 | |
| 260 | |
| 280 | |
| Well3 | 300 | 
| 350 | |
| Well4 | 400 | 
| ... | 
Can you please help me with that too?
It's easy:
data have ;                                                                                                                                                                                                                                                     
  input Name $ A split ;                                                                                                                                                                                                                                        
  cards ;                                                                                                                                                                                                                                                       
Well1 100  10                                                                                                                                                                                                                                                   
Well2 200  20                                                                                                                                                                                                                                                   
Well3 300  50                                                                                                                                                                                                                                                   
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want ;                                                                                                                                                                                                                                                     
  set have ;                                                                                                                                                                                                                                                    
  output ;                                                                                                                                                                                                                                                      
  call missing (name) ;                                                                                                                                                                                                                                         
  do _n_ = 1 to 100 / split - 1 ;                                                                                                                                                                                                                               
    A + split ;                                                                                                                                                                                                                                                 
    output ;                                                                                                                                                                                                                                                    
  end ;                                                                                                                                                                                                                                                         
run ;    
Kind regards
Paul D.
Yes Paul,
In this case the distance between each row is 100 and you use
do _n_ = 1 to 100.
How about when it is not the case.
Please look the following case:
data have ;                                                                                                                                                                                                                                                     
  input Name $ A split ;                                                                                                                                                                                                                                        
  cards ;                                                                                                                                                                                                                                                       
Well1 100  5                                                                                                                                                                                                                                                   
Well2 120  20                                                                                                                                                                                                                                                   
Well3 280  15
Well4 325  50                                                                                                                                                                                                                                                   
;                                                                                                                                                                                                                                                               
run ; The result should be like this:
| Well1 | 100 | 
| 105 | |
| 110 | |
| 115 | |
| Well2 | 120 | 
| 140 | |
| 160 | |
| 180 | |
| 200 | |
| 220 | |
| 240 | |
| 260 | |
| Well3 | 280 | 
| 295 | |
| 310 | |
| Well4 | 325 | 
It'd be simpler if you set your full output requirements at once rather than incrementally, as a seemingly insignificant change in them can affect the logic in a significant way - in particular, if you want to rely on the leading record's value's record to set generate output values for the current record (as you apparently want now) since in this case some look-forward mechanism needs to be involved. Below, it is done by merging the input file with its own copy minus the first record:
data have ;                                                                                                                                                                                                                                                     
  input Name $ A split ;                                                                                                                                                                                                                                        
  cards ;                                                                                                                                                                                                                                                       
Well1 100   5                                                                                                                                                                                                                                                   
Well2 120  20                                                                                                                                                                                                                                                   
Well3 280  15                                                                                                                                                                                                                                                   
Well4 325  50                                                                                                                                                                                                                                                   
Well5 400  40                                                                                                                                                                                                                                                   
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want (drop = _:) ;                                                                                                                                                                                                                                                     
  merge have have (firstobs=2 keep=A rename=A=_A) ;                                                                                                                                                                                                             
  output ;                                                                                                                                                                                                                                                      
  call missing (name) ;                                                                                                                                                                                                                                         
  do A = A + split to max (A, _A) by split while (A < _A) ;                                                                                                                                                                                                     
    output ;                                                                                                                                                                                                                                                    
  end ;                                                                                                                                                                                                                                                         
run ;                                                       
Note:
Alternatively, you can omit the WHILE condition and code the DO loop instead as:
  do A = A + split to max (A, _A) by split ;                                                                                                                                                                                                     
    if A < _A then output ;                                                                                                                                                                                                                                                    
  end ;      Kind regards
Paul D.
Nice!
It forms step by step!
Let's go to the final step.
My aim is to perform a linear regression later.
First I have to split the rows and you have helped me with that alreay. Thank you very much!
What I have now is this table:
| Name | A | split | B | 
| Well1 | 100 | 5 | 18 | 
| Well2 | 120 | 20 | 21 | 
| Well3 | 280 | 15 | 23 | 
| Well4 | 325 | 50 | 28 | 
| Well5 | 400 | 40 | 42 | 
What I want to have is the following table:
| Name | A | split | B | 
| Well1 | 100 | 5 | 18 | 
| 105 | 5 | ||
| 110 | 5 | ||
| 115 | 5 | ||
| Well2 | 120 | 20 | 21 | 
| 140 | 20 | ||
| 160 | 20 | ||
| 180 | 20 | ||
| 200 | 20 | ||
| 220 | 20 | ||
| 240 | 20 | 23 | |
| 260 | 20 | ||
| Well3 | 280 | 15 | |
| 295 | 15 | ||
| 310 | 15 | ||
| Well4 | 325 | 50 | 28 | 
| 375 | 50 | ||
| Well5 | 400 | 40 | 42 | 
Can you please help me with the final step?
I assume that in your sample output, you meant to place 23 in line with Well3; otherwise, I don't catch the pattern.
If I'm right, you just have to add B to the CALL MISSING variable list:
data have ;                                                
  input Name $ A split B ;                                 
  cards ;                                                  
Well1 100   5  18                                          
Well2 120  20  21                                          
Well3 280  15  23                                          
Well4 325  50  28                                          
Well5 400  40  42                                          
;                                                          
run ;                                                      
                                                           
data want (drop = _:) ;                                    
  merge have have (firstobs=2 keep=A rename=A=_A) ;        
  output ;                                                 
  call missing (name, B) ;                                 
  do A = A + split to max (A, _A) by split while (A < _A) ;
    output ;                                               
  end ;                                                    
run ;                                                      
Kind regards
Paul D.
Dear Paul,
It works fine. Thank you very vey much!
However, it is not working in DI Studio.
The following is my case:
data _INPUT1 ;
infile datalines delimiter=',';
input WELLBORE_NAME $ MD_FT_Round split UTM_X UTM_Y;
datalines;
77/45-A,0, 120,496323.55581, 6277756.1415
77/45-A,6000, 62, 496210.93588, 6277828.1526
77/45-A,9100, 60, 496210.86837, 6277828.2457
77/45-A,12100,62, 496210.68924, 6277828.2151
77/45-A,15200,60, 496210.25044, 6277828.0049
77/45-A,18200,62, 496209.55548, 6277827.3689
;
data _OUTPUT1 (drop = _:) ;
merge _INPUT1 _INPUT1 (firstobs=2 keep=MD_FT_Round rename=MD_FT_Round=_MD_FT_Round) ;
output ;
call missing (WELLBORE_NAME, UTM_X, UTM_Y) ;
do MD_FT_Round = MD_FT_Round + split to max (MD_FT_Round, _MD_FT_Round) by split while (MD_FT_Round < _MD_FT_Round) ;
output ;
end ;
run ;
It works well in SAS EG but it does not work in DI Studio. I get the following error:
"- ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, or invalid."
Do you know why?
Best regards
Farshid
The only thing I can imagine is that the input data set _INPUT1 you're creating in DIS isn't being created properly and has missing values in MD_FT_Round and/or Split. Have you taken a good look at the data set after it was created - if it was? DI Studio is, uh, ... special ... I'm not sure it honors CARDS/DATALINES/LINES in a custom code block (though I can be wrong on that head, as it's been some time since I touched DI for the last time). I'd try to create the data set in EGuide, store it in a permanent library, and have DIS read it from there, then see. I expect that DIS fully honors the rest of SAS Base code and syntax used in this step (I used to throw stuff much more complex than that at it, and it did work).
Kind regards
Paul D.
Dear Paul
I have solved the problem
Now, I have other type of problem
Do you have time to help me with my new challenge?
I need to perform a linear interpolation
I do it correctly when I have only one dataset in a table
However, I have many many different datasets in a table and each dataset has to have its own interpolation curve
Let's take a simple example:
I have the following two datasets (A and B) in a table:
| NAME | DEPTH | X | Y | 
| A | 0 | ||
| 0,50 | , | , | |
| 1,00 | , | , | |
| 1,50 | , | , | |
| 2,00 | , | , | |
| 2,50 | , | , | |
| 3,00 | , | , | |
| 3,50 | , | , | |
| 4,00 | , | , | |
| 4,50 | , | , | |
| 5,00 | , | , | |
| 5,50 | , | , | |
| 6,00 | , | , | |
| A | 6,50 | 496210,94 | 6277828,15 | 
| 7,00 | , | , | |
| 7,50 | , | , | |
| 8,00 | , | , | |
| 8,50 | , | , | |
| 9,00 | , | , | |
| 9,50 | , | , | |
| 10,00 | , | , | |
| 10,50 | , | , | |
| A | 11,00 | 496210,87 | 6277828,25 | 
| B | 0,00 | 516239.55347 | 6247816.3269 | 
| 0.5 | |||
| 1,00 | |||
| 1,50 | |||
| 2,00 | |||
| 2,50 | |||
| 3,00 | |||
| 3,50 | |||
| 4,00 | |||
| 4,50 | |||
| 5,00 | |||
| 5,50 | |||
| B | 6,00 | 516239.55347 | 6247816.3269 | 
| 6,50 | |||
| 7,00 | |||
| 7,50 | |||
| 8,00 | |||
| 8,50 | |||
| B | 9,00 | 516239.51534 | 6247816.3202 | 
| 9,50 | |||
| 10,00 | |||
| 10,50 | |||
| 11,00 | |||
| 11,50 | |||
| 12,00 | |||
| B | 12,50 | 516239.47966 | 6247816.3124 | 
I use the following proc procedure to perform linear regression for one dataset:
proc expand data=&_INPUT1 out=&_OUTPUT1;
convert X=linear_X / method=join;
id DEPTH;
run;
proc expand data=&_INPUT1 out=&_OUTPUT1;
convert Y=linear_Y / method=join;
id DEPTH;
run;
The interpolation will fill the missing values for X and Y
However, when I have two datasets I can't use this formula
Do you have time to look at it?
Best regards
Farshid
@farshidowrang wrote:
I have solved the problem
Thank You So Much Paul!
Now, I have other type of problem
Hello @farshidowrang,
I suggest this would be a good time to close this thread ("Splitting a value ..."), i.e., mark Paul's most helpful reply as the "Accepted Solution," and open a new one whose subject line contains the term "linear interpolation." Many more people would see this new thread and recognize it is about linear interpolation. Also, over the years it would be much easier for readers of these threads to find out what the problem was (initial post) and how it was solved ("Solution" highlighted in green). The number of these later readers should not be underestimated.
Thanks in advance.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
