BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
farshidowrang
Quartz | Level 8

Dear friends;

 

I need to split a value from a row into equal width intervals. I have this:

 

data HAVE;
 infile datalines delimiter=',';
input Name $ A; datalines;
Well1 100
Well2 200
Well3 300
;


What I want as a result is this:

NameA
Well1100
 120
 140
 160
 180
Well2200
 220
 240
 260
 280
Well3300
 320
 340
 360
 380



Can you please help me with this  problem?

I need the general methodology.

 

Thank you very much

 

Best Regards


Farshid

1 ACCEPTED SOLUTION

Accepted Solutions
hashman
Ammonite | Level 13

@farshidowrang:

I assume that in your sample output, you meant to place 23 in line with Well3; otherwise, I don't catch the pattern. 

If I'm right, you just have to add B to the CALL MISSING variable list: 

data have ;                                                
  input Name $ A split B ;                                 
  cards ;                                                  
Well1 100   5  18                                          
Well2 120  20  21                                          
Well3 280  15  23                                          
Well4 325  50  28                                          
Well5 400  40  42                                          
;                                                          
run ;                                                      
                                                           
data want (drop = _:) ;                                    
  merge have have (firstobs=2 keep=A rename=A=_A) ;        
  output ;                                                 
  call missing (name, B) ;                                 
  do A = A + split to max (A, _A) by split while (A < _A) ;
    output ;                                               
  end ;                                                    
run ;                                                      

Kind regards

Paul D.

View solution in original post

12 REPLIES 12
hashman
Ammonite | Level 13

@farshidowrang:

Try this:

data have ;            
  input Name $ A ;     
  cards ;              
Well1 100              
Well2 200              
Well3 300              
;                      
run ;                  
                       
data want ;            
  set have ;           
  output ;             
  call missing (name) ;
  do _n_ = 1 to 4 ;    
    A + 20 ;           
    output ;           
  end ;                
run ;                  

Kind regards

Paul D. 

farshidowrang
Quartz | Level 8
Wow so quick you are!
Thank you very much Paul!
farshidowrang
Quartz | Level 8

Dear Paul ,

 

I have a new challenge now (I have several, but take one thing at a time).

Now, I have the following input:

data have ; 
infile datalines delimiter=','; input Name $ A Split; cards ; Well1, 100, 10 Well2, 200, 20 Well3, 300, 50 ; run ;

The "Split" will decide how many times the values between the two rows will be divided. I like to have the following result:

 

 

NameA
Well1100
 110
 120
 130
 140
 150
 160
 170
 180
 190
Well2200
 220
 240
 260
 280
Well3300
 350
Well4400
 ...

 

Can you please help me with that too?

hashman
Ammonite | Level 13

@farshidowrang:

It's easy:

data have ;                                                                                                                                                                                                                                                     
  input Name $ A split ;                                                                                                                                                                                                                                        
  cards ;                                                                                                                                                                                                                                                       
Well1 100  10                                                                                                                                                                                                                                                   
Well2 200  20                                                                                                                                                                                                                                                   
Well3 300  50                                                                                                                                                                                                                                                   
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want ;                                                                                                                                                                                                                                                     
  set have ;                                                                                                                                                                                                                                                    
  output ;                                                                                                                                                                                                                                                      
  call missing (name) ;                                                                                                                                                                                                                                         
  do _n_ = 1 to 100 / split - 1 ;                                                                                                                                                                                                                               
    A + split ;                                                                                                                                                                                                                                                 
    output ;                                                                                                                                                                                                                                                    
  end ;                                                                                                                                                                                                                                                         
run ;    

Kind regards

Paul D. 

farshidowrang
Quartz | Level 8

Yes Paul,

 

In this case the distance between each row is 100 and you use

 

do _n_ = 1 to 100.

 

How about when it is not the case.

 

Please look the following case:

 

data have ;                                                                                                                                                                                                                                                     
  input Name $ A split ;                                                                                                                                                                                                                                        
  cards ;                                                                                                                                                                                                                                                       
Well1 100  5                                                                                                                                                                                                                                                   
Well2 120  20                                                                                                                                                                                                                                                   
Well3 280  15
Well4 325 50 ; run ;

The result should be like this:

 

Well1100
 105
 110
 115
Well2120
 140
 160
 180
 200
 220
 240
 260
Well3280
 295
 310
Well4325

 

hashman
Ammonite | Level 13

@farshidowrang:

It'd be simpler if you set your full output requirements at once rather than incrementally, as a seemingly insignificant change in them can affect the logic in a significant way - in particular, if you want to rely on the leading record's value's record to set generate output values for the current record (as you apparently want now) since in this case some look-forward mechanism needs to be involved. Below, it is done by merging the input file with its own copy minus the first record:

data have ;                                                                                                                                                                                                                                                     
  input Name $ A split ;                                                                                                                                                                                                                                        
  cards ;                                                                                                                                                                                                                                                       
Well1 100   5                                                                                                                                                                                                                                                   
Well2 120  20                                                                                                                                                                                                                                                   
Well3 280  15                                                                                                                                                                                                                                                   
Well4 325  50                                                                                                                                                                                                                                                   
Well5 400  40                                                                                                                                                                                                                                                   
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want (drop = _:) ;                                                                                                                                                                                                                                                     
  merge have have (firstobs=2 keep=A rename=A=_A) ;                                                                                                                                                                                                             
  output ;                                                                                                                                                                                                                                                      
  call missing (name) ;                                                                                                                                                                                                                                         
  do A = A + split to max (A, _A) by split while (A < _A) ;                                                                                                                                                                                                     
    output ;                                                                                                                                                                                                                                                    
  end ;                                                                                                                                                                                                                                                         
run ;                                                       

Note:

  • max (A, _A) guards against _A=. in the last input record of the merged stream by ignoring the missing value of _A
  • while (A < _A) prevents writing an output record if the last split value is equal to or exceeds the value of A in the next input record

Alternatively, you can omit the WHILE condition and code the DO loop instead as:

  do A = A + split to max (A, _A) by split ;                                                                                                                                                                                                     
    if A < _A then output ;                                                                                                                                                                                                                                                    
  end ;      

Kind regards

Paul D.     

farshidowrang
Quartz | Level 8

Nice!

It forms step by step!

Let's go to the final step.

My aim is to perform a linear regression later. 

First I have to split the rows and you have helped me with that alreay. Thank you very much!

 

What I have now is this table:

 

NameAsplitB
Well1100518
Well21202021
Well32801523
Well43255028
Well54004042

 

What I want to have is the following table:

 

NameAsplitB
Well1100518
 1055 
 1105 
 1155 
Well21202021
 14020 
 16020 
 18020 
 20020 
 22020 
 2402023
 26020 
Well328015 
 29515 
 31015 
Well43255028
 37550 
Well54004042

 

Can you please help me with the final step?

hashman
Ammonite | Level 13

@farshidowrang:

I assume that in your sample output, you meant to place 23 in line with Well3; otherwise, I don't catch the pattern. 

If I'm right, you just have to add B to the CALL MISSING variable list: 

data have ;                                                
  input Name $ A split B ;                                 
  cards ;                                                  
Well1 100   5  18                                          
Well2 120  20  21                                          
Well3 280  15  23                                          
Well4 325  50  28                                          
Well5 400  40  42                                          
;                                                          
run ;                                                      
                                                           
data want (drop = _:) ;                                    
  merge have have (firstobs=2 keep=A rename=A=_A) ;        
  output ;                                                 
  call missing (name, B) ;                                 
  do A = A + split to max (A, _A) by split while (A < _A) ;
    output ;                                               
  end ;                                                    
run ;                                                      

Kind regards

Paul D.

farshidowrang
Quartz | Level 8

Dear Paul,

 

It works fine. Thank you very vey much!

 

However, it is not working in DI Studio.

 

The following is my case:

 

data _INPUT1 ;
infile datalines delimiter=',';
input WELLBORE_NAME $ MD_FT_Round split UTM_X UTM_Y;
datalines;

77/45-A,0, 120,496323.55581, 6277756.1415

77/45-A,6000, 62, 496210.93588, 6277828.1526

77/45-A,9100, 60, 496210.86837, 6277828.2457

77/45-A,12100,62, 496210.68924, 6277828.2151

77/45-A,15200,60, 496210.25044, 6277828.0049

77/45-A,18200,62, 496209.55548, 6277827.3689
;

data _OUTPUT1 (drop = _:) ;
merge _INPUT1 _INPUT1 (firstobs=2 keep=MD_FT_Round rename=MD_FT_Round=_MD_FT_Round) ;
output ;
call missing (WELLBORE_NAME, UTM_X, UTM_Y) ;
do MD_FT_Round = MD_FT_Round + split to max (MD_FT_Round, _MD_FT_Round) by split while (MD_FT_Round < _MD_FT_Round) ;
output ;
end ;
run ;

 

It works well in SAS EG but it does not work in DI Studio. I get the following error:

 

"- ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, or invalid."

 

Do you know why?

 

Best regards

 

Farshid 

hashman
Ammonite | Level 13

@farshidowrang:

The only thing I can imagine is that the input data set _INPUT1 you're creating in DIS isn't being created properly and has missing values in MD_FT_Round and/or Split. Have you taken a good look at the data set after it was created - if it was? DI Studio is, uh, ... special ... I'm not sure it honors CARDS/DATALINES/LINES in a custom code block (though I can be wrong on that head, as it's been some time since I touched DI for the last time). I'd try to create the data set in EGuide, store it in a permanent library, and have DIS read it from there, then see. I expect that DIS fully honors the rest of SAS Base code and syntax used in this step (I used to throw stuff much more complex than that at it, and it did work).

 

Kind regards

Paul D.    

farshidowrang
Quartz | Level 8

Dear Paul

 

I have solved the problem

 

Thank You So Much Paul!

 

Now, I have other type of problem

 

Do you have time to help me with my new challenge?

 

I need to perform a linear interpolation

 

I do it correctly when I have only one dataset in a table

 

However, I have many many different datasets in a table and each dataset has to have its own interpolation curve

 

Let's take a simple example:

 

I have the following two datasets (A and B)  in a table:

 

NAMEDEPTHXY
A0  
 0,50,,
 1,00,,
 1,50,,
 2,00,,
 2,50,,
 3,00,,
 3,50,,
 4,00,,
 4,50,,
 5,00,,
 5,50,,
 6,00,,
A6,50496210,946277828,15
 7,00,,
 7,50,,
 8,00,,
 8,50,,
 9,00,,
 9,50,,
 10,00,,
 10,50,,
A   11,00496210,876277828,25
B        0,00516239.553476247816.3269
 0.5  
 1,00  
 1,50  
 2,00  
 2,50  
 3,00  
 3,50  
 4,00  
 4,50  
 5,00  
 5,50  
B        6,00516239.553476247816.3269
 6,50  
 7,00  
 7,50  
 8,00  
 8,50  
B         9,00516239.515346247816.3202
 9,50  
 10,00  
 10,50  
 11,00  
 11,50  
 12,00  
B        12,50516239.479666247816.3124


I use the following proc procedure to perform linear regression for one dataset:

 

proc expand data=&_INPUT1 out=&_OUTPUT1;
convert X=linear_X / method=join;
id DEPTH;
run;

 

proc expand data=&_INPUT1 out=&_OUTPUT1;
convert Y=linear_Y / method=join;
id DEPTH;
run;

 

The interpolation will fill the missing values for X and Y

 

However, when I have two datasets I can't use this formula

 

Do you have time to look at it? 

 

Best regards

 

Farshid 

FreelanceReinh
Jade | Level 19

@farshidowrang wrote:

I have solved the problem

 

Thank You So Much Paul!

 

Now, I have other type of problem

 

Hello @farshidowrang,

 

I suggest this would be a good time to close this thread ("Splitting a value ..."), i.e., mark Paul's most helpful reply as the "Accepted Solution," and open a new one whose subject line contains the term "linear interpolation." Many more people would see this new thread and recognize it is about linear interpolation. Also, over the years it would be much easier for readers of these threads to find out what the problem was (initial post) and how it was solved ("Solution" highlighted in green). The number of these later readers should not be underestimated.

 

Thanks in advance.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2416 views
  • 1 like
  • 3 in conversation