Solved: Re: Splitting a value from a row into equal width intervals in SAS

farshidowrang · Posted 10-04-2019 01:47 PM

Dear friends;

I need to split a value from a row into equal width intervals. I have this:

data HAVE;
 infile datalines delimiter=',';
 input Name $ A;
   datalines;
Well1 100
Well2 200
Well3 300
;

What I want as a result is this:

Name	A
Well1	100
	120
	140
	160
	180
Well2	200
	220
	240
	260
	280
Well3	300
	320
	340
	360
	380

Can you please help me with this problem?

I need the general methodology.

Thank you very much

Best Regards

Farshid

hashman · Posted 10-05-2019 04:06 PM

@farshidowrang:

I assume that in your sample output, you meant to place 23 in line with Well3; otherwise, I don't catch the pattern.

If I'm right, you just have to add B to the CALL MISSING variable list:

data have ;                                                
  input Name $ A split B ;                                 
  cards ;                                                  
Well1 100   5  18                                          
Well2 120  20  21                                          
Well3 280  15  23                                          
Well4 325  50  28                                          
Well5 400  40  42                                          
;                                                          
run ;                                                      
                                                           
data want (drop = _:) ;                                    
  merge have have (firstobs=2 keep=A rename=A=_A) ;        
  output ;                                                 
  call missing (name, B) ;                                 
  do A = A + split to max (A, _A) by split while (A < _A) ;
    output ;                                               
  end ;                                                    
run ;

Kind regards

Paul D.

View solution in original post

hashman · Posted 10-04-2019 02:02 PM

@farshidowrang:

Try this:

data have ;            
  input Name $ A ;     
  cards ;              
Well1 100              
Well2 200              
Well3 300              
;                      
run ;                  
                       
data want ;            
  set have ;           
  output ;             
  call missing (name) ;
  do _n_ = 1 to 4 ;    
    A + 20 ;           
    output ;           
  end ;                
run ;

Kind regards

Paul D.

farshidowrang · Posted 10-04-2019 02:13 PM

Wow so quick you are!
Thank you very much Paul!

farshidowrang · Posted 10-04-2019 05:09 PM

Dear Paul ,

I have a new challenge now (I have several, but take one thing at a time).

Now, I have the following input:

data have ; 
   infile datalines delimiter=',';           
  input Name $ A Split;     
  cards ;              
Well1, 100,  10              
Well2, 200,  20            
Well3, 300,  50            
;                      
run ;

The "Split" will decide how many times the values between the two rows will be divided. I like to have the following result:

Name	A
Well1	100
	110
	120
	130
	140
	150
	160
	170
	180
	190
Well2	200
	220
	240
	260
	280
Well3	300
	350
Well4	400
	...

Can you please help me with that too?

hashman · Posted 10-04-2019 05:25 PM

@farshidowrang:

It's easy:

data have ;                                                                                                                                                                                                                                                     
  input Name $ A split ;                                                                                                                                                                                                                                        
  cards ;                                                                                                                                                                                                                                                       
Well1 100  10                                                                                                                                                                                                                                                   
Well2 200  20                                                                                                                                                                                                                                                   
Well3 300  50                                                                                                                                                                                                                                                   
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want ;                                                                                                                                                                                                                                                     
  set have ;                                                                                                                                                                                                                                                    
  output ;                                                                                                                                                                                                                                                      
  call missing (name) ;                                                                                                                                                                                                                                         
  do _n_ = 1 to 100 / split - 1 ;                                                                                                                                                                                                                               
    A + split ;                                                                                                                                                                                                                                                 
    output ;                                                                                                                                                                                                                                                    
  end ;                                                                                                                                                                                                                                                         
run ;

Kind regards

Paul D.

farshidowrang · Posted 10-05-2019 04:25 AM

Yes Paul,

In this case the distance between each row is 100 and you use

do _n_ = 1 to 100.

How about when it is not the case.

Please look the following case:

data have ;                                                                                                                                                                                                                                                     
  input Name $ A split ;                                                                                                                                                                                                                                        
  cards ;                                                                                                                                                                                                                                                       
Well1 100  5                                                                                                                                                                                                                                                   
Well2 120  20                                                                                                                                                                                                                                                   
Well3 280  15
Well4 325  50                                                                                                                                                                                                                                                   
;                                                                                                                                                                                                                                                               
run ;

The result should be like this:

Well1	100
	105
	110
	115
Well2	120
	140
	160
	180
	200
	220
	240
	260
Well3	280
	295
	310
Well4	325

hashman · Posted 10-05-2019 12:36 PM

@farshidowrang:

It'd be simpler if you set your full output requirements at once rather than incrementally, as a seemingly insignificant change in them can affect the logic in a significant way - in particular, if you want to rely on the leading record's value's record to set generate output values for the current record (as you apparently want now) since in this case some look-forward mechanism needs to be involved. Below, it is done by merging the input file with its own copy minus the first record:

data have ;                                                                                                                                                                                                                                                     
  input Name $ A split ;                                                                                                                                                                                                                                        
  cards ;                                                                                                                                                                                                                                                       
Well1 100   5                                                                                                                                                                                                                                                   
Well2 120  20                                                                                                                                                                                                                                                   
Well3 280  15                                                                                                                                                                                                                                                   
Well4 325  50                                                                                                                                                                                                                                                   
Well5 400  40                                                                                                                                                                                                                                                   
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want (drop = _:) ;                                                                                                                                                                                                                                                     
  merge have have (firstobs=2 keep=A rename=A=_A) ;                                                                                                                                                                                                             
  output ;                                                                                                                                                                                                                                                      
  call missing (name) ;                                                                                                                                                                                                                                         
  do A = A + split to max (A, _A) by split while (A < _A) ;                                                                                                                                                                                                     
    output ;                                                                                                                                                                                                                                                    
  end ;                                                                                                                                                                                                                                                         
run ;

Note:

max (A, _A) guards against _A=. in the last input record of the merged stream by ignoring the missing value of _A
while (A < _A) prevents writing an output record if the last split value is equal to or exceeds the value of A in the next input record

Alternatively, you can omit the WHILE condition and code the DO loop instead as:

  do A = A + split to max (A, _A) by split ;                                                                                                                                                                                                     
    if A < _A then output ;                                                                                                                                                                                                                                                    
  end ;

Kind regards

Paul D.

farshidowrang · Posted 10-05-2019 03:31 PM

Nice!

It forms step by step!

Let's go to the final step.

My aim is to perform a linear regression later.

First I have to split the rows and you have helped me with that alreay. Thank you very much!

What I have now is this table:

Name	A	split	B
Well1	100	5	18
Well2	120	20	21
Well3	280	15	23
Well4	325	50	28
Well5	400	40	42

What I want to have is the following table:

Name	A	split	B
Well1	100	5	18
	105	5
	110	5
	115	5
Well2	120	20	21
	140	20
	160	20
	180	20
	200	20
	220	20
	240	20	23
	260	20
Well3	280	15
	295	15
	310	15
Well4	325	50	28
	375	50
Well5	400	40	42

Can you please help me with the final step?

hashman · Posted 10-05-2019 04:06 PM

@farshidowrang:

I assume that in your sample output, you meant to place 23 in line with Well3; otherwise, I don't catch the pattern.

If I'm right, you just have to add B to the CALL MISSING variable list:

data have ;                                                
  input Name $ A split B ;                                 
  cards ;                                                  
Well1 100   5  18                                          
Well2 120  20  21                                          
Well3 280  15  23                                          
Well4 325  50  28                                          
Well5 400  40  42                                          
;                                                          
run ;                                                      
                                                           
data want (drop = _:) ;                                    
  merge have have (firstobs=2 keep=A rename=A=_A) ;        
  output ;                                                 
  call missing (name, B) ;                                 
  do A = A + split to max (A, _A) by split while (A < _A) ;
    output ;                                               
  end ;                                                    
run ;

Kind regards

Paul D.

farshidowrang · Posted 10-06-2019 08:51 AM

Dear Paul,

It works fine. Thank you very vey much!

However, it is not working in DI Studio.

The following is my case:

data _INPUT1 ;
infile datalines delimiter=',';
input WELLBORE_NAME $ MD_FT_Round split UTM_X UTM_Y;
datalines;

77/45-A,0, 120,496323.55581, 6277756.1415

77/45-A,6000, 62, 496210.93588, 6277828.1526

77/45-A,9100, 60, 496210.86837, 6277828.2457

77/45-A,12100,62, 496210.68924, 6277828.2151

77/45-A,15200,60, 496210.25044, 6277828.0049

77/45-A,18200,62, 496209.55548, 6277827.3689
;

data _OUTPUT1 (drop = _:) ;
merge _INPUT1 _INPUT1 (firstobs=2 keep=MD_FT_Round rename=MD_FT_Round=_MD_FT_Round) ;
output ;
call missing (WELLBORE_NAME, UTM_X, UTM_Y) ;
do MD_FT_Round = MD_FT_Round + split to max (MD_FT_Round, _MD_FT_Round) by split while (MD_FT_Round < _MD_FT_Round) ;
output ;
end ;
run ;

It works well in SAS EG but it does not work in DI Studio. I get the following error:

"- ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, or invalid."

Do you know why?

Best regards

Farshid

hashman · Posted 10-06-2019 12:16 PM

@farshidowrang:

The only thing I can imagine is that the input data set _INPUT1 you're creating in DIS isn't being created properly and has missing values in MD_FT_Round and/or Split. Have you taken a good look at the data set after it was created - if it was? DI Studio is, uh, ... special ... I'm not sure it honors CARDS/DATALINES/LINES in a custom code block (though I can be wrong on that head, as it's been some time since I touched DI for the last time). I'd try to create the data set in EGuide, store it in a permanent library, and have DIS read it from there, then see. I expect that DIS fully honors the rest of SAS Base code and syntax used in this step (I used to throw stuff much more complex than that at it, and it did work).

Kind regards

Paul D.

farshidowrang · Posted 10-07-2019 11:44 AM

Dear Paul

I have solved the problem

Thank You So Much Paul!

Now, I have other type of problem

Do you have time to help me with my new challenge?

I need to perform a linear interpolation

I do it correctly when I have only one dataset in a table

However, I have many many different datasets in a table and each dataset has to have its own interpolation curve

Let's take a simple example:

I have the following two datasets (A and B) in a table:

NAME	DEPTH	X	Y
A	0
	0,50	,	,
	1,00	,	,
	1,50	,	,
	2,00	,	,
	2,50	,	,
	3,00	,	,
	3,50	,	,
	4,00	,	,
	4,50	,	,
	5,00	,	,
	5,50	,	,
	6,00	,	,
A	6,50	496210,94	6277828,15
	7,00	,	,
	7,50	,	,
	8,00	,	,
	8,50	,	,
	9,00	,	,
	9,50	,	,
	10,00	,	,
	10,50	,	,
A	11,00	496210,87	6277828,25
B	0,00	516239.55347	6247816.3269
	0.5
	1,00
	1,50
	2,00
	2,50
	3,00
	3,50
	4,00
	4,50
	5,00
	5,50
B	6,00	516239.55347	6247816.3269
	6,50
	7,00
	7,50
	8,00
	8,50
B	9,00	516239.51534	6247816.3202
	9,50
	10,00
	10,50
	11,00
	11,50
	12,00
B	12,50	516239.47966	6247816.3124

I use the following proc procedure to perform linear regression for one dataset:

proc expand data=&_INPUT1 out=&_OUTPUT1;
convert X=linear_X / method=join;
id DEPTH;
run;

proc expand data=&_INPUT1 out=&_OUTPUT1;
convert Y=linear_Y / method=join;
id DEPTH;
run;

The interpolation will fill the missing values for X and Y

However, when I have two datasets I can't use this formula

Do you have time to look at it?

Best regards

Farshid

FreelanceReinh · Posted 10-07-2019 12:19 PM

@farshidowrang wrote:

I have solved the problem

Thank You So Much Paul!

Now, I have other type of problem

Hello @farshidowrang,

I suggest this would be a good time to close this thread ("Splitting a value ..."), i.e., mark Paul's most helpful reply as the "Accepted Solution," and open a new one whose subject line contains the term "linear interpolation." Many more people would see this new thread and recognize it is about linear interpolation. Also, over the years it would be much easier for readers of these threads to find out what the problem was (initial post) and how it was solved ("Solution" highlighted in green). The number of these later readers should not be underestimated.

Thanks in advance.

SAS Innovate 2025: Call for Content

Classroom Training Available!