Dear friends;
I need to split a value from a row into equal width intervals. I have this:
data HAVE; infile datalines delimiter=',';
input Name $ A; datalines;
Well1 100
Well2 200
Well3 300
;
What I want as a result is this:
Name | A |
Well1 | 100 |
120 | |
140 | |
160 | |
180 | |
Well2 | 200 |
220 | |
240 | |
260 | |
280 | |
Well3 | 300 |
320 | |
340 | |
360 | |
380 |
Can you please help me with this problem?
I need the general methodology.
Thank you very much
Best Regards
Farshid
I assume that in your sample output, you meant to place 23 in line with Well3; otherwise, I don't catch the pattern.
If I'm right, you just have to add B to the CALL MISSING variable list:
data have ; input Name $ A split B ; cards ; Well1 100 5 18 Well2 120 20 21 Well3 280 15 23 Well4 325 50 28 Well5 400 40 42 ; run ; data want (drop = _:) ; merge have have (firstobs=2 keep=A rename=A=_A) ; output ; call missing (name, B) ; do A = A + split to max (A, _A) by split while (A < _A) ; output ; end ; run ;
Kind regards
Paul D.
Try this:
data have ;
input Name $ A ;
cards ;
Well1 100
Well2 200
Well3 300
;
run ;
data want ;
set have ;
output ;
call missing (name) ;
do _n_ = 1 to 4 ;
A + 20 ;
output ;
end ;
run ;
Kind regards
Paul D.
Dear Paul ,
I have a new challenge now (I have several, but take one thing at a time).
Now, I have the following input:
data have ;
infile datalines delimiter=',';
input Name $ A Split;
cards ;
Well1, 100, 10
Well2, 200, 20
Well3, 300, 50
;
run ;
The "Split" will decide how many times the values between the two rows will be divided. I like to have the following result:
Name | A |
Well1 | 100 |
110 | |
120 | |
130 | |
140 | |
150 | |
160 | |
170 | |
180 | |
190 | |
Well2 | 200 |
220 | |
240 | |
260 | |
280 | |
Well3 | 300 |
350 | |
Well4 | 400 |
... |
Can you please help me with that too?
It's easy:
data have ;
input Name $ A split ;
cards ;
Well1 100 10
Well2 200 20
Well3 300 50
;
run ;
data want ;
set have ;
output ;
call missing (name) ;
do _n_ = 1 to 100 / split - 1 ;
A + split ;
output ;
end ;
run ;
Kind regards
Paul D.
Yes Paul,
In this case the distance between each row is 100 and you use
do _n_ = 1 to 100.
How about when it is not the case.
Please look the following case:
data have ;
input Name $ A split ;
cards ;
Well1 100 5
Well2 120 20
Well3 280 15
Well4 325 50
;
run ;
The result should be like this:
Well1 | 100 |
105 | |
110 | |
115 | |
Well2 | 120 |
140 | |
160 | |
180 | |
200 | |
220 | |
240 | |
260 | |
Well3 | 280 |
295 | |
310 | |
Well4 | 325 |
It'd be simpler if you set your full output requirements at once rather than incrementally, as a seemingly insignificant change in them can affect the logic in a significant way - in particular, if you want to rely on the leading record's value's record to set generate output values for the current record (as you apparently want now) since in this case some look-forward mechanism needs to be involved. Below, it is done by merging the input file with its own copy minus the first record:
data have ;
input Name $ A split ;
cards ;
Well1 100 5
Well2 120 20
Well3 280 15
Well4 325 50
Well5 400 40
;
run ;
data want (drop = _:) ;
merge have have (firstobs=2 keep=A rename=A=_A) ;
output ;
call missing (name) ;
do A = A + split to max (A, _A) by split while (A < _A) ;
output ;
end ;
run ;
Note:
Alternatively, you can omit the WHILE condition and code the DO loop instead as:
do A = A + split to max (A, _A) by split ;
if A < _A then output ;
end ;
Kind regards
Paul D.
Nice!
It forms step by step!
Let's go to the final step.
My aim is to perform a linear regression later.
First I have to split the rows and you have helped me with that alreay. Thank you very much!
What I have now is this table:
Name | A | split | B |
Well1 | 100 | 5 | 18 |
Well2 | 120 | 20 | 21 |
Well3 | 280 | 15 | 23 |
Well4 | 325 | 50 | 28 |
Well5 | 400 | 40 | 42 |
What I want to have is the following table:
Name | A | split | B |
Well1 | 100 | 5 | 18 |
105 | 5 | ||
110 | 5 | ||
115 | 5 | ||
Well2 | 120 | 20 | 21 |
140 | 20 | ||
160 | 20 | ||
180 | 20 | ||
200 | 20 | ||
220 | 20 | ||
240 | 20 | 23 | |
260 | 20 | ||
Well3 | 280 | 15 | |
295 | 15 | ||
310 | 15 | ||
Well4 | 325 | 50 | 28 |
375 | 50 | ||
Well5 | 400 | 40 | 42 |
Can you please help me with the final step?
I assume that in your sample output, you meant to place 23 in line with Well3; otherwise, I don't catch the pattern.
If I'm right, you just have to add B to the CALL MISSING variable list:
data have ; input Name $ A split B ; cards ; Well1 100 5 18 Well2 120 20 21 Well3 280 15 23 Well4 325 50 28 Well5 400 40 42 ; run ; data want (drop = _:) ; merge have have (firstobs=2 keep=A rename=A=_A) ; output ; call missing (name, B) ; do A = A + split to max (A, _A) by split while (A < _A) ; output ; end ; run ;
Kind regards
Paul D.
Dear Paul,
It works fine. Thank you very vey much!
However, it is not working in DI Studio.
The following is my case:
data _INPUT1 ;
infile datalines delimiter=',';
input WELLBORE_NAME $ MD_FT_Round split UTM_X UTM_Y;
datalines;
77/45-A,0, 120,496323.55581, 6277756.1415
77/45-A,6000, 62, 496210.93588, 6277828.1526
77/45-A,9100, 60, 496210.86837, 6277828.2457
77/45-A,12100,62, 496210.68924, 6277828.2151
77/45-A,15200,60, 496210.25044, 6277828.0049
77/45-A,18200,62, 496209.55548, 6277827.3689
;
data _OUTPUT1 (drop = _:) ;
merge _INPUT1 _INPUT1 (firstobs=2 keep=MD_FT_Round rename=MD_FT_Round=_MD_FT_Round) ;
output ;
call missing (WELLBORE_NAME, UTM_X, UTM_Y) ;
do MD_FT_Round = MD_FT_Round + split to max (MD_FT_Round, _MD_FT_Round) by split while (MD_FT_Round < _MD_FT_Round) ;
output ;
end ;
run ;
It works well in SAS EG but it does not work in DI Studio. I get the following error:
"- ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, or invalid."
Do you know why?
Best regards
Farshid
The only thing I can imagine is that the input data set _INPUT1 you're creating in DIS isn't being created properly and has missing values in MD_FT_Round and/or Split. Have you taken a good look at the data set after it was created - if it was? DI Studio is, uh, ... special ... I'm not sure it honors CARDS/DATALINES/LINES in a custom code block (though I can be wrong on that head, as it's been some time since I touched DI for the last time). I'd try to create the data set in EGuide, store it in a permanent library, and have DIS read it from there, then see. I expect that DIS fully honors the rest of SAS Base code and syntax used in this step (I used to throw stuff much more complex than that at it, and it did work).
Kind regards
Paul D.
Dear Paul
I have solved the problem
Now, I have other type of problem
Do you have time to help me with my new challenge?
I need to perform a linear interpolation
I do it correctly when I have only one dataset in a table
However, I have many many different datasets in a table and each dataset has to have its own interpolation curve
Let's take a simple example:
I have the following two datasets (A and B) in a table:
NAME | DEPTH | X | Y |
A | 0 | ||
0,50 | , | , | |
1,00 | , | , | |
1,50 | , | , | |
2,00 | , | , | |
2,50 | , | , | |
3,00 | , | , | |
3,50 | , | , | |
4,00 | , | , | |
4,50 | , | , | |
5,00 | , | , | |
5,50 | , | , | |
6,00 | , | , | |
A | 6,50 | 496210,94 | 6277828,15 |
7,00 | , | , | |
7,50 | , | , | |
8,00 | , | , | |
8,50 | , | , | |
9,00 | , | , | |
9,50 | , | , | |
10,00 | , | , | |
10,50 | , | , | |
A | 11,00 | 496210,87 | 6277828,25 |
B | 0,00 | 516239.55347 | 6247816.3269 |
0.5 | |||
1,00 | |||
1,50 | |||
2,00 | |||
2,50 | |||
3,00 | |||
3,50 | |||
4,00 | |||
4,50 | |||
5,00 | |||
5,50 | |||
B | 6,00 | 516239.55347 | 6247816.3269 |
6,50 | |||
7,00 | |||
7,50 | |||
8,00 | |||
8,50 | |||
B | 9,00 | 516239.51534 | 6247816.3202 |
9,50 | |||
10,00 | |||
10,50 | |||
11,00 | |||
11,50 | |||
12,00 | |||
B | 12,50 | 516239.47966 | 6247816.3124 |
I use the following proc procedure to perform linear regression for one dataset:
proc expand data=&_INPUT1 out=&_OUTPUT1;
convert X=linear_X / method=join;
id DEPTH;
run;
proc expand data=&_INPUT1 out=&_OUTPUT1;
convert Y=linear_Y / method=join;
id DEPTH;
run;
The interpolation will fill the missing values for X and Y
However, when I have two datasets I can't use this formula
Do you have time to look at it?
Best regards
Farshid
@farshidowrang wrote:
I have solved the problem
Thank You So Much Paul!
Now, I have other type of problem
Hello @farshidowrang,
I suggest this would be a good time to close this thread ("Splitting a value ..."), i.e., mark Paul's most helpful reply as the "Accepted Solution," and open a new one whose subject line contains the term "linear interpolation." Many more people would see this new thread and recognize it is about linear interpolation. Also, over the years it would be much easier for readers of these threads to find out what the problem was (initial post) and how it was solved ("Solution" highlighted in green). The number of these later readers should not be underestimated.
Thanks in advance.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.