Esteemed Advisers,
I need a solution to the following exemplar problem of subsetting one dataset (Have) contingent on start/stop values from a second dataset (Interval) to create dataset Want. Thanks in advance for any insights you can offer.
Data Intervals;
Input Interval $ Start $ Stop $;
1 B -
1 - D
2 F -
2 - J
3 L -
3 - N
;
Data Have;
Input ID $;
A
B
C
D
E
F
G
H
J
K
L
M
N
;
Data Want;
Input Interval $ ID $;
1 B
1 C
1 D
2 G
2 H
2 J
2 K
3 L
3 M
3 N
;
Before going to code some questions are in order.
Is this supposed to be a surrogate for more complex data, such as words or phrases phrases for the start/stop values of your "real" project? This may be very critical as "interval" defined on character values may be very hard to work with in general as the < or > operators do not always provide the result you expect when values are of different lengths. Not to mention the complications that would arise for mixed case of letters.
How many observations will exist in your real Have and Intervals data sets? One obvious way to do this could create an extreme number of intermediate values leading to long run times.
Why does the Interval data have start and stop on separate observations? The natural way to demonstrate and interval is with both values on a single observation (row).
Are the - in the Interval set supposed to indicate missing or blank values or does the dash have some meaning? In a data step reading values placing a period in the position to indicate a missing value will do so but a dash is a value.
Why is 2 F not in the Want set? The start of Interval 2 is F.
Correcting the Interval and Have data steps so they will run and create data (missing the datalines statement) and making the Interval set an actual interval, this works for your example data if 2 F is supposed to be in the output.
Data Intervals; Input Interval $ Start $ Stop $; datalines; 1 B D 2 F J 3 L N ; Data Have; Input ID $; datalines; A B C D E F G H J K L M N ; proc sql; create table want as select a.interval, b.Id from intervals as a,have as b where a.start le b.id le a.stop ; quit;
Before going to code some questions are in order.
Is this supposed to be a surrogate for more complex data, such as words or phrases phrases for the start/stop values of your "real" project? This may be very critical as "interval" defined on character values may be very hard to work with in general as the < or > operators do not always provide the result you expect when values are of different lengths. Not to mention the complications that would arise for mixed case of letters.
How many observations will exist in your real Have and Intervals data sets? One obvious way to do this could create an extreme number of intermediate values leading to long run times.
Why does the Interval data have start and stop on separate observations? The natural way to demonstrate and interval is with both values on a single observation (row).
Are the - in the Interval set supposed to indicate missing or blank values or does the dash have some meaning? In a data step reading values placing a period in the position to indicate a missing value will do so but a dash is a value.
Why is 2 F not in the Want set? The start of Interval 2 is F.
Correcting the Interval and Have data steps so they will run and create data (missing the datalines statement) and making the Interval set an actual interval, this works for your example data if 2 F is supposed to be in the output.
Data Intervals; Input Interval $ Start $ Stop $; datalines; 1 B D 2 F J 3 L N ; Data Have; Input ID $; datalines; A B C D E F G H J K L M N ; proc sql; create table want as select a.interval, b.Id from intervals as a,have as b where a.start le b.id le a.stop ; quit;
Thanks for the prompt response. It was very helpful. The start/stop times came to me in a dataset as separate observations. Once I figured out how to get them into a single observation your solution works perfectly. And I apologize for my typo that left 2 F out of the Want dataset.
Just for having some fun.
Data Intervals;
Input Interval $ Start $ Stop $;
datalines;
1 B D
2 F J
3 L N
;
data want;
set Intervals;
do i=rank(Start) to rank(Stop);
want=byte(i);output;
end;
drop i;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.