I have a lookup being provided to me in the following format.
data test2;
input code1 $6. code2 $6. data;
datalines;
23145 23147 5
7892F 7892F 6
7893F 7893F 6
32012 32012 7
4456G 4456G 8
80090 80091 9
;
I have data structured like this.
data test1;
input code $5.;
datalines;
80090
80091
7892F
7893F
7893F
32012
4456G
23145
23146
23147
;
I had previously been trying to join data in the following way, which creates a cartesian join and I'm unable to execute on our large dataset.
proc sql;
create table want as
select a.*,
b.data
from test1 a left join test2 b on
(a.code <= b.code2)
And
(a.code >= b.code1)
;
quit;
To remedy this, I was trying to figure out a way to "flatten" this lookup so that I would have the following.
data want2;
input code1 $6. data;
datalines;
23145 5
23146 5
23147 5
7892F 6
7893F 6
32012 7
4456G 8
80090 9
80091 9
;
Essentially trying to spell out all the values that would be in the ranges and create a unique line for each. How could I create want2? To my current knowledge, none of the lookup values that contain letters would have a range, i.e., the code1 and code2 values would match. But if there's a way to create a robust process that could account for future changes from this I would be very interested.
I'm also open to thoughts on how to optimize the join (create want directly), lookup or anyway to complete.
Consider using a format instead.
Performance will be much faster but not sorted. If sort order is important, sort it after the fact.
data test2;
input code1 $6. code2 $6. data;
datalines;
23145 23147 5
7892F 7892F 6
7893F 7893F 6
32012 32012 7
4456G 4456G 8
80090 80091 9
;
data code_fmt;
set test2;
fmtname = 'code_fmt';
type='C';
rename code1=start code2=end data=label;
run;
proc format cntlin=code_fmt;
run;
data test1;
input code $5.;
code_formatted = put(code, $code_fmt.);
datalines;
80090
80091
7892F
7893F
7893F
32012
4456G
23145
23146
23147
;
run;
proc print data=test1;
run;
@A_SAS_Man wrote:
I have a lookup being provided to me in the following format.
data test2; input code1 $6. code2 $6. data; datalines; 23145 23147 5 7892F 7892F 6 7893F 7893F 6 32012 32012 7 4456G 4456G 8 80090 80091 9 ;
I have data structured like this.
data test1; input code $5.; datalines; 80090 80091 7892F 7893F 7893F 32012 4456G 23145 23146 23147 ;
I had previously been trying to join data in the following way, which creates a cartesian join and I'm unable to execute on our large dataset.
proc sql; create table want as select a.*, b.data from test1 a left join test2 b on (a.code <= b.code2) And (a.code >= b.code1) ; quit;
To remedy this, I was trying to figure out a way to "flatten" this lookup so that I would have the following.
data want2; input code1 $6. data; datalines; 23145 5 23146 5 23147 5 7892F 6 7893F 6 32012 7 4456G 8 80090 9 80091 9 ;
Essentially trying to spell out all the values that would be in the ranges and create a unique line for each. How could I create want2? To my current knowledge, none of the lookup values that contain letters would have a range, i.e., the code1 and code2 values would match. But if there's a way to create a robust process that could account for future changes from this I would be very interested.
I'm also open to thoughts on how to optimize the join (create want directly), lookup or anyway to complete.
Consider using a format instead.
Performance will be much faster but not sorted. If sort order is important, sort it after the fact.
data test2;
input code1 $6. code2 $6. data;
datalines;
23145 23147 5
7892F 7892F 6
7893F 7893F 6
32012 32012 7
4456G 4456G 8
80090 80091 9
;
data code_fmt;
set test2;
fmtname = 'code_fmt';
type='C';
rename code1=start code2=end data=label;
run;
proc format cntlin=code_fmt;
run;
data test1;
input code $5.;
code_formatted = put(code, $code_fmt.);
datalines;
80090
80091
7892F
7893F
7893F
32012
4456G
23145
23146
23147
;
run;
proc print data=test1;
run;
@A_SAS_Man wrote:
I have a lookup being provided to me in the following format.
data test2; input code1 $6. code2 $6. data; datalines; 23145 23147 5 7892F 7892F 6 7893F 7893F 6 32012 32012 7 4456G 4456G 8 80090 80091 9 ;
I have data structured like this.
data test1; input code $5.; datalines; 80090 80091 7892F 7893F 7893F 32012 4456G 23145 23146 23147 ;
I had previously been trying to join data in the following way, which creates a cartesian join and I'm unable to execute on our large dataset.
proc sql; create table want as select a.*, b.data from test1 a left join test2 b on (a.code <= b.code2) And (a.code >= b.code1) ; quit;
To remedy this, I was trying to figure out a way to "flatten" this lookup so that I would have the following.
data want2; input code1 $6. data; datalines; 23145 5 23146 5 23147 5 7892F 6 7893F 6 32012 7 4456G 8 80090 9 80091 9 ;
Essentially trying to spell out all the values that would be in the ranges and create a unique line for each. How could I create want2? To my current knowledge, none of the lookup values that contain letters would have a range, i.e., the code1 and code2 values would match. But if there's a way to create a robust process that could account for future changes from this I would be very interested.
I'm also open to thoughts on how to optimize the join (create want directly), lookup or anyway to complete.
Thanks this appears to be working, one question. If I had two data fields I was trying to attach as part of this would I essentially just have to repeat this process for each data element? Or is there a way to use format to attach two values? See example alternate test2 below, if I wanted to attach data2 a the same time and with the same logic as data is there a simple way to do that in your method? Essentially data2 in my process is just a description that I need on there.
data test2;
input code1 $6. code2 $6. data data2;
datalines;
23145 23147 5 k
7892F 7892F 6 l
7893F 7893F 6 l
32012 32012 7 y
4456G 4456G 8 f
80090 80091 9 s
;
You could also use a concatenation of data and data2 (with a suitable delimiter) as the format label and then decompose the formatted value using the SCAN function.
If data2 depends on data, not on the codes, and the same (data, data2) pair applies to multiple ranges of codes, a separate look-up "data → data2" would avoid the need to store multiple copies of (possibly long) data2 values in the format.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.