Hi, basically what I need is based on where the Key falls in the interval for each quarter, a column indicating the proper group it belongs should be created from have1. Left join won't work since I don't really have a unique key, this is simply a lookup based on a range. Any help is appreciated!
Example:
have1:
Quarter | Key |
1 | 31.39715 |
1 | 53.59118 |
1 | 84.20986 |
1 | 81.27638 |
2 | 58.62042 |
2 | 12.07567 |
2 | 91.42742 |
3 | 69.06109 |
4 | 53.8079 |
4 | 33.30056 |
have 2:
Quarter | Group | Cutoff |
1 | A | 20 |
1 | B | 50 |
1 | C | 90 |
1 | D | 100 |
2 | A | 10 |
2 | B | 20 |
2 | C | 30 |
2 | D | 100 |
3 | A | 60 |
3 | B | 70 |
3 | C | 80 |
3 | D | 100 |
4 | A | 10 |
4 | B | 60 |
4 | C | 99 |
4 | D | 100 |
want:
Quarter | Key | Group |
1 | 31.39715 | B |
1 | 53.59118 | C |
1 | 84.20986 | C |
1 | 81.27638 | C |
2 | 58.62042 | D |
2 | 12.07567 | B |
2 | 91.42742 | D |
3 | 69.06109 | A |
4 | 53.8079 | A |
4 | 33.30056 | A |
Here is a possible approach using formats. First, create the data HAVE sets:
data have1;
infile datalines dsd dlm='|';
input Quarter Key;
datalines;
1|31.39715
1|53.59118
1|84.20986
1|81.27638
2|58.62042
2|12.07567
2|91.42742
3|69.06109
4|53.8079
4|33.30056
;
data have2;
infile datalines dsd dlm='|';
input Quarter Group:$1. Cutoff;
datalines;
1|A|20
1|B|50
1|C|90
1|D|100
2|A|10
2|B|20
2|C|30
2|D|100
3|A|60
3|B|70
3|C|80
3|D|100
4|A|10
4|B|60
4|C|99
4|D|100
;
Next, use the HAVE2 dataset to create a series of formats, one for each Quarter:
data cntlin;
set have2 (rename=(Group=Label));
by quarter ;
drop quarter cutoff;
fmtname=cats('qtr',quarter,'_');
Start=put(lag1(Cutoff),32.);
End=put(Cutoff,32.);
if first.quarter then Start='LOW';
run;
proc format cntlin=cntlin fmtlib;
run;
Finally, apply the appropriate format to the Key values using PUTN, which allows assigning the format at execution time:
data want;
set have1;
Group=putn(Key,cats('qtr',quarter,'_'));
run;
This produces the desired results:
Obs | Quarter | Key | Group |
---|---|---|---|
1 | 1 | 31.3972 | B |
2 | 1 | 53.5912 | C |
3 | 1 | 84.2099 | C |
4 | 1 | 81.2764 | C |
5 | 2 | 58.6204 | D |
6 | 2 | 12.0757 | B |
7 | 2 | 91.4274 | D |
8 | 3 | 69.0611 | B |
9 | 4 | 53.8079 | B |
10 | 4 | 33.3006 | B |
Note that the values for Quarter 4 different from your "Want" data, but these values are correct based on the cutoff values provided in your HAVE2 dataset.
May the SAS be with you!
Mark
Here is a possible approach using formats. First, create the data HAVE sets:
data have1;
infile datalines dsd dlm='|';
input Quarter Key;
datalines;
1|31.39715
1|53.59118
1|84.20986
1|81.27638
2|58.62042
2|12.07567
2|91.42742
3|69.06109
4|53.8079
4|33.30056
;
data have2;
infile datalines dsd dlm='|';
input Quarter Group:$1. Cutoff;
datalines;
1|A|20
1|B|50
1|C|90
1|D|100
2|A|10
2|B|20
2|C|30
2|D|100
3|A|60
3|B|70
3|C|80
3|D|100
4|A|10
4|B|60
4|C|99
4|D|100
;
Next, use the HAVE2 dataset to create a series of formats, one for each Quarter:
data cntlin;
set have2 (rename=(Group=Label));
by quarter ;
drop quarter cutoff;
fmtname=cats('qtr',quarter,'_');
Start=put(lag1(Cutoff),32.);
End=put(Cutoff,32.);
if first.quarter then Start='LOW';
run;
proc format cntlin=cntlin fmtlib;
run;
Finally, apply the appropriate format to the Key values using PUTN, which allows assigning the format at execution time:
data want;
set have1;
Group=putn(Key,cats('qtr',quarter,'_'));
run;
This produces the desired results:
Obs | Quarter | Key | Group |
---|---|---|---|
1 | 1 | 31.3972 | B |
2 | 1 | 53.5912 | C |
3 | 1 | 84.2099 | C |
4 | 1 | 81.2764 | C |
5 | 2 | 58.6204 | D |
6 | 2 | 12.0757 | B |
7 | 2 | 91.4274 | D |
8 | 3 | 69.0611 | B |
9 | 4 | 53.8079 | B |
10 | 4 | 33.3006 | B |
Note that the values for Quarter 4 different from your "Want" data, but these values are correct based on the cutoff values provided in your HAVE2 dataset.
May the SAS be with you!
Mark
And here a SQL way using the source data created by @SASJedi
The proposed data step with formats will perform better so I'd certainly be using this approach for any bigger data volumes.
The advantage of pure SQL is that it could execute in-database (with a few amendments to the current SAS SQL flavor) if both of your source tables are in a database.
proc sql;
select
h1.quarter
,h1.key
,h2.group
from have1 h1 left join have2 h2
on h1.quarter=h2.quarter
and h1.key <= h2.Cutoff
group by h1.quarter, h1.key
having h2.Cutoff-h1.key = min(h2.Cutoff-h1.key)
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.