I have the following
data have1;
input Grp cum ;
datalines;
1 2
2 4
3 6
4 8
5 10
6 12
;
run;
data have2;
input id ;
datalines;
1
2
3
4
5
6
7
8
9
10
11
12
;
run;
proc sql;
Create table want as
Select a.Id,
b.Grp
from have2 as a
left join have1 as b
on a.id=b.Cum
;
The above gives the output as
id | Grp |
1 | . |
2 | 1 |
3 | . |
4 | 2 |
5 | . |
6 | 3 |
7 | . |
8 | 4 |
9 | . |
10 | 5 |
11 | . |
12 | 6 |
I would like to get this:
id | Grp |
1 | 1 |
2 | 1 |
3 | 2 |
4 | 2 |
5 | 3 |
6 | 3 |
7 | 4 |
8 | 4 |
9 | 5 |
10 | 5 |
11 | 6 |
12 | 6 |
Thanks for your help in advance.
If you want to "join" HAVE1 and HAVE2 then first convert HAVE1 to something with a range of values of ID.
data groups;
set have1;
by cum;
min=sum(lag(cum),1);
max=cum;
keep grp min max;
run;
Now you can join HAVE2 with GROUPS and assign the GRP value to each observation in HAVE2.
proc sql ;
create table want as
select * from have2 a inner join groups b
on a.id between b.min and b.max
;
quit;
SQL is not really the solution for this but here you go:
data have1;
input Grp cum;
datalines;
1 2
2 4
3 6
4 8
5 10
6 12
;
run;
data have2;
input id;
datalines;
1
2
3
4
5
6
7
8
9
10
11
12
;
run;
proc sql;
Create table want (keep=id grp) as
Select a.Id,
b.Grp as grp_,
case
when b.grp is missing then monotonic() - 1
end
as col,
case
when b.grp is missing then sum(1,calculated col)
else b.grp
end
as grp
from have2 as a
left join have1 as b
on a.id=b.Cum;
quit;
What is the reasoning that says ID=1 should be matched to GRP=1? I do not see the connection.
There is no relationship between group and id. This is a mere representation of more than 100K rows.
To remove the confusion here is the updated data set.
data have1;
input Grp cum ;
datalines;
1 3252
2 6752
3 10252
4 13752
5 17252
6 20752
;
run;
data have2;
do i=1 to 20000;
id=i;output;
end;
drop i;
run;
You must have had some algorithm in your mind when went from your two example input datasets to your desired output dataset.
What was the logic that said ID=1 was part of GRP=1 instead of GRP=2 or 3 or 3457?
I am trying to fill up the missing group values when I left join. The desired output will be all ids less than cum should have the same group value.
@SK_11 wrote:
I am trying to fill up the missing group values when I left join. The desired output will be all ids less than cum should have the same group value.
There must be more to it than that. If HAVE1 is
data have1;
input Grp cum;
datalines;
1 2
2 4
3 6
4 8
5 10
6 12
;
The ID=1 will be a member of all 6 GRP values since one is less than 2 and also less than 4 and less then 6 etc.
Why would you want to try to force a solution to this problem using SQL?
Do you just want to do this?
Given this input:
data have1;
input Grp cum;
datalines;
1 2
2 4
3 6
4 8
5 10
6 12
;
Run this data step:
data want;
set have1;
do id=sum(lag(cum),1) to cum;
output;
end;
run;
Result:
Obs Grp cum id 1 1 2 1 2 1 2 2 3 2 4 3 4 2 4 4 5 3 6 5 6 3 6 6 7 4 8 7 8 4 8 8 9 5 10 9 10 5 10 10 11 6 12 11 12 6 12 12
What does HAVE2 add to the problem? Are you just trying to set an upper bound on the value of ID?
data have1;
input Grp cum ;
datalines;
1 3252
2 6752
3 10252
4 13752
5 17252
6 20752
;
data want;
set have1;
do id=sum(lag(cum),1) to min(cum,20000);
output;
end;
run;
proc freq data=want;
tables grp ;
run;
The FREQ Procedure Cumulative Cumulative Grp Frequency Percent Frequency Percent -------------------------------------------------------- 1 3252 16.26 3252 16.26 2 3500 17.50 6752 33.76 3 3500 17.50 10252 51.26 4 3500 17.50 13752 68.76 5 3500 17.50 17252 86.26 6 2748 13.74 20000 100.00
If you want to "join" HAVE1 and HAVE2 then first convert HAVE1 to something with a range of values of ID.
data groups;
set have1;
by cum;
min=sum(lag(cum),1);
max=cum;
keep grp min max;
run;
Now you can join HAVE2 with GROUPS and assign the GRP value to each observation in HAVE2.
proc sql ;
create table want as
select * from have2 a inner join groups b
on a.id between b.min and b.max
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.