Hi. I have this data set with duplicates.
I want to add a Group column where if an ID with duplicates received all the same scores, then they are labelled as 1, otherwise, labelled as 2. If an ID has no duplicate, then it is labelled as 3.
data have ;
input ID $ score ;
cards ;
1 10
1 10
2 5
3 20
3 21
3 20
4 15
4 18
5 .
6 8
6 8
7 17
7 17
7 25
;
run ;
The result would look like this.
ID Score Group
1 10 1
1 10 1
2 5 3
3 20 2
3 21 2
3 20 2
4 15 2
4 18 2
5 . 3
6 8 1
6 8 1
7 17 2
7 17 2
7 25 2
Then I want to create a table summarizing the results counting the IDs (only once if duplicates) that had the same scores (Group 1), different scores (Group 2) and those without duplicates (Group 3).
The table would look something like this.
Group Frequency
1 2
2 3
3 2
Thank you for your help!
proc sql;
create table want as
select *, case when count(distinct score) in (1,0) and count(score) in (1,0) then 3
when count(distinct score)=1 and count(score)>1 then 1
when count(distinct score)>1 then 2
end as group
from have
group by id
order by id;
quit;
Thank you @novinosrin .
I got this comment.
NOTE: A CASE expression has no ELSE clause. Cases not accounted for by the
WHEN clauses will result in a missing value for the CASE expression.
NOTE: The query requires remerging summary statistics back with the original
data.
NOTE: Table WORK.WANT created, with 14 rows and 3 columns.
194 order by id;
-----
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
Hello @yoyong555 ok added the else, however i did not get any error. here is my test
data have ;
input ID $ score ;
cards ;
1 10
1 10
2 5
3 20
3 21
3 20
4 15
4 18
5 .
6 8
6 8
7 17
7 17
7 25
;
run ;
proc sql;
create table want as
select *, case when count(distinct score) in (1,0) and count(score) in (1,0) then 3
when count(distinct score)=1 and count(score)>1 then 1
when count(distinct score)>1 then 2
else . end as group
from have
group by id
order by id;
quit;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.WANT created, with 14 rows and 3 columns.
46 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
Since your input is sorted by ID, both output data sets can be generated in a single step with 2 passes through the input data:
data have ;
input ID $ score ;
cards ;
1 10
1 10
2 5
3 20
3 21
3 20
4 15
4 18
5 .
6 8
6 8
7 17
7 17
7 25
;
run ;
data group (keep = id score group)
freq (keep = group freq)
;
do _n_ = 1 by 1 until (last.id) ;
set have ;
by id ;
if _n_ = 1 then _score1 = score ;
else if _score1 ne score then group = 2 ;
end ;
if _n_ = 1 then group = 3 ;
else if group ne 2 then group = 1 ;
array fq [3] _temporary_ ;
fq [group] + 1 ;
do _n_ = 1 to _n_ ;
set have end = lr ;
output group ;
end ;
if lr then do group = 1 to dim (fq) ;
freq = fq [group] ;
output freq ;
end ;
run ;
Each BY group is read twice:
Simple!
Kind regards
Paul D.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.