BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SarahDew
Obsidian | Level 7

I am using a count + group by to flag rows that are complete duplicates. The result removes the extra rows, while I prefer to keep the original dataset, with just an added column. I don't use a select distinct so can't understand why rows are being deleted. Any clue why it happens and how to avoid it?

 

data t;
input ID$ name$;
cards;
a010 Steve
a010 James
a011 Harvey
a012 Carl
a012 Carl;
run;

proc sql;
create table dup_flag as
select *, count(*) as n
from t
group by ID, name;
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
maguiremq
SAS Super FREQ

Hi @SarahDew, when you use GROUP BY, it's going to collapse those columns into a single record based on ID and NAME. If you want to keep the original structure, I would just join the original table to your original query:

 

data t;
input ID$ name :$6.;
cards;
a010 Steve
a010 James
a011 Harvey
a012 Carl
a012 Carl
;
run;

proc sql;
	create table dup_flag as
		select 
			t.*
			, a.n
		from 
			t
		left join
			(
				select
					id
					, name
					, count(*) as n
				from
					t
				group by
					id
					, name
			) a
			on t.id = a.id
			and t.name = a.name
	; 
quit;
ID	name	n
a010	James	1
a010	Steve	1
a011	Harvey	1
a012	Carl	2
a012	Carl	2

I think that's what you're trying to get at, but I may not have understood the question. Let me know - happy to help.

View solution in original post

5 REPLIES 5
maguiremq
SAS Super FREQ

Hi @SarahDew, when you use GROUP BY, it's going to collapse those columns into a single record based on ID and NAME. If you want to keep the original structure, I would just join the original table to your original query:

 

data t;
input ID$ name :$6.;
cards;
a010 Steve
a010 James
a011 Harvey
a012 Carl
a012 Carl
;
run;

proc sql;
	create table dup_flag as
		select 
			t.*
			, a.n
		from 
			t
		left join
			(
				select
					id
					, name
					, count(*) as n
				from
					t
				group by
					id
					, name
			) a
			on t.id = a.id
			and t.name = a.name
	; 
quit;
ID	name	n
a010	James	1
a010	Steve	1
a011	Harvey	1
a012	Carl	2
a012	Carl	2

I think that's what you're trying to get at, but I may not have understood the question. Let me know - happy to help.

FreelanceReinh
Jade | Level 19

Hello @SarahDew,

 

Alternatively, you can group by a unique key derived from ID and name (instead of by ID, name). This will trigger automatic remerging (see the note in the log "The query requires remerging summary statistics back with the original data.") and thus prevent the unwanted aggregation. For your example data (and in most other cases) a simple concatenation works as the unique key:

group by ID||name;
PGStats
Opal | Level 21

... Or you can simply fool SAS/SQL into auto-remerge by pretending to perform an operation on one of the GROUP BY columns, for example:

 

proc sql;
/*create table dup_flag as */
  select *, count(*) as n
  from t
  group by ID, trim(name);
quit;
PG
PGStats
Opal | Level 21

If all your variables are in the GROUP BY clause, you must request the remerge explicitly, for example:

 

proc sql;
/*create table dup_flag as */
select *
from t natural join (
  select *, count(*) as n
  from t
  group by ID, name);
quit;
PG
Sajid01
Meteorite | Level 14

The error is in your data step. The last step has a semi colon is at the incorrect location just after Carl

Sajid01_0-1677886261135.png
The corrected code is shown below. I moved the semicolon to the next line.

data t;
input ID$ name$;
cards;
a010 Steve
a010 James
a011 Harvey
a012 Carl
a012 Carl
;
run;

proc sql;
select ID, NAME, count(*) as n
from t
group by id, name;
quit;

The result would be as expected. Nothing is deleted

Sajid01_1-1677886414291.png

 

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1211 views
  • 3 likes
  • 5 in conversation