BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SarahDew
Obsidian | Level 7

I am using a count + group by to flag rows that are complete duplicates. The result removes the extra rows, while I prefer to keep the original dataset, with just an added column. I don't use a select distinct so can't understand why rows are being deleted. Any clue why it happens and how to avoid it?

 

data t;
input ID$ name$;
cards;
a010 Steve
a010 James
a011 Harvey
a012 Carl
a012 Carl;
run;

proc sql;
create table dup_flag as
select *, count(*) as n
from t
group by ID, name;
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
maguiremq
SAS Super FREQ

Hi @SarahDew, when you use GROUP BY, it's going to collapse those columns into a single record based on ID and NAME. If you want to keep the original structure, I would just join the original table to your original query:

 

data t;
input ID$ name :$6.;
cards;
a010 Steve
a010 James
a011 Harvey
a012 Carl
a012 Carl
;
run;

proc sql;
	create table dup_flag as
		select 
			t.*
			, a.n
		from 
			t
		left join
			(
				select
					id
					, name
					, count(*) as n
				from
					t
				group by
					id
					, name
			) a
			on t.id = a.id
			and t.name = a.name
	; 
quit;
ID	name	n
a010	James	1
a010	Steve	1
a011	Harvey	1
a012	Carl	2
a012	Carl	2

I think that's what you're trying to get at, but I may not have understood the question. Let me know - happy to help.

View solution in original post

5 REPLIES 5
maguiremq
SAS Super FREQ

Hi @SarahDew, when you use GROUP BY, it's going to collapse those columns into a single record based on ID and NAME. If you want to keep the original structure, I would just join the original table to your original query:

 

data t;
input ID$ name :$6.;
cards;
a010 Steve
a010 James
a011 Harvey
a012 Carl
a012 Carl
;
run;

proc sql;
	create table dup_flag as
		select 
			t.*
			, a.n
		from 
			t
		left join
			(
				select
					id
					, name
					, count(*) as n
				from
					t
				group by
					id
					, name
			) a
			on t.id = a.id
			and t.name = a.name
	; 
quit;
ID	name	n
a010	James	1
a010	Steve	1
a011	Harvey	1
a012	Carl	2
a012	Carl	2

I think that's what you're trying to get at, but I may not have understood the question. Let me know - happy to help.

FreelanceReinh
Jade | Level 19

Hello @SarahDew,

 

Alternatively, you can group by a unique key derived from ID and name (instead of by ID, name). This will trigger automatic remerging (see the note in the log "The query requires remerging summary statistics back with the original data.") and thus prevent the unwanted aggregation. For your example data (and in most other cases) a simple concatenation works as the unique key:

group by ID||name;
PGStats
Opal | Level 21

... Or you can simply fool SAS/SQL into auto-remerge by pretending to perform an operation on one of the GROUP BY columns, for example:

 

proc sql;
/*create table dup_flag as */
  select *, count(*) as n
  from t
  group by ID, trim(name);
quit;
PG
PGStats
Opal | Level 21

If all your variables are in the GROUP BY clause, you must request the remerge explicitly, for example:

 

proc sql;
/*create table dup_flag as */
select *
from t natural join (
  select *, count(*) as n
  from t
  group by ID, name);
quit;
PG
Sajid01
Meteorite | Level 14

The error is in your data step. The last step has a semi colon is at the incorrect location just after Carl

Sajid01_0-1677886261135.png
The corrected code is shown below. I moved the semicolon to the next line.

data t;
input ID$ name$;
cards;
a010 Steve
a010 James
a011 Harvey
a012 Carl
a012 Carl
;
run;

proc sql;
select ID, NAME, count(*) as n
from t
group by id, name;
quit;

The result would be as expected. Nothing is deleted

Sajid01_1-1677886414291.png

 

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 675 views
  • 3 likes
  • 5 in conversation