Group rows that are duplicated

Occasional Contributor
Posts: 18

Hello to everyone!

Does anybody know if there is a way to group rows that are duplicated in a data set? For example, to put them at the end, or mark them with a color, or create another data set with the duplicated rows.

I appreciate any kind of help!

Kind regards,

Milenko

Super User
Posts: 5,435

Re: Group rows that are duplicated

Posted in reply to MilenkoAndreas

Please attach some sample data to show what you require.

PROC SORT has a DUPOUT= option that might suit your need.

Data never sleeps
Occasional Contributor
Posts: 18

Re: Group rows that are duplicated

Hello Linus! Thank you so much for your reply!

I'm not sure what PROC SORT does. I'm fairly new to SAS Guide and only know how to use some tasks; I guess you are talking about programming. If that is the right path to take, could you tell me if there is a good tutorial for beginners, please?

Anyway, the data is just a matrix with 10 columns: some columns hold dates, others strings, and others numbers. I don't know if that is enough information. How should I send sample data?

Kind regards,

Milenko

New Contributor
Posts: 2

Re: Group rows that are duplicated

Posted in reply to MilenkoAndreas

Hi  Milenko,

UCLA operates a great site for individuals wanting to learn SAS.  Here is the URL:  http://www.ats.ucla.edu/stat/sas/default.htm

They have nice tutorials and movies to watch.

SAS also has free training. Check this link: SAS Training Starting Points. Look for "Free Training at Your Fingertips".

Enjoy learning!

Roger

Occasional Contributor
Posts: 18

Re: Group rows that are duplicated

Posted in reply to rogerward

Thank you, rogerward!! I will visit those places for sure!! Thank you so much!!

Contributor
Posts: 45

Re: Group rows that are duplicated

Posted in reply to MilenkoAndreas

Hi Milenko,

To answer your question: the SORT procedure (written as PROC SORT in code) sorts the observations in your data set by one or more variables, which you list in a BY statement. Since you mentioned dates, you could sort by a date variable, keep only the first record for each date (the NODUPKEY option), and copy the removed duplicate records into a new data set with the DUPOUT= option mentioned above. Here is an example:

 

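/* Keep the first row for each value of var1; removed duplicates go to data set DUP. */
/* Without an OUT= option, the sorted result replaces the input data set FILE.       */
proc sort data=file dupout=dup nodupkey;
  by var1;
run;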

Note that the BY statement (the second line) can list any number of variables, not just the date. With NODUPKEY, rows count as duplicates only when they match on all of the BY variables.
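If "duplicated" means rows that are identical in every column, one possible variation is to sort BY _ALL_ and write the result to a separate data set with OUT=, so the original stays untouched. The sketch below also shows one way to mark duplicated rows with a flag instead of removing them. The data set names (work.have, work.unique, work.dups, work.sorted, work.flagged), the variables, and the sample values are all made up for illustration; adapt them to your own data.

```sas
/* Hypothetical sample data: rows 1 and 3 are identical in every column */
data work.have;
  input id name $ amount;
  datalines;
1 Ann 10
2 Bob 20
1 Ann 10
3 Cid 30
;
run;

/* OUT= leaves work.have unchanged. BY _ALL_ compares every variable, */
/* so only fully identical rows are treated as duplicates.            */
proc sort data=work.have out=work.unique dupout=work.dups nodupkey;
  by _all_;
run;

/* Alternative: keep every row and flag the ones that occur more than once. */
proc sort data=work.have out=work.sorted;
  by id name amount;
run;

data work.flagged;
  set work.sorted;
  by id name amount;
  /* A row is unique only if it is both first and last in its BY group */
  is_dup = not (first.amount and last.amount);
run;
```

With this sample, work.dups should receive the one repeated row, and work.flagged should have is_dup = 1 on both copies of it. Sorting work.flagged by is_dup afterwards would move the flagged rows to the end.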

Hope that helps,

Maik.

Occasional Contributor
Posts: 18

Re: Group rows that are duplicated

Posted in reply to MaikH_Schutze

Hey Maik! Thank you so much! That is exactly what I used! Your help is very much appreciated!

Kind regards!

Occasional Contributor
Posts: 18

Re: Group rows that are duplicated

Hey! I just learned how to use that function! Thank you so much! It worked just fine!
