BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
CathyVI
Lapis Lazuli | Level 10

Hi,

I have a dataset with comments similar to the dataset below. I would like to output each comments and its duplicates into different sheet in excel. How do I find the duplicate, after sorting without deleting duplicate and outputting them to each excel sheet?

For example:

data have;
length comment $ 1000;
input comment$ 1 - 1000;
datalines;
Since it is impossible to completely and completely quit using dependent drugs, it is better to face drug abuse with a positive, rational and practical attitude and point of view
Substance abuse refers to the intermittent or persistent excessive use of psychoactive substances that is contrary to social norms or is irrelevant or inconsistent with accepted medical practice.Such abuse is far from tentative use, social entertainment or situational need, but gradually turns into a state of intensive use, which leads to the formation of dependence.
Substance abuse refers to the intermittent or persistent excessive use of psychoactive substances that is contrary to social norms or is irrelevant or inconsistent with accepted medical practice.Such abuse is far from tentative use, social entertainment or situational need, but gradually turns into a state of intensive use, which leads to the formation of dependence.
Need safe spaces and safe disposable sites
Need safe spaces and safe disposable sites
More attention has been paid to the concept of harm reduction.
NO
No opinion
Marijuana should be legal.
Need safe spaces and safe disposable sites
The use of drugs should be vigorously suppressed
The use of drugs should be vigorously suppressed
Very bad behavior
Need safe spaces and safe disposable sites
Increase control
nothing
nothing
I just hope there will be when I need it
I just hope there will be when I need it
Increase control
No opinion
;
run;

 

Expected results:

 

Sheet 1

Need safe spaces and safe disposable sites
Need safe spaces and safe disposable sites

Need safe spaces and safe disposable sites
Need safe spaces and safe disposable sites

Sheet 2:

nothing
nothing

sheet 3:

I just hope there will be when I need it
I just hope there will be when I need it

sheet 4:

Substance abuse refers to the intermittent or persistent excessive use of psychoactive substances that is contrary to social norms or is irrelevant or inconsistent with accepted medical practice.Such abuse is far from tentative use, social entertainment or situational need, but gradually turns into a state of intensive use, which leads to the formation of dependence.
Substance abuse refers to the intermittent or persistent excessive use of psychoactive substances that is contrary to social norms or is irrelevant or inconsistent with accepted medical practice.Such abuse is far from tentative use, social entertainment or situational need, but gradually turns into a state of intensive use, which leads to the formation of dependence.

and so on.

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Does that data read in correctly for you? It doesn't for me.

 

If order of the results matter you need to add a variable to control the order or you may have one in your data set already so leaving that as an exercise for you to solve.  

 

If you have other destinations open you may want to close them as otherwise you'll get a lot of printed output. 

 


proc sort data=have out=duplicates nouniquekey;
by comment;
run;

ods excel file='/home/fkhurshed/Demo1/duplicates.xlsx' options(sheet_interval="bygroup" sheet_label = '' sheet_name = "Duplicate");

options nobyline;
proc print data=duplicates;
by comment;
var comment;
run;

ods excel close;

View solution in original post

6 REPLIES 6
Reeza
Super User

Does that data read in correctly for you? It doesn't for me.

 

If order of the results matter you need to add a variable to control the order or you may have one in your data set already so leaving that as an exercise for you to solve.  

 

If you have other destinations open you may want to close them as otherwise you'll get a lot of printed output. 

 


proc sort data=have out=duplicates nouniquekey;
by comment;
run;

ods excel file='/home/fkhurshed/Demo1/duplicates.xlsx' options(sheet_interval="bygroup" sheet_label = '' sheet_name = "Duplicate");

options nobyline;
proc print data=duplicates;
by comment;
var comment;
run;

ods excel close;
CathyVI
Lapis Lazuli | Level 10

@Reeza 

This code works but I wanted to see all other variables so excluded the "var" statement so that sas can include all variables in the dataset. However, sas included all the variable except "comment". Why is that? Is it because am using "by comment"?  How could I have comment and all other variables in the dataset. I don't want to use keep statement because I have a lot of variables. 

 

Reeza
Super User

VAR statement controls what is displayed. Remove the VAR statement and I think it prints all variables by default.

 

 

CathyVI
Lapis Lazuli | Level 10

@Reeza Yes i removed the var and all variables was displayed except comment. I want variable "comment" to be displayed too. How will I get it ?

Reeza
Super User
var _all_; 

Select all variables for printing then. 

CathyVI
Lapis Lazuli | Level 10
Great! Thanks, it worked.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1362 views
  • 3 likes
  • 2 in conversation