Hi,
I have a dataset with comments similar to the one below. I would like to output each comment and its duplicates to a separate sheet in Excel. After sorting, how do I find the duplicates, without deleting them, and write each group of duplicates to its own Excel sheet?
For example:
data have;
length comment $ 1000;
input comment$ 1 - 1000;
datalines;
Since it is impossible to completely and completely quit using dependent drugs, it is better to face drug abuse with a positive, rational and practical attitude and point of view
Substance abuse refers to the intermittent or persistent excessive use of psychoactive substances that is contrary to social norms or is irrelevant or inconsistent with accepted medical practice.Such abuse is far from tentative use, social entertainment or situational need, but gradually turns into a state of intensive use, which leads to the formation of dependence.
Substance abuse refers to the intermittent or persistent excessive use of psychoactive substances that is contrary to social norms or is irrelevant or inconsistent with accepted medical practice.Such abuse is far from tentative use, social entertainment or situational need, but gradually turns into a state of intensive use, which leads to the formation of dependence.
Need safe spaces and safe disposable sites
Need safe spaces and safe disposable sites
More attention has been paid to the concept of harm reduction.
NO
No opinion
Marijuana should be legal.
Need safe spaces and safe disposable sites
The use of drugs should be vigorously suppressed
The use of drugs should be vigorously suppressed
Very bad behavior
Need safe spaces and safe disposable sites
Increase control
nothing
nothing
I just hope there will be when I need it
I just hope there will be when I need it
Increase control
No opinion
;
run;
Expected results:
Sheet 1:
Need safe spaces and safe disposable sites
Need safe spaces and safe disposable sites
Need safe spaces and safe disposable sites
Need safe spaces and safe disposable sites
Sheet 2:
nothing
nothing
Sheet 3:
I just hope there will be when I need it
I just hope there will be when I need it
Sheet 4:
Substance abuse refers to the intermittent or persistent excessive use of psychoactive substances that is contrary to social norms or is irrelevant or inconsistent with accepted medical practice.Such abuse is far from tentative use, social entertainment or situational need, but gradually turns into a state of intensive use, which leads to the formation of dependence.
Substance abuse refers to the intermittent or persistent excessive use of psychoactive substances that is contrary to social norms or is irrelevant or inconsistent with accepted medical practice.Such abuse is far from tentative use, social entertainment or situational need, but gradually turns into a state of intensive use, which leads to the formation of dependence.
and so on.
Thanks
Does that data read in correctly for you? It doesn't for me.
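One possible cause, and this is only an assumption since I have not traced the exact failure, is that column input for columns 1 - 1000 runs past the end of the shorter lines, so INPUT moves on to the next line and comments get lost. A minimal sketch of a read that avoids this, using TRUNCOVER and a shortened set of the sample comments:

```sas
/* Sketch only: TRUNCOVER keeps INPUT on the current line
   when a comment is shorter than the 1000 characters requested */
data have;
  infile datalines truncover;
  length comment $ 1000;
  input comment $1000.;
datalines;
Need safe spaces and safe disposable sites
Need safe spaces and safe disposable sites
nothing
No opinion
nothing
;
run;
```

If your real data comes from a file or an existing dataset rather than datalines, this step may not be needed at all.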
If the order of the results matters, you need to add a variable to control the order (you may already have one in your data set), so I'm leaving that as an exercise for you to solve.
If you have other destinations open, you may want to close them; otherwise you'll get a lot of printed output.
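A minimal sketch of closing the other destinations first, assuming you don't need any of them while the Excel file is being written:

```sas
/* Close every open ODS destination (HTML, LISTING, and so on)
   so that only ODS EXCEL receives the PROC PRINT output */
ods _all_ close;
```

Run this before the ODS EXCEL statement. You will need to reopen whichever destination you normally use afterwards.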
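/* Keep only the comments that occur more than once */
proc sort data=have out=duplicates nouniquekey;
by comment;
run;
/* Start a new worksheet for each BY group */
ods excel file='/home/fkhurshed/Demo1/duplicates.xlsx' options(sheet_interval="bygroup" sheet_label = '' sheet_name = "Duplicate");
/* Suppress the BY line above each table */
options nobyline;
proc print data=duplicates;
by comment;
var comment;
run;
ods excel close;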
This code works, but I wanted to see all the other variables, so I removed the VAR statement so that SAS would include every variable in the dataset. However, SAS printed all the variables except "comment". Why is that? Is it because I am using "by comment"? How can I get comment and all the other variables in the output? I don't want to use a KEEP statement because I have a lot of variables.
The VAR statement controls what is displayed. Remove the VAR statement and I think it prints all variables by default.
@Reeza Yes, I removed the VAR statement and all variables were displayed except comment. I want the variable "comment" to be displayed too. How do I get it?
Then select all variables for printing explicitly: var _all_;
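Put together, a sketch of the print step might look like the following. It assumes the duplicates dataset and the open ODS EXCEL destination from the earlier code. Without a VAR statement, PROC PRINT leaves the BY variable out of the table, so naming it through _ALL_ should bring comment back:

```sas
options nobyline;
proc print data=duplicates;
  by comment;
  /* _ALL_ lists every variable explicitly, including the BY variable comment */
  var _all_;
run;
```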
