Hello,
I have a variable called Q6 in a SAS dataset. It comes from a survey questionnaire. I would like to check whether any of the sentences in Q6 are duplicates. If there are duplicates, I would like to create a new variable, Q6_dup, set to 1 or 0.
For example:
data have;
infile datalines truncover;
/* Read ID as the first word, then the rest of the line (up to 200 characters) as Q6 */
input ID $ Q6 $200.;
datalines;
1233 Any drug has certain side effects, we must strictly under the guidance of the doctor's rational use of drugs, do not abuse drugs.
3656 Abuse of drugs may aggravate the adverse reactions of drugs and affect the recovery of the disease
8677 As far as addiction being a disease…in some cases yes…say in the instance where someone had a serious accident
3455 Abuse of drugs may aggravate the adverse reactions of drugs and affect the recovery of the disease.
;
run;
I don't know how to use (if first.var) because sentences may start with the same word but may not continue to be the same.
Please identify which exact "duplicates" you want marked in the example.
And which aren't.
If you have many of these "duplicates" I would ask if this text is actually generated as the result of selecting an option in the survey (a common issue with survey entry software) in which case your duplicate indicator is not going to be very helpful.
Typically, for my surveys, I read the data with custom informats so that the known and expected responses are coded to standard values, and then only worry about the possible open text responses if they appear in the same responses.
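As a rough illustration of the custom-informat idea, here is a minimal sketch. It assumes the data are already in a dataset called have with a character variable Q6 of at most 200 characters. The informat name q6code, the code value 1, the output dataset coded, and the variable Q6_code are all hypothetical names chosen for this example, and the list of known responses would have to come from the actual survey options.

```sas
proc format;
  /* Hypothetical informat: map known survey option text to a numeric code. */
  /* UPCASE uppercases the incoming text before matching, so the strings    */
  /* below must be written in uppercase. Both variants (with and without    */
  /* the trailing period) map to the same code.                             */
  invalue q6code (upcase default=200 max=200)
    'ABUSE OF DRUGS MAY AGGRAVATE THE ADVERSE REACTIONS OF DRUGS AND AFFECT THE RECOVERY OF THE DISEASE',
    'ABUSE OF DRUGS MAY AGGRAVATE THE ADVERSE REACTIONS OF DRUGS AND AFFECT THE RECOVERY OF THE DISEASE.' = 1
    other = .;   /* anything not listed is left missing, for example open text */
run;

data coded;
  set have;
  /* Apply the informat to the existing character variable */
  Q6_code = input(Q6, q6code.);
run;
```

With this approach, rows that share a non-missing Q6_code chose the same option, and only rows with a missing Q6_code need to be inspected as possible free text.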
@ballardw In the example I gave, IDs 3656 and 3455 are duplicates. See below:
3656 Abuse of drugs may aggravate the adverse reactions of drugs and affect the recovery of the disease
3455 Abuse of drugs may aggravate the adverse reactions of drugs and affect the recovery of the disease.
I would like to mark 3656 as Q6_dup=1 and 3455 as Q6_dup=0.
Yes, the text is generated as the result of selecting an option in the survey.
I am still learning SAS and am not sure how to approach this using custom informats.
Could you or anyone please help me with sample code and an explanation using the example above?
If the text isn't free text and ID isn't per person, then sort it and flag the first record? Do you care which ID is marked as non-duplicate? What happens if there are more than 2 instances?
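/* Sort so that identical Q6 values are adjacent, ordered by ID within each value */
proc sort data=have;
by Q6 ID;
run;
data want;
set have;
by Q6;
/* first.Q6 is 1 on the first record of each Q6 group and 0 on the rest */
Q6_dup=first.Q6;
run;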
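One thing to watch in the posted example: the text for 3656 and 3455 differs only by a trailing period, so an exact comparison on Q6 would treat them as two different values. A possible sketch, assuming that case, extra blanks, and a single trailing period should be ignored, and that Q6 fits in 200 characters, is to build a normalized key and group on that instead. The names keyed, q6_key, and seq are hypothetical helpers introduced for this example.

```sas
data keyed;
  set have;
  length q6_key $200;   /* assumption: Q6 is no longer than 200 characters */
  seq = _n_;            /* remember the original row order */
  /* Normalize: trim, collapse repeated blanks, uppercase */
  q6_key = upcase(compbl(strip(Q6)));
  /* Drop a single trailing period, if present */
  if length(q6_key) > 1 and substr(q6_key, length(q6_key), 1) = '.' then
    q6_key = substr(q6_key, 1, length(q6_key) - 1);
run;

/* Group identical keys together, keeping original order within each group */
proc sort data=keyed;
  by q6_key seq;
run;

data want;
  set keyed;
  by q6_key;
  Q6_dup = first.q6_key;   /* 1 on the first occurrence of each key, else 0 */
run;

/* Restore the original row order */
proc sort data=want;
  by seq;
run;
```

Because the rows are ordered by seq within each key, the first occurrence in the original data gets Q6_dup=1 (3656 in the example) and later occurrences get 0 (3455). Responses that appear only once also get 1 under this convention. If only groups with more than one member should be flagged, `first.q6_key and not last.q6_key` is one alternative to consider.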
It's not clear what you're considering a duplicate. Can you clarify?