CathyVI
Pyrite | Level 9

Hello,

I have a variable called Q6 in a SAS dataset. It holds responses to a survey question. I would like to check whether any of the sentences in Q6 are duplicated. If there are duplicates, I would like to create a new variable called Q6_dup with values 1 or 0.

 

For example:

data have;
infile datalines truncover;
/* Q6 is read as up to 200 characters so the whole sentence is kept */
input ID $ Q6 $200.;
datalines;
1233  Any drug has certain side effects, we must strictly under the guidance of the doctor's rational use of drugs, do not abuse drugs.
3656  Abuse of drugs may aggravate the adverse reactions of drugs and affect the recovery of the disease
8677 As far as addiction being a disease…in some cases yes…say in the instance where someone had a serious accident
3455 Abuse of drugs may aggravate the adverse reactions of drugs and affect the recovery of the disease.
;
run;

I don't know how to use "if first.var" here, because sentences may start with the same word but then differ.

 

ballardw
Super User

Please identify which exact "duplicates" you want marked in the example.

And which aren't.

 

If you have many of these "duplicates" I would ask if this text is actually generated as the result of selecting an option in the survey (a common issue with survey entry software) in which case your duplicate indicator is not going to be very helpful.

 

Typically, for my surveys, I read the data with custom informats so that the known and expected responses are coded to a standard value, and then I only worry about possible open-text responses if they appear in the same responses.

CathyVI
Pyrite | Level 9

@ballardw  From the example I gave, IDs 3656 and 3455 are duplicates. See below:

3656 Abuse of drugs may aggravate the adverse reactions of drugs and affect the recovery of the disease

3455 Abuse of drugs may aggravate the adverse reactions of drugs and affect the recovery of the disease.

I would like to mark ID 3656 as Q6_dup=1 and ID 3455 as Q6_dup=0.

Yes, the text is generated as the result of selecting an option in the survey.

I am still learning SAS and am not sure how to approach this using custom informats.

Could you or anyone else please help me with sample code and an explanation, using the example above?

Reeza
Super User

If the text isn't free text and ID isn't per person, then sort it and flag the first record? Do you care which ID is marked as non-duplicate? What happens if there are more than 2 instances?

 

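/* Sort so that identical Q6 text is adjacent; ID breaks ties */
proc sort data=have;
by Q6 ID;
run;

data want;
set have;
by q6;
/* first.q6 is 1 on the first row of each Q6 group and 0 on the rest */
q6_dup=first.q6;
run;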
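
A sketch of one way to get the flags requested above (3656 as Q6_dup=1, 3455 as Q6_dup=0). It rests on several assumptions that are not confirmed in the thread: sentences count as duplicates when they match after ignoring case, extra blanks, and trailing periods; Q6 fits in 200 characters; and only the first occurrence, in original row order, of a repeated sentence gets Q6_dup=1. The names `keyed`, `seq`, and `q6_key` are made up for illustration.

```sas
/* Build a normalized comparison key and remember the original row order */
data keyed;
  set have;
  length q6_key $200;                 /* assumption: 200 characters is enough */
  seq = _n_;                          /* original row position */
  q6_key = upcase(compbl(strip(Q6))); /* ignore case and repeated blanks */
  /* drop trailing periods and blanks so "disease" and "disease." match */
  q6_key = prxchange('s/[.\s]+$//', 1, q6_key);
run;

/* Group identical keys together, keeping original order within each group */
proc sort data=keyed;
  by q6_key seq;
run;

data want;
  set keyed;
  by q6_key;
  /* 1 = first occurrence of a sentence that appears more than once;
     0 = later occurrences and sentences that appear only once */
  Q6_dup = (first.q6_key and not last.q6_key);
run;

/* Restore the original row order */
proc sort data=want;
  by seq;
run;
```

If every repeat after the first should be flagged instead, the assignment could be changed to `Q6_dup = not first.q6_key;`. The helper variables `seq` and `q6_key` stay in `want` and can be dropped once the result has been checked.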

Reeza
Super User

It's not clear what you're considering a duplicate, can you clarify?

  • What is the input? Is this the sample data posted? If so, please post it as a data step in a code block.
  • How is a duplicate defined?
  • What is the expected output?
