zimcom
Pyrite | Level 9

Hi all,

 

I have a medication list with thousands of observations. How can I efficiently flag the same medication taken by the same subject (duplicated entries)? Any ideas?

 

SUBJECT | CMTRT                   | cmstdtc    | CMONGO  | cmendtc
117961  | Acetylsalicylic Acid    | .          |         | .
117961  | Dextrose                | .          |         | .
117961  | Midazolam               | .          |         | .
117961  | Oxygen                  | .          |         | .
117961  | Testosterone            | 2014-12-31 |         | 2015-03-26
117961  | Atorvastatin            | 2015-04-01 | Checked | .
117961  | Diltiazen               | 2015-03-28 |         | 2015-05-30
117961  | Diltiazem               | 2015-03-30 |         | 2015-03-31
117961  | Amiodarone              | 2015-03-31 |         | 2015-05-05
117961  | Sennoside               | 2015-04-01 |         | 2015-04-01
117961  | Apixaban                | 2015-04-05 |         | 2015-04-07
117961  | Magnesium sulphate      | 2015-03-31 |         | 2015-03-31
117961  | TPA                     | 2015-03-26 |         | 2015-03-26
117961  | acetaminophen           | 2015-03-27 | Checked | .
117961  | enoxaparin              | 2015-03-31 |         | 2015-04-03
117961  | Cetirizine              | 2014-12-31 | Checked | .
117961  | Diltiazem               | 2015-06-10 | Checked | .
117961  | acetylsalicyclic acid   | 2015-03-31 |         | 2015-04-05
117961  | perindopril             | 2015-03-30 | Checked | .
117961  | Indapamide              | 2015-03-30 | Checked | .
117961  | Gentamicin              | 2015-05-08 |         | 2015-05-08
117961  | 2% Xylocaine            | 2015-05-08 |         | 2015-05-08
117961  | Cephazolin              | 2015-05-08 |         | 2015-05-08
117961  | dalteparin              | 2015-04-04 |         | 2015-04-04
117961  | Docusate Sodium         | 2015-04-01 |         | 2015-04-08
117961  | Omnaris Nasal Spray     | 2014-12-31 |         | 2015-03-26
117961  | Amiodarone              | 2015-03-31 |         | 2015-05-06
117961  | Morphine                | 2015-03-27 |         | 2015-05-28
117961  | Perindopril/ indapamide | 2014-12-31 | Checked | .
117961  | Zopiclone               | 2014-12-31 |         | 2015-03-26
117961  | acetaminophen           | 2015-03-27 |         | 2015-05-28
117961  | Dimenhydrinate          | 2015-03-28 |         | 2015-03-28
117961  | Baclofen                | 2014-12-31 |         | 2015-03-26
117961  | Amiodarone              | 2015-03-28 |         | 2015-05-28
117961  | Magnesium Sulphate      | 2015-03-28 |         | 2015-03-28
117961  | Potassium Chloride      | 2015-03-28 |         | 2015-05-28
118036  | Acetylsalicylic Acid    | .          |         | .
Accepted Solution
pau13rown
Lapis Lazuli | Level 10

Sort by subject and cmtrt, then maybe use "proc sort data=.... nodupkey dupout=XXX" and check the dataset XXX for the duplicates. Or, once the data is sorted, you can identify duplicates in a data step.
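
For reference, a minimal sketch of the two approaches described above. It assumes the input is a dataset named HAVE (the name used elsewhere in this thread) with variables SUBJECT and CMTRT; the output names SORTED, NODUPS, DUPS, FLAGGED and the variable dup_flag are placeholders chosen for illustration.

```sas
/* Assumption: WORK.HAVE holds one row per medication record,
   with variables SUBJECT and CMTRT. All other names are hypothetical. */
proc sort data=have out=sorted;
  by subject cmtrt;
run;

/* Approach 1: NODUPKEY keeps the first record of each SUBJECT/CMTRT
   combination in NODUPS; DUPOUT= writes every later repeat to DUPS. */
proc sort data=sorted out=nodups nodupkey dupout=dups;
  by subject cmtrt;
run;

/* Approach 2: keep every record and flag all members of a repeated group.
   A record is unique only when it is both first and last in its BY group. */
data flagged;
  set sorted;
  by subject cmtrt;
  dup_flag = not (first.cmtrt and last.cmtrt);
run;
```

Both versions compare CMTRT exactly as stored, so values that differ only in case or spelling are treated as different medications.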

PaigeMiller
Diamond | Level 26

Are there duplicated entries in this data?? Can you give a specific example?

--
Paige Miller
zimcom
Pyrite | Level 9

I am trying to check whether the same subject has duplicate medication entries.

I know this can be done with a sort and compare; I am just wondering whether there is a simpler approach, since the list is huge.

novinosrin
Tourmaline | Level 20

proc sql;
create table want as
select *, count(CMTRT) > 1 as dup_flag
from have
group by subject, CMTRT;
quit;

 

untested

zimcom
Pyrite | Level 9

Tested it; it works too.

 

Thanks

zimcom

PaigeMiller
Diamond | Level 26

So the dates shown in the data set have no bearing on whether or not something is a duplicate? This was not stated in the original problem statement. Why show us information not related to the problem at hand?

--
Paige Miller
hashman
Ammonite | Level 13

Your data sample gives no good idea about (a) what you mean by duplicates and (b) in which manner you want to search for them:

 

(a) For example, for subject 117961, neither Atorvastatin nor Cetirizine appears among its records more than once, and yet in your sample, they are checked as dupes.

(b) You don't say whether medication A in one record and medications A/B in another (such as Perindopril/ indapamide) are considered dupes. If they are, a program to check for them would be more involved than if they are not. This is because in this case, CMTRT cannot be relied upon as a key by the unduplication process since entries like A/B would have to be parsed into components first.
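
If combination entries should count, one possible way to do the parsing mentioned in (b) is sketched below. It assumes that "/" is the only separator used in CMTRT and that comparison should ignore case; the dataset names COMPONENTS and WANT and the variables component and dup_flag are hypothetical.

```sas
/* Assumption: WORK.HAVE has SUBJECT and CMTRT, and "/" is the only
   separator between components of a combination product. */
data components;
  set have;
  length component $200;
  /* One output row per component; a missing CMTRT yields no rows. */
  do i = 1 to countw(cmtrt, '/');
    component = upcase(strip(scan(cmtrt, i, '/')));
    output;
  end;
  drop i;
run;

/* Flag every row whose component occurs more than once for the subject. */
proc sql;
  create table want as
  select *, count(*) > 1 as dup_flag
  from components
  group by subject, component;
quit;
```

With this layout, a combination record and a single-ingredient record are flagged together only when they share a component after upcasing and trimming.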

 

Besides, it's unclear whether your sample data represents your input or your desired output, i.e. the input augmented with the variable CMONGO. Also, it looks as though your input isn't cleansed: for example, you have DiltiazeM in one record and DiltiazeN (which is likely a data entry typo) in the prior one. If you wanted your program to recognize such things as identical, it'd have to contain some sort of a fuzzy match routine, which seems to be well beyond the scope of your question.
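
For what it's worth, a rough sketch of what such a fuzzy match could look like, using the COMPLEV function (Levenshtein edit distance) in a self-join within each subject. The cutoff of 2 edits is an arbitrary assumption, and the names NEAR_DUPS, cmtrt_a, cmtrt_b and edit_dist are made up for illustration. Treat the result as a list of candidates for manual review, not as confirmed duplicates.

```sas
/* Assumption: WORK.HAVE has SUBJECT and CMTRT.
   The cutoff of 2 edits is arbitrary and may need tuning. */
proc sql;
  create table near_dups as
  select a.subject,
         a.cmtrt as cmtrt_a,
         b.cmtrt as cmtrt_b,
         complev(upcase(strip(a.cmtrt)), upcase(strip(b.cmtrt))) as edit_dist
  from (select distinct subject, cmtrt from have) as a,
       (select distinct subject, cmtrt from have) as b
  where a.subject = b.subject
    and a.cmtrt < b.cmtrt           /* each unordered pair once, no self-pairs */
    and calculated edit_dist <= 2;  /* distance 0 means a case-only difference */
quit;
```

Short drug names can fall within the cutoff of unrelated names, so a smaller cutoff or a length-dependent rule may be safer on real data.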

 

Generally, it would serve you (and those trying to help here) well if you presented your sample input and desired output unambiguously, tersely, and, as Paige has noted, with no extraneous information (such as the dates in your sample data).

 

Paul D.  
