Hi all,
I have a list of medication of over thousands of observations, how can I flag the same medications taken by the same subject (duplicated entries) effectively? any ideas..
SUBJECT | CMTRT | cmstdtc | CMONGO | cmendtc |
117961 | Acetylsalicylic Acid | . | . | |
117961 | Dextrose | . | . | |
117961 | Midazolam | . | . | |
117961 | Oxygen | . | . | |
117961 | Testosterone | 2014-12-31 | 2015-03-26 | |
117961 | Atorvastatin | 2015-04-01 | Checked | . |
117961 | Diltiazen | 2015-03-28 | 2015-05-30 | |
117961 | Diltiazem | 2015-03-30 | 2015-03-31 | |
117961 | Amiodarone | 2015-03-31 | 2015-05-05 | |
117961 | Sennoside | 2015-04-01 | 2015-04-01 | |
117961 | Apixaban | 2015-04-05 | 2015-04-07 | |
117961 | Magnesium sulphate | 2015-03-31 | 2015-03-31 | |
117961 | TPA | 2015-03-26 | 2015-03-26 | |
117961 | acetaminophen | 2015-03-27 | Checked | . |
117961 | enoxaparin | 2015-03-31 | 2015-04-03 | |
117961 | Cetirizine | 2014-12-31 | Checked | . |
117961 | Diltiazem | 2015-06-10 | Checked | . |
117961 | acetylsalicyclic acid | 2015-03-31 | 2015-04-05 | |
117961 | perindopril | 2015-03-30 | Checked | . |
117961 | Indapamide | 2015-03-30 | Checked | . |
117961 | Gentamicin | 2015-05-08 | 2015-05-08 | |
117961 | 2% Xylocaine | 2015-05-08 | 2015-05-08 | |
117961 | Cephazolin | 2015-05-08 | 2015-05-08 | |
117961 | dalteparin | 2015-04-04 | 2015-04-04 | |
117961 | Docusate Sodium | 2015-04-01 | 2015-04-08 | |
117961 | Omnaris Nasal Spray | 2014-12-31 | 2015-03-26 | |
117961 | Amiodarone | 2015-03-31 | 2015-05-06 | |
117961 | Morphine | 2015-03-27 | 2015-05-28 | |
117961 | Perindopril/ indapamide | 2014-12-31 | Checked | . |
117961 | Zopiclone | 2014-12-31 | 2015-03-26 | |
117961 | acetaminophen | 2015-03-27 | 2015-05-28 | |
117961 | Dimenhydrinate | 2015-03-28 | 2015-03-28 | |
117961 | Baclofen | 2014-12-31 | 2015-03-26 | |
117961 | Amiodarone | 2015-03-28 | 2015-05-28 | |
117961 | Magnesium Sulphate | 2015-03-28 | 2015-03-28 | |
117961 | Potassium Chloride | 2015-03-28 | 2015-05-28 | |
118036 | Acetylsalicylic Acid | . | . |
sort by subject and cmtrt, then maybe "proc sort data=.... nodupkey dupout=XXX" see the dataset XXX for the duplicates. Or once sorted you can identify duplicates in a data step
Are there duplicated entries in this data?? Can you give a specific example?
I am trying to check/identify if there are duplicate medications taken by the same subject...
I know there is a way to do this like sort and compare, I am just wondering if there is a simple approach, since the list is huge...
proc sql;
create table want as
select * ,count(CMTRT) >1 as dup_flag
from have
group by subject, CMTRT;
quit;
untested
tested it, it works too
Thanks
zimcom
So the dates shown in the data set have no bearing on whether or not something is a duplicate? This was not stated in the original problem statement. Why show us information not related to the problem at hand?
sort by subject and cmtrt, then maybe "proc sort data=.... nodupkey dupout=XXX" see the dataset XXX for the duplicates. Or once sorted you can identify duplicates in a data step
You data sample gives no good idea about (a) what you mean by duplicates and (b) in which manner you want to search for them:
(a) For example, for subject 117961, neither Atorvastatin nor Cetirizine appear among its records more than once, and yet in your sample, they are checked as dupes.
(b) You don't say whether medication A in one record and medications A/B in another (such as Perindopril/ indapamide) are considered dupes. If they are, a program to check for them would be more involved than if they are not. This is because in this case, CMTRT cannot be relied upon as a key by the unduplication process since entries like A/B would have to be parsed into components first.
Besides, it's unclear whether your sample data represents your input or desired output looking like the input augmented with the variable CMONGO. Also, it looks as though your input isn't cleansed: For example, you have DiltiazeM in one record and DiltiazeN (which is likely a data entry typo) in the prior one. If you wanted your program to recognize such things as identical, it'd have to contain some sort of a fuzzy match routine, which seems to be is well beyond the scope of your question.
Generally, it would serve you (and those trying to help here) well if you presented you sample input and desired output unambiguously, tersely, and, as Paige has noted, with no extraneous information (such as the dates in your sample data).
Paul D.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.