I have a dataset with over 4M individuals categorized into 2 groups: CP1=0 and CP1=1. The data are in wide format (each row is a single individual). Each row (person) has anywhere from 0 to 80 medications; the variables are named Gnrc_Nm1 to Gnrc_Nm80, and each holds a distinct drug name in character format.

I would like to do 2 things, in order:

First, I would like to identify the number of individuals per group (CP1) who have each distinct drug name. I imagine I would need some procedure that looks across Gnrc_Nm1 to Gnrc_Nm80 for each person and then summarizes by group for each drug name. Drug names are distinct within a person, so duplicates are not a concern (i.e., if someone had FLUOCINONIDE 3x, it appears only once in Gnrc_Nmx). How would you recommend I go about this?

Second, I would like to do a cluster analysis to identify the top 10 most common clusters for each group. I cannot figure out how to do this with the data in wide format and with Gnrc_Nm1 to Gnrc_Nm80 stored as character variables. Any ideas on how to go about this?
Below is an example of the data using proc print (one observation per line; the columns Gnrc_Nm1 through Gnrc_Nm80 are abbreviated in the header, and blank Gnrc_Nm values are not shown):

Obs Patid CP1 _LABEL_ Gnrc_Nm1-Gnrc_Nm80 FUdate CP1 CP2 ID1 ID2 ASD1 ASD2 Epil1 Epil2 Yrdob sex Race USreg YRsd age
1 1 0 Generic Name FLUOCINONIDE OMEPRAZOLE 2016-01-01 0 0 0 0 0 0 0 0 1960 F W 1 2016 56
2 2 0 Generic Name HYDROCORTISONE 2016-12-31 0 0 0 0 0 0 0 0 1957 M W 2 2016 59
3 3 1 Generic Name AMOXICILLIN/POTASSIUM CLAV LEVOTHYROXINE SODIUM 2015-01-01 0 0 0 0 0 0 0 0 1957 F H 1 2015 58
4 4 0 Generic Name ALBUTEROL SULFATE ALENDRONATE SODIUM AMLODIPINE/VALSARTAN AMOXICILLIN ATENOLOL ATORVASTATIN CALCIUM CITALOPRAM HYDROBROMIDE CLOPIDOGREL BISULFATE FENOFIBRATE NANOCRYSTALLIZED FENOFIBRIC ACID (CHOLINE) FLUTICASONE PROPIONATE FLUTICASONE/SALMETEROL FUROSEMIDE GABAPENTIN GLIPIZIDE LORAZEPAM MONTELUKAST SODIUM NEO/POLYMYX B SULF/DEXAMETH NIFEDIPINE PANTOPRAZOLE SODIUM PENTOXIFYLLINE ROFLUMILAST TIOTROPIUM BROMIDE ZOSTER VACCINE LIVE/PF 2014-01-01 0 0 0 0 0 0 0 0 1938 F W 1 2014 76
5 5 1 Generic Name AMOXICILLIN/POTASSIUM CLAV TOBRAMYCIN 2015-05-01 0 0 0 0 0 0 0 0 1969 M U 3 2015 46
6 6 1 Generic Name BROMFENAC SODIUM CIPROFLOXACIN HCL LATANOPROST TIMOLOL MALEATE TRIAMCINOLONE ACETONIDE 2014-01-01 0 0 0 0 0 0 0 0 1928 M H 1 2014 86
7 7 0 Generic Name AZITHROMYCIN 2016-11-30 0 0 0 0 0 0 0 0 1978 F H 3 2016 38
8 8 0 Generic Name ADAPALENE/BENZOYL PEROXIDE CLINDAMYCIN PHOSPHATE DEXTROAMPHETAMINE/AMPHETAMINE L-NORGEST-ETH ESTR/ETHIN ESTRA 2015-07-01 0 0 0 0 0 0 0 0 1985 F W 4 2015 30
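
For the first task, the "look across Gnrc_Nm1 to Gnrc_Nm80, then summarize per group" idea might be sketched roughly as follows. This is only an untested outline under several assumptions: the wide dataset is called `have` (a placeholder, not the real name), 100 characters is enough to hold the longest drug name, and the names `long`, `drug`, `drug_counts`, and `n_patients` are purely illustrative.

```sas
/* Assumption: the wide dataset is named HAVE (placeholder name). */
/* Step 1: reshape to long format, one row per person-drug pair.  */
data long;
    set have(keep=Patid CP1 Gnrc_Nm1-Gnrc_Nm80);
    length drug $100;               /* assumed wide enough for any drug name */
    array g{*} Gnrc_Nm1-Gnrc_Nm80;  /* the existing character variables */
    do i = 1 to dim(g);
        if not missing(g{i}) then do;
            drug = g{i};
            output;                 /* write one row for this drug */
        end;
    end;
    keep Patid CP1 drug;
run;

/* Step 2: count individuals per group for each drug name. */
proc sql;
    create table drug_counts as
    select CP1,
           drug,
           count(distinct Patid) as n_patients
    from long
    group by CP1, drug
    order by CP1, n_patients desc;
quit;
```

Because each drug appears at most once per person, `count(distinct Patid)` should equal the plain row count here; the `distinct` is kept only as a safeguard. People with no medications contribute no rows to `long`, so they would not appear in `drug_counts`.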