Hi all,
I am trying to create a new categirical variable (Chain_Pharmacy=1/0 for Yes/ No) based on a string variable call Pharmacy_Name.
This variable Pharmacy_Name has several pharmacies, like Wal-Mart, Walgreens, Costco, CVS, etc. I have > 800,000 records in the datasheet . I need to define Chain_Pharmacy if any string under Pharmacy_Name variable occurs more than 10 times. The issue with this variable is I have >800,000 records to search for and I can only Identify few pharmacies such as Wal-Mart, Walgreens, Costco, CVS, Rite-Aid, etc, but the databse will have many other chain pharmacies and mom and pop stores. I need to identify chains which have more than 10 pharmacies with same name (irrespective of what that chain may be) versus mom and pop/ individual pharmacy stores.
Can I do this by count? Is there any better way other than count for creating my new variable?
Thanks!
See below:
CARRS PHARMACY 1739 |
CARRS PHARMACY 1805 |
CARRS PHARMACY 1812 |
FRED MEYER PHARMACY #701485 |
FRED MEYER PHARMACY #701668 |
FRED MEYER PHARMACY #701671 |
FRED MEYER PHARMACY 701011 |
FRED MEYER PHARMACY 701018 |
FRED MEYER PHARMACY 701071 |
FRED MEYER PHARMACY 701224 |
FRED MEYER PHARMACY 701653 |
FRED MEYER PHARMACY 701656 |
FRED MEYER PHARMACY 701671 |
GENOA HEALTHCARE #146 |
NORTH POLE PRESCRIPTION LABORATORY |
PRESCRIPTION CENTER |
SAFEWAY PHARMACY #3410 |
SAFEWAY PHARMACY 0548 |
SAFEWAY PHARMACY 1821 |
SAFEWAY PHARMACY 2728 |
SAFEWAY PHARMACY 2754 |
SAFEWAY PHARMACY 3410 |
SOLDOTNA PROFESSIONAL PHARMACY |
TARGET PHARMACY #2371 |
TARGET PHARMACY 2339 |
ULMER DRUG AND HARDWARE |
WAL-MART PHARMACY 10-2070 #102070 |
WAL-MART PHARMACY 10-2070 102070 |
WAL-MART PHARMACY 10-2071 #102071 |
WAL-MART PHARMACY 10-2071 102071 |
WAL-MART PHARMACY 10-2074 102074 |
may be something like this
proc sql; select * from yourtable group by prxchange('s/(\d+|#).*$//i',-1, pharmacy) having count(prxchange('s/(\d+|#).*$//i',-1, pharmacy))>10;
to check what multiple store have, along with its parent name
proc sql;
select *, prxchange('s/(\d+|#).*$//i',-1, pharmacy) as parentcompanyname from have
group by parentcompanyname
having count(parentcompanyname)>10;
or to just companies with parent name
proc sql;
select prxchange('s/(\d+|#).*$//i',-1, pharmacy) as parentcompanyname from ac
group by parentcompanyname
having count(parentcompanyname)>10;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.