DATA Step, Macro, Functions and more

Creating a new categirical variable from a string variable

Reply
Occasional Contributor
Posts: 8

Creating a new categirical variable from a string variable

Hi all,

I am trying to create a new categirical variable (Chain_Pharmacy=1/0 for Yes/ No) based on a string variable call Pharmacy_Name.

This variable Pharmacy_Name has several pharmacies, like Wal-Mart, Walgreens, Costco, CVS, etc. I have > 800,000 records in the datasheet . I need to define Chain_Pharmacy if any string under Pharmacy_Name variable occurs more than 10 times. The issue with this variable is I have >800,000 records to search for and I can only Identify few pharmacies such as Wal-Mart, Walgreens, Costco, CVS, Rite-Aid, etc, but the databse will have many other chain pharmacies and mom and pop stores. I need to identify chains which have more than 10 pharmacies with same name (irrespective of what that chain may be) versus mom and pop/ individual pharmacy stores.

 

Can I do this by count? Is there any better way other than count for creating my new variable?

Thanks!

 

See below:

CARRS PHARMACY 1739

CARRS PHARMACY 1805

CARRS PHARMACY 1812

FRED MEYER PHARMACY #701485

FRED MEYER PHARMACY #701668

FRED MEYER PHARMACY #701671

FRED MEYER PHARMACY 701011

FRED MEYER PHARMACY 701018

FRED MEYER PHARMACY 701071

FRED MEYER PHARMACY 701224

FRED MEYER PHARMACY 701653

FRED MEYER PHARMACY 701656

FRED MEYER PHARMACY 701671

GENOA HEALTHCARE #146

NORTH POLE PRESCRIPTION LABORATORY

PRESCRIPTION CENTER

SAFEWAY PHARMACY #3410

SAFEWAY PHARMACY 0548

SAFEWAY PHARMACY 1821

SAFEWAY PHARMACY 2728

SAFEWAY PHARMACY 2754

SAFEWAY PHARMACY 3410

SOLDOTNA PROFESSIONAL PHARMACY

TARGET PHARMACY #2371

TARGET PHARMACY 2339

ULMER DRUG AND HARDWARE

WAL-MART PHARMACY 10-2070 #102070

WAL-MART PHARMACY 10-2070 102070

WAL-MART PHARMACY 10-2071 #102071

WAL-MART PHARMACY 10-2071 102071

WAL-MART PHARMACY 10-2074 102074

PROC Star
Posts: 252

Re: Creating a new categirical variable from a string variable

[ Edited ]

may be something like this

proc sql;
select * from yourtable
group by prxchange('s/(\d+|#).*$//i',-1, pharmacy)
having count(prxchange('s/(\d+|#).*$//i',-1, pharmacy))>10;

 to check what multiple store have, along with its parent name

proc sql;

select *, prxchange('s/(\d+|#).*$//i',-1, pharmacy) as parentcompanyname from have

group by parentcompanyname

having count(parentcompanyname)>10;

 

or to just companies  with parent name

 

proc sql;

select prxchange('s/(\d+|#).*$//i',-1, pharmacy) as parentcompanyname from ac

group by parentcompanyname

having count(parentcompanyname)>10;

Ask a Question
Discussion stats
  • 1 reply
  • 113 views
  • 1 like
  • 2 in conversation