Help using Base SAS procedures

Market share based on zipcodes

Accepted Solution Solved
Reply
Regular Contributor
Posts: 162
Accepted Solution

Market share based on zipcodes

Dear all,

I would like to calculate the market share of a firm (tic) based on geographic area (distance based on zipcodes)  for a given industry (sic3) and financial year (fyear).

Market share = sales of the firm / total sales for all the firms in the same industry (sic3) in a year (fyear) within 60 miles radius.

Let says, firm A, B and C are located in the same geographic area (ie within 60 miles radius). Only firm A and B are in the same industry (SIC3=123) in fyear = 2003;

then market share for A in year 2003=  SALE of firm A/sum(SALE of firm A, SALE of firm B) .

I attached a SAS file with the all US firms from years 2003 to 2010:

tic = identification code for firm

fyear = financial year

SALE = sales value for each firm

SIC3 = industry classification code

zip = zipcode of a firm (for the calculation of distance)

The following procedures should be relevant:

1. Identify firms that:

- in the same SIC3

- in the same FYEAR

- within 60 miles radius  (if .<zipcitydistance(_zip , zip) <=60) from a firm-year

2. calculate total sales of the firms identified

3. Calculate market share of the firm = salei / Total sale

Note: I wish to use hash object to calculate the market share as it involves self-matching of firms. It will consume very much computer resources if proc sql to be used.

Any help is much appreciated.

Thank you.

Regards,

mspak

Attachment

Accepted Solutions
Solution
‎11-16-2012 08:14 AM
Super User
Posts: 10,018

Re: Market share based on zipcodes

Your task is really not easy,I would split it into lots of small tables and process them one by one .

libname x v9 'c:\temp' ;

proc sort data=x.Geo_shr out=zip(keep= fyear sic3 zip) nodupkey;by  fyear sic3 zip ;run;

data _null_;

if _n_ eq 1 then do;

  if 0 then set zip;

  declare hash h(multidata:'Y') ;

   h.definekey('fyear','sic3');

   h.definedata('fyear','sic3','zip');

   h.definedone();

end;

do until(last.sic3);

set zip;

by fyear sic3;

h.add();

end;

h.output(dataset:cats('new',_n_));

h.clear();

run;

data _null_;

set sashelp.vmember(where=(libname='WORK' and lowcase(memname) like 'new%')) end=last;

if _n_ eq 1 then call execute('proc sql;');

call execute('create table _'||strip(memname)||' as

                select a.*,b.zip as _zip

                  from '||memname||' as a,'||memname||' as b

                   where  .< zipcitydistance(a.zip , b.zip) <=60;');

if last then call execute('quit;');

run;

data distance;

set _: ;

run;

proc sql;

create table sum as

select fyear,sic3,zip,sum(sale) as sale_sum

  from x.Geo_shr

   group by fyear,sic3,zip;

quit;

data distance;

if _n_ eq 1 then do;

if 0 then set sum(rename=(zip=_zip));

  declare hash h(dataset:'sum(rename=(zip=_zip))');

   h.definekey('fyear','sic3','_zip');

   h.definedata('sale_sum');

   h.definedone();

end;

set distance;

h.find();

run;

proc sql;

create table total_sale as

select fyear,sic3,zip,sum(sale_sum) as total_sale

  from distance

   group by  fyear,sic3,zip;

quit;

data want;

  if _n_ eq 1 then do;

if 0 then set total_sale;

  declare hash h(dataset:'total_sale');

   h.definekey('fyear','sic3','zip');

   h.definedata('total_sale');

   h.definedone();

end;

set x.Geo_shr;

call missing(total_sale);

rc=h.find();

percent=divide(sale,total_sale) ;

drop rc;

run;

KSharp

View solution in original post


All Replies
Solution
‎11-16-2012 08:14 AM
Super User
Posts: 10,018

Re: Market share based on zipcodes

Your task is really not easy,I would split it into lots of small tables and process them one by one .

libname x v9 'c:\temp' ;

proc sort data=x.Geo_shr out=zip(keep= fyear sic3 zip) nodupkey;by  fyear sic3 zip ;run;

data _null_;

if _n_ eq 1 then do;

  if 0 then set zip;

  declare hash h(multidata:'Y') ;

   h.definekey('fyear','sic3');

   h.definedata('fyear','sic3','zip');

   h.definedone();

end;

do until(last.sic3);

set zip;

by fyear sic3;

h.add();

end;

h.output(dataset:cats('new',_n_));

h.clear();

run;

data _null_;

set sashelp.vmember(where=(libname='WORK' and lowcase(memname) like 'new%')) end=last;

if _n_ eq 1 then call execute('proc sql;');

call execute('create table _'||strip(memname)||' as

                select a.*,b.zip as _zip

                  from '||memname||' as a,'||memname||' as b

                   where  .< zipcitydistance(a.zip , b.zip) <=60;');

if last then call execute('quit;');

run;

data distance;

set _: ;

run;

proc sql;

create table sum as

select fyear,sic3,zip,sum(sale) as sale_sum

  from x.Geo_shr

   group by fyear,sic3,zip;

quit;

data distance;

if _n_ eq 1 then do;

if 0 then set sum(rename=(zip=_zip));

  declare hash h(dataset:'sum(rename=(zip=_zip))');

   h.definekey('fyear','sic3','_zip');

   h.definedata('sale_sum');

   h.definedone();

end;

set distance;

h.find();

run;

proc sql;

create table total_sale as

select fyear,sic3,zip,sum(sale_sum) as total_sale

  from distance

   group by  fyear,sic3,zip;

quit;

data want;

  if _n_ eq 1 then do;

if 0 then set total_sale;

  declare hash h(dataset:'total_sale');

   h.definekey('fyear','sic3','zip');

   h.definedata('total_sale');

   h.definedone();

end;

set x.Geo_shr;

call missing(total_sale);

rc=h.find();

percent=divide(sale,total_sale) ;

drop rc;

run;

KSharp

Regular Contributor
Posts: 162

Re: Market share based on zipcodes

Dear Ksharp,

Thank you for your program.

I ran the program, and obtained the output - want (as attached).

I notice that there are missing "total_sale". This is impossible.

The sales value of the firm itself should be included in the calculation of total sales. Even if there is the only one firm (that is - the firm itself) is located within 60 miles radius, the total sales should be same with the sales of that firm.

Don't you think that I should sum up the  "total_sale" and "sale"? Should I calculate the market share as: sale/sum(total_sale, sale)?

I feel that the total_sale that you calculated is the total sales of firms within 60 miles radius from the firm (excluding the firm itself).

Another question:

You wrote a command as follows:

if _n_ eq 1 then call execute('proc sql;');

Does this mean that proc sql is run in the memory? I never think that I can do that!!!

Thank you.

Regards,

Mei Sen

Attachment
Super User
Posts: 10,018

Re: Market share based on zipcodes

"I notice that there are missing "total_sale". This is impossible."

Because the sale of your dataset has lots of missing value. My total_sale means for the same fyear and same sic3 , sum of sale of those company in 60 miles which base on zip. I notice for the same zip,there are lots of company,it is right ? and we calculated distance is base on zip ,that means all of company in the same zip is a single point, and we use this point to calculated distance ,it is right ?

"The sales value of the firm itself should be included in the calculation of total sales."

Yes. I did it so.

"Even if there is the only one firm (that is - the firm itself) is located within 60 miles radius, the total sales should be same with the sales of that firm."

But we use this company's zip to calculate distance ,and there are several different company in the same zip. is it true ?

"Don't you think that I should sum up the  "total_sale" and "sale"? Should I calculate the market share as: sale/sum(total_sale, sale)?"

NO. I have already include sale into total_sale .

"I feel that the total_sale that you calculated is the total sales of firms within 60 miles radius from the firm (excluding the firm itself)."

NO.I included it  Don't forget I use a zip to get distance,and a zip involved lots of different company.

"Does this mean that proc sql is run in the memory? I never think that I can do that!!!"

That is the same way with generating a macro, I just use call execute replace a macro .

Ksharp

消息编辑者为:xia keshan

PROC Star
Posts: 7,467

Re: Market share based on zipcodes

I agree with Ksharp .. you have a number (specifically 330) of zip codes that don't match any that are included in the zipcitydistance function.  You can identify them by running:

proc sort data=x.Geo_shr (keep=zip) out=test nodupkey;

  by zip;

run;

data test (keep=zip);

  set test;

  test=zipcitydistance(zip , '15217');

  if missing(test);

run;

You can probably quickly fix that by replacing those zip codes with neighboring zipcodes by doing Google searches for two strings, namely 'zipcode' and the offending 5 digit zipcode.

Regular Contributor
Posts: 162

Re: Market share based on zipcodes

Hi,

Very much appreciated your helps.

In the output file 'want'. Some of the observations with SALE figures but missing TOTAL_SALE figures. This is the reason that I think the TOTAL_SALE does not include SALE for the firm itself. For example, there are 10 other firms located within 60 miles radius from the firm A. Then the TOTAL_SALE should be based on SALE of 11 firms (10 other firms plus the firm A itself).

If there is NO any match based on 60 miles radius, same SIC3 and same FYEAR, then TOTAL_SALE = SALE of firm A.  Missing TOTAL_SALE is possible only when there is no other firm within 60 miles radius from that firm (ie. firm A) and that firm (ie. Firm A)'s SALE is missing.

We should use zip to calculate the distance. Firms in the same zip considered as the same point. But firms in the same zip should be considered as firms within 60 miles radius (ie. zero mile radius) as well. Let say, There are 5 other firms with the same zip, same sic3 and same fyear as per the firm (firm A) and 3 other firms with different zip, same sic3 and same fyear but within 60 miles radius, then: TOTAL_SALE = SALE of 9 firms (5+3+1).

The following code is also new to me. I’ve never know that I can use sashelp files as my input files!!!

data _null_;

set sashelp.vmember(where=(libname='WORK' and lowcase(memname) like 'new%')) end=last;

if _n_ eq 1 then call execute('proc sql;');

call execute('create table _'||strip(memname)||' as

               select a.*,b.zip as _zip

from '||memname||' as a,'||memname||' as b

                   where  .< zipcitydistance(a.zip , b.zip) <=60;');

if last then call execute('quit;');

run;


Even though hash object is an efficient method, but I feel a bit difficult to code it.

Thank you.

Regards,

mspak

PROC Star
Posts: 7,467

Re: Market share based on zipcodes

If you aren't comfortable with the hash object than I would suggest not using it until you are sufficiently comfortable with using it.

However, that said, you have at least one other problem to deal with in addition to the unusable zip code problem.  Some of your records have missing sales values for all firms within a particular grouping.

E.g., try the following and take a look at the records that result at the very beginning of the resulting file, 'want'.

Before anyone spends time trying to suggest a non-hash approach, I would suggest reposting the data file, but with the replacement zip codes and instructions on how you want to deal with missing sales for a particular grouping:

libname x v9 'd:\art' ;

proc sort data=limited out=zip(keep= fyear sic3 zip) nodupkey;

  by  fyear sic3 zip ;

run;

data _null_;

if _n_ eq 1 then do;

  if 0 then set zip;

  declare hash h(multidata:'Y') ;

   h.definekey('fyear','sic3');

   h.definedata('fyear','sic3','zip');

   h.definedone();

end;

do until(last.sic3);

set zip;

by fyear sic3;

h.add();

end;

h.output(dataset:cats('new',_n_));

h.clear();

run;

data _null_;

set sashelp.vmember(where=(libname='WORK' and lowcase(memname) like 'new%')) end=last;

if _n_ eq 1 then call execute('proc sql;');

call execute('create table _'||strip(memname)||' as

                select a.*,b.zip as _zip

                  from '||memname||' as a,'||memname||' as b

                   where  .< zipcitydistance(a.zip , b.zip) <=60;');

if last then call execute('quit;');

run;

data distance;

set _: ;

run;

proc sql;

create table sum as

select fyear,sic3,zip,sum(sale) as sale_sum

  from limited /*x.Geo_shr*/

   group by fyear,sic3,zip;

quit;

data distance;

if _n_ eq 1 then do;

if 0 then set sum(rename=(zip=_zip));

  declare hash h(dataset:'sum(rename=(zip=_zip))');

   h.definekey('fyear','sic3','_zip');

   h.definedata('sale_sum');

   h.definedone();

end;

set distance;

h.find();

run;

proc sql;

create table total_sale as

select fyear,sic3,zip,sum(sale_sum) as total_sale

  from distance

   group by  fyear,sic3,zip;

quit;

data want;

  if _n_ eq 1 then do;

if 0 then set total_sale;

  declare hash h(dataset:'total_sale');

   h.definekey('fyear','sic3','zip');

   h.definedata('total_sale');

   h.definedone();

end;

/*set x.Geo_shr;*/

set limited;

call missing(total_sale);

rc=h.find();

percent=divide(sale,total_sale) ;

drop rc;

run;

proc sort data=want;

  by total_sale;

run;

Super Contributor
Posts: 1,636

Re: Market share based on zipcodes

Hi Art,

I am curious. How long it took you to come up the solution?

PROC Star
Posts: 7,467

Re: Market share based on zipcodes

Linlin,  That is Ksharp's proposed code.  I only modified the input data to identify an example of the locus of the problems that the OP was questioning.

Regular Contributor
Posts: 216

Re: Market share based on zipcodes

Hi mspak,

Try this solution

libname comm "C:\projects\SAS_Community"; *Location containing the Sample Geo_shr data set;

/* Macro to calculate the Market share for a specific Year/Industry classification code/radius combination */

%macro getMarketShare(p_inDsName= ,p_year=, p_SIC3=, p_radius=60, p_outDsName=);

    %local l_count

            l_zips

            l_maxFreq

            l_maxFreq2;

    /* ------------------------------------------------------- */

    /* Get unique list of Firms Zip Codes for the              */

    /* specified Year/industry classification code combination */

    /* ------------------------------------------------------- */

    proc sort data=&p_inDsName out=work.zips(keep=zip) nodupkey;

        BY    zip;

        where (zipcity(zip) is not null

        %if(%superq(p_year) NE ) %then

        %do;

            AND fyear EQ &p_year

        %end;

        %if(%superq(p_SIC3) NE ) %then

        %do;

            AND SIC3 EQ "&p_SIC3"

        %end;

            );

    run;

    /* ------------------------------------------------ */

    /* Workout the available Zip City Distance clusters */

    /* ------------------------------------------------ */

    Proc sql noprint;

        Select quote(put(zip,z5.))

        into :l_zips separated by ' '

        from work.zips

        ;

    quit;

    %let l_count=&sqlobs;

    %put l_count=&l_count;

    %put l_zips = &l_zips;

    data work.zips(keep=pointA pointB distance);

        Set work.zips;

        LENGTH pointA pointB $5;

        array zips [&l_count] $5 (&l_zips);

        pointA = put(zip,z5.);

        do i=1 to dim(zips);

            pointB= zips;

            distance = zipcitydistance(pointA, pointB);

            /* Ignore zero distances */

            if &p_radius >= distance > 0 then

                output;

        end;

    run;

    /* ------------------------------------------ */

    /* Declare internal macro for data processing */

    /* ------------------------------------------ */

    %MACRO shrinkCombos(

      p_inDsName=        /* Input Source Data Source */

    , p_outDsName=        /* Output Data Set */

    , p_byClause=        /* Classification Variables Set */

    , p_classVar=        /* Class Variable to Transpose */

    , p_classVar2=        /* Second Class Variable to Transpose (Optional) */

    , p_prntClassVar=    /* Preceding Class Variable */

    , p_aggrVarLen=10    /* Variable length of the new transposed values */

    );

        /* Sort */

        PROC SORT DATA=&p_inDsName;

            BY    &p_byClause;

        RUN;

        /* Process the data and generate the transposed value */

        DATA &p_outDsName(DROP=&p_classVar RENAME=(aggr=&p_classVar));

            SET &p_inDsName;

            BY &p_byClause;

            LENGTH aggr $&p_aggrVarLen ;

            RETAIN aggr ;

            IF (FIRST.&p_prntClassVar) THEN

                aggr='';

            %if (%superq(p_classVar2) NE ) %then

            %do;

            if (INDEX(STRIP(aggr), STRIP(&p_classVar2)) = 0) then

                aggr = catx(' ',aggr,&p_classVar2);

            if (INDEX(STRIP(aggr), STRIP(&p_classVar)) = 0) then

                aggr = catx(' ',aggr,&p_classVar);

            DROP &p_classVar2;

            %end;

            %else

            %do;

            aggr = catx(' ',aggr,&p_classVar);

            %end;

            IF (LAST.&p_prntClassVar) THEN

                OUTPUT;

        RUN;

    %MEND shrinkCombos;

    /* Get record count for calculating variable length */

    proc sql noprint;

        select max(_freq_)

        into :l_maxFreq

        from (    select distance ,pointA ,count(pointB) as _freq_

                from work.zips

                Group by distance ,pointA)

        ;

    quit;

    %put l_maxFreq=&l_maxFreq;

    /* Work out Zips Connection Points */

    %shrinkCombos( p_inDsName=work.zips, p_outDsName=work.zips_connctions

    , p_byClause=%str(distance pointA pointB), p_classVar=pointB, p_prntClassVar=pointA, p_classVar2=, p_aggrVarLen=%eval(&l_maxFreq*6));

    /* Get record count for calculating variable length */

    proc sql noprint;

        select max(_freq_)

        into :l_maxFreq2

        from (    select distance ,count(pointA) as _freq_

                from work.zips_connctions

                Group by distance)

        ;

    quit;

    %put l_maxFreq2=&l_maxFreq2;

    /* Work out the Distance Clusters and their contributing members */

    %shrinkCombos( p_inDsName=work.zips_connctions, p_outDsName=work.dist_clusters

    , p_byClause=%str(distance pointA), p_classVar=pointA, p_prntClassVar=distance, p_classVar2=pointB, p_aggrVarLen=%eval((&l_maxFreq*6) + 1 + (&l_maxFreq2*6)));

    proc sql;

        create table work.combined AS

        select    a.*

                , b.distance

                , b.pointA AS cluster

        from    &p_inDsName a, work.dist_clusters b

        where    INDEX(STRIP(b.pointA), put(a.zip,z5.)) GT 0

        %if(%superq(p_year) NE ) %then

        %do;

            AND fyear EQ &p_year

        %end;

        %if(%superq(p_SIC3) NE ) %then

        %do;

            AND SIC3 EQ "&p_SIC3"

        %end;

        ORDER BY b.distance, a.zip;

    quit;

    data &p_outDsName;

        total=0;

        do until (last.distance);

            SET    work.combined;

            BY    distance zip;

            total+sale;

        end;

        do until (last.distance);

            SET    work.combined;

            BY    distance zip;

            ratio = (sale/total);

            output;

        end;

        format ratio percent8.3;

    run;

    proc datasets lib=work nolist;

        delete zips zips_connctions dist_clusters combined;

    run; quit;

%mend getMarketShare;

options mprint;

/* Usage Example */

%getMarketShare(p_inDsName=comm.geo_shr ,p_year=2009, p_SIC3=641, p_radius=250, p_outDsName=work.marketShares);

Regular Contributor
Posts: 162

Re: Market share based on zipcodes

Posted in reply to AhmedAl_Attar

Dear ,

In order to execute the macro suggested, I think I need to repeat the steps for each combination of year and SIC3, Is it correct?

Thank you.

Regards,

mspak

Regular Contributor
Posts: 216

Re: Market share based on zipcodes

Yes, The macro can work that way, or you can do all the SIC3s for a particular Year.

When I tried to do all the combinations of Year/SIC3/ZIP , I got 14M+ observations!!

It's somewhat easy to construct a program that would derive all the combinations and write out the macro calls into a file and execute it.

Regards,

Ahmed 

Regular Contributor
Posts: 162

Re: Market share based on zipcodes

Posted in reply to AhmedAl_Attar

Hi Ahmed,

How to execute it?

%getMarketShare(p_inDsName=comm.geo_shr ,p_year=2009, p_SIC3=641, p_radius=250, p_outDsName=work.marketShares);


The above function was suggested. What does this mean? Input data=geo_shr, fyear=2009, sic3={all 641 categories of industry} and p_radius={within 250 miles radius}, output data=marketshares. Is this correct?

Thank you.

mspak

Regular Contributor
Posts: 216

Re: Market share based on zipcodes

Hi mspak,

Please find included a revised solution. This new revised solution will perform Market/Sales Share calculations on two levels

1. Ratio/Percent at the Industry Segment for a given fyear and radius

2. Ratio/Percent at the different distances amongst the zips within the given Industry Segment/Fyear/Radius combination (Drill Down on Radius to see how the Zips/companys are connected to each other across the different distance bins/clusters)

%macro shrinkCombos(

  p_inDsName=        /* Input Source Data Source */

, p_outDsName=        /* Output Data Set */

, p_byClause=        /* Classification Variables Set */

, p_classVar=        /* Class Variable to Transpose */

, p_classVar2=        /* Second Class Variable to Transpose (Optional) */

, p_prntClassVar=    /* Preceding Class Variable */

, p_aggrVarLen=10    /* Variable length of the new transposed values */

);

    /* Sort */

    PROC SORT DATA=&p_inDsName;

        BY    &p_byClause;

    RUN;

    /* Process the data and generate the transposed value */

    DATA &p_outDsName(DROP=&p_classVar RENAME=(aggr=&p_classVar));

        SET &p_inDsName;

        BY &p_byClause;

        LENGTH aggr $&p_aggrVarLen ;

        RETAIN aggr ;

        IF (FIRST.&p_prntClassVar) THEN

            aggr='';

        %if (%superq(p_classVar2) NE ) %then

        %do;

        if (INDEX(STRIP(aggr), STRIP(&p_classVar2)) = 0) then

            aggr = catx(' ',aggr,&p_classVar2);

        if (INDEX(STRIP(aggr), STRIP(&p_classVar)) = 0) then

            aggr = catx(' ',aggr,&p_classVar);

        DROP &p_classVar2;

        %end;

        %else

        %do;

        aggr = catx(' ',aggr,&p_classVar);

        %end;

        IF (LAST.&p_prntClassVar) THEN

            OUTPUT;

    RUN;

%mend shrinkCombos;

%macro getMarketShareByDistance(p_inDsName= ,p_year=, p_SIC3=, p_radius=60, p_outDsName=);

    %local l_count

            l_zips

            l_maxFreq

            l_maxFreq2;

    /* ------------------------------------------------------- */

    /* Get unique list of Firms Zip Codes for the              */

    /* specified Year/industry classification code combination */

    /* ------------------------------------------------------- */

    proc sort data=&p_inDsName out=work.zips(keep=zip) nodupkey;

        BY    zip;

        where (zipcity(zip) is not null

        %if(%superq(p_year) NE ) %then

        %do;

            AND fyear EQ &p_year

        %end;

        %if(%superq(p_SIC3) NE ) %then

        %do;

            AND SIC3 EQ "&p_SIC3"

        %end;

            );

    run;

    /* ------------------------------------------------ */

    /* Workout the available Zip City Distance clusters */

    /* ------------------------------------------------ */

    Proc sql noprint;

        Select quote(put(zip,z5.))

        into :l_zips separated by ' '

        from work.zips

        ;

    quit;

    %let l_count=&sqlobs;

    /*

    %put l_count=&l_count;

    %put l_zips = &l_zips;

    */

    data work.zips(keep=pointA pointB distance);

        Set work.zips;

        LENGTH pointA pointB $5;

        array zips [&l_count] $5 (&l_zips);

        pointA = put(zip,z5.);

        do i=1 to dim(zips);

            pointB= zips;

            distance = zipcitydistance(pointA, pointB);

            /* Ignore zero distances */

            if &p_radius >= distance > 0 then

                output;

        end;

    run;

    /* Get record count for calculating variable length */

    proc sql noprint;

        select max(_freq_)

        into :l_maxFreq

        from (    select distance ,pointA ,count(pointB) as _freq_

                from work.zips

                Group by distance ,pointA)

        ;

    quit;

    /*%put l_maxFreq=&l_maxFreq;*/

    /* Work out Zips Connection Points */

    %shrinkCombos( p_inDsName=work.zips, p_outDsName=work.zips_connctions

    , p_byClause=%str(distance pointA pointB), p_classVar=pointB, p_prntClassVar=pointA, p_classVar2=, p_aggrVarLen=%eval(&l_maxFreq*6));

    /* Get record count for calculating variable length */

    proc sql noprint;

        select max(_freq_)

        into :l_maxFreq2

        from (    select distance ,count(pointA) as _freq_

                from work.zips_connctions

                Group by distance)

        ;

    quit;

    /*%put l_maxFreq2=&l_maxFreq2;*/

    /* Work out the Distance Clusters and their contributing members */

    %shrinkCombos( p_inDsName=work.zips_connctions, p_outDsName=work.dist_clusters

    , p_byClause=%str(distance pointA), p_classVar=pointA, p_prntClassVar=distance, p_classVar2=pointB, p_aggrVarLen=%eval((&l_maxFreq*6) + 1 + (&l_maxFreq2*6)));

    proc sql;

        create table work.combined AS

        select    a.*

                , b.distance

                , b.pointA AS cluster

        from    &p_inDsName a, work.dist_clusters b

        where    INDEX(STRIP(b.pointA), put(a.zip,z5.)) GT 0

        %if(%superq(p_year) NE ) %then

        %do;

            AND fyear EQ &p_year

        %end;

        %if(%superq(p_SIC3) NE ) %then

        %do;

            AND SIC3 EQ "&p_SIC3"

        %end;

        ORDER BY b.distance, a.zip;

    quit;

    data &p_outDsName;

        total=0;

        do until (last.distance);

            SET    work.combined;

            BY    distance zip;

            total+sale;

        end;

        do until (last.distance);

            SET    work.combined;

            BY    distance zip;

            /* Avoid Division by Zero*/

            IF (total > 0) then ratio = sale/total;

            else ratio = 0;

            output;

        end;

        format ratio percent8.3;

    run;

    /* Clean up */

    proc datasets lib=work nolist;

        delete zips zips_connctions dist_clusters combined;

    run; quit;

%mend getMarketShareByDistance;

%macro getMarketShareByRadius(

  p_inDsName=

, p_year=

, p_sic3=

, p_radius=

, p_outRadiusSharesDsName=

, p_outDistanceSharesDsName=

);

    %local    l_sTime

            l_eTime

            l_rTime;

    /** Record Starting Time **/

    %let l_sTime = %sysfunc(time());

    /* --------------------------------- */

    /* Get a list of unique combinations */

    /* of Fyear/Industry Section         */

    /* --------------------------------- */

    proc summary data=&p_inDsName nway;

        class    fyear sic3;

        where (zipcity(zip) is not null

        %if(%superq(p_year) NE ) %then

        %do;

            AND fyear EQ &p_year

        %end;

        %if(%superq(p_SIC3) NE ) %then

        %do;

            AND SIC3 EQ "&p_SIC3"

        %end;

            );

        output out=work.combos(drop=_Smiley Happy;

    run;

    FILENAME codes TEMP;

    DATA _NULL_;

        FILE codes;

        SET work.combos end=eof;

        put '%getMarketShareByDistance(p_inDsName='"&p_inDsName ,p_year=" fyear +(-1)", p_SIC3=" sic3 +(-1)", p_radius=&p_radius, p_outDsName=work._"_n_ +(-1)");";

        put "PROC APPEND BASE=&p_outDistanceSharesDsName DATA=WORK._" _n_ +(-1)" FORCE; RUN;";

        PUT "PROC DATASETS LIB=WORK NOLIST; DELETE _:; RUN; QUIT;";

        if (eof) then

        do;

            put ;

            put "proc sql;";

            put    +3 "create table &p_outRadiusSharesDsName AS";

            put +3 "select distinct fyear, sic3, zip, tic, sale";

            put +3 "from &p_outDistanceSharesDsName";

            put +3 "group by fyear, sic3, zip, tic";

            put +3 "order by fyear, sic3, zip, tic;";

            put "quit;";

            put ;

            put "data &p_outRadiusSharesDsName;";

            put +3 "total=0;";

            put +3 "radius=&p_radius;";

            put +3 "do until (last.sic3);";

            put +6 "SET &p_outRadiusSharesDsName;";

            put +6 "BY    fyear sic3;";

            put +6 "total+sale;";

            put +3 "end;";

            put ;

            put +3 "do until (last.sic3);";

            put +6 "SET &p_outRadiusSharesDsName;";

            put +6 "BY    fyear sic3;";

            put +6 "IF (total > 0) then ratio = sale/total;";

            put +6 "else ratio = 0;";

            put +6 "output;";

            put +3 "end;";

            put +3 "format ratio percent8.2;";

            put "run;";

        end;

    RUN;

    %include codes;

    /** Record Finish Time **/

    %let l_eTime = %sysfunc(time());

    %if (%superq(l_sTime) NE ) %then

    %do;

        /* Calculate Run Time and display it */

        %let l_rTime = %sysfunc(putn(%sysevalf(&l_eTime - &l_sTime),time12.2));

        %put >>> getMarketShareByRadius Smiley Embarassed>> Total RunTime = &l_rTime;

    %end;

%mend getMarketShareByRadius;

/* ----------------------- */

/* Usage Example */

/* ----------------------- */

libname comm "C:\projects\SAS_Community";

%getMarketShareByRadius(

  p_inDsName=comm.geo_shr  /* Input Data */

, p_year=                                /* Fyear value */

, p_sic3=                                /* Industry Segment */

, p_radius=60                          /* Within 60 miles radius */

, p_outRadiusSharesDsName=work.marketShares /* Output Market Shares within 60 miles Radius Data set */

, p_outDistanceSharesDsName=work.marketSharesByDistance /* Output Market Shares for the various distance bins within 60 miles radius Data Set */

);

Just another way of analyzing the data.

Regards,

Ahmed

PROC Star
Posts: 7,467

Re: Market share based on zipcodes

: Here is a modified version of Ksharp's code, but without using the hash method, initially filtering the data to remove zip codes that aren't captured by the zipcitydistance function, and not creating all of those subfiles.  I get the same results as Ksharp did, thus you still have to validate whether the code is doing what you want/need.  The good news is that it runs faster than Ksharp's code:

libname x v9 'd:\art' ;

data without_bad_zipcodes (drop=distance);

  set x.Geo_shr;

  distance=zipcitydistance(zip , '00646');

  if not missing(distance);

run;

proc sort data=without_bad_zipcodes;

  by  fyear sic3 zip ;

run;

proc sort data=without_bad_zipcodes out=zip(keep= fyear sic3 zip) nodupkey;

  by  fyear sic3 zip ;

run;

data within60miles (keep=fyear sic3 zip _zip);

  array zips(12000);

  retain zips:;

  set zip;

  by fyear sic3;

  if first.sic3 then do;

    call missing(of zips(*));

    counter=0;

  end;

  counter+1;

  zips(counter)=zip;

  if last.sic3 then do i=1 to counter;

    do j=i to counter;

      if zipcitydistance(zips(i) , zips(j)) le 60 then do;

        zip=zips(i);

        _zip=zips(j);

        output;

        if i ne j then do;

          zip=zips(j);

          _zip=zips(i);

          output;

        end;

      end;

    end;

  end;

run;

proc sort data=within60miles;

  by fyear sic3 zip;

run;

proc sql;

  create table sum as

    select fyear,sic3,zip,sum(sale) as sale_sum

      from without_bad_zipcodes /*x.Geo_shr*/

        group by fyear,sic3,zip

  ;

quit;

data distance;

  merge within60miles sum;

  by fyear sic3 zip;

run;

proc sql;

  create table total_sale as

    select fyear,sic3,_zip as zip,sum(sale_sum) as total_sale

      from distance

        group by  fyear,sic3,_zip;

quit;

data want;

  merge without_bad_zipcodes total_sale;

  by fyear sic3 zip;

  percent=divide(sale,total_sale) ;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 762 views
  • 6 likes
  • 5 in conversation