Dear all,
I would like to calculate the market share of a firm (tic) based on geographic area (distance based on zipcodes) for a given industry (sic3) and financial year (fyear).
Market share = sales of the firm / total sales for all the firms in the same industry (sic3) in a year (fyear) within 60 miles radius.
Let says, firm A, B and C are located in the same geographic area (ie within 60 miles radius). Only firm A and B are in the same industry (SIC3=123) in fyear = 2003;
then market share for A in year 2003= SALE of firm A/sum(SALE of firm A, SALE of firm B) .
I attached a SAS file with the all US firms from years 2003 to 2010:
tic = identification code for firm
fyear = financial year
SALE = sales value for each firm
SIC3 = industry classification code
zip = zipcode of a firm (for the calculation of distance)
The following procedures should be relevant:
1. Identify firms that:
- in the same SIC3
- in the same FYEAR
- within 60 miles radius (if .<zipcitydistance(_zip , zip) <=60) from a firm-year
2. calculate total sales of the firms identified
3. Calculate market share of the firm = salei / Total sale
Note: I wish to use hash object to calculate the market share as it involves self-matching of firms. It will consume very much computer resources if proc sql to be used.
Any help is much appreciated.
Thank you.
Regards,
mspak
Your task is really not easy,I would split it into lots of small tables and process them one by one .
libname x v9 'c:\temp' ;
proc sort data=x.Geo_shr out=zip(keep= fyear sic3 zip) nodupkey;by fyear sic3 zip ;run;
data _null_;
if _n_ eq 1 then do;
if 0 then set zip;
declare hash h(multidata:'Y') ;
h.definekey('fyear','sic3');
h.definedata('fyear','sic3','zip');
h.definedone();
end;
do until(last.sic3);
set zip;
by fyear sic3;
h.add();
end;
h.output(dataset:cats('new',_n_));
h.clear();
run;
data _null_;
set sashelp.vmember(where=(libname='WORK' and lowcase(memname) like 'new%')) end=last;
if _n_ eq 1 then call execute('proc sql;');
call execute('create table _'||strip(memname)||' as
select a.*,b.zip as _zip
from '||memname||' as a,'||memname||' as b
where .< zipcitydistance(a.zip , b.zip) <=60;');
if last then call execute('quit;');
run;
data distance;
set _: ;
run;
proc sql;
create table sum as
select fyear,sic3,zip,sum(sale) as sale_sum
from x.Geo_shr
group by fyear,sic3,zip;
quit;
data distance;
if _n_ eq 1 then do;
if 0 then set sum(rename=(zip=_zip));
declare hash h(dataset:'sum(rename=(zip=_zip))');
h.definekey('fyear','sic3','_zip');
h.definedata('sale_sum');
h.definedone();
end;
set distance;
h.find();
run;
proc sql;
create table total_sale as
select fyear,sic3,zip,sum(sale_sum) as total_sale
from distance
group by fyear,sic3,zip;
quit;
data want;
if _n_ eq 1 then do;
if 0 then set total_sale;
declare hash h(dataset:'total_sale');
h.definekey('fyear','sic3','zip');
h.definedata('total_sale');
h.definedone();
end;
set x.Geo_shr;
call missing(total_sale);
rc=h.find();
percent=divide(sale,total_sale) ;
drop rc;
run;
KSharp
Your task is really not easy,I would split it into lots of small tables and process them one by one .
libname x v9 'c:\temp' ;
proc sort data=x.Geo_shr out=zip(keep= fyear sic3 zip) nodupkey;by fyear sic3 zip ;run;
data _null_;
if _n_ eq 1 then do;
if 0 then set zip;
declare hash h(multidata:'Y') ;
h.definekey('fyear','sic3');
h.definedata('fyear','sic3','zip');
h.definedone();
end;
do until(last.sic3);
set zip;
by fyear sic3;
h.add();
end;
h.output(dataset:cats('new',_n_));
h.clear();
run;
data _null_;
set sashelp.vmember(where=(libname='WORK' and lowcase(memname) like 'new%')) end=last;
if _n_ eq 1 then call execute('proc sql;');
call execute('create table _'||strip(memname)||' as
select a.*,b.zip as _zip
from '||memname||' as a,'||memname||' as b
where .< zipcitydistance(a.zip , b.zip) <=60;');
if last then call execute('quit;');
run;
data distance;
set _: ;
run;
proc sql;
create table sum as
select fyear,sic3,zip,sum(sale) as sale_sum
from x.Geo_shr
group by fyear,sic3,zip;
quit;
data distance;
if _n_ eq 1 then do;
if 0 then set sum(rename=(zip=_zip));
declare hash h(dataset:'sum(rename=(zip=_zip))');
h.definekey('fyear','sic3','_zip');
h.definedata('sale_sum');
h.definedone();
end;
set distance;
h.find();
run;
proc sql;
create table total_sale as
select fyear,sic3,zip,sum(sale_sum) as total_sale
from distance
group by fyear,sic3,zip;
quit;
data want;
if _n_ eq 1 then do;
if 0 then set total_sale;
declare hash h(dataset:'total_sale');
h.definekey('fyear','sic3','zip');
h.definedata('total_sale');
h.definedone();
end;
set x.Geo_shr;
call missing(total_sale);
rc=h.find();
percent=divide(sale,total_sale) ;
drop rc;
run;
KSharp
Dear Ksharp,
Thank you for your program.
I ran the program, and obtained the output - want (as attached).
I notice that there are missing "total_sale". This is impossible.
The sales value of the firm itself should be included in the calculation of total sales. Even if there is the only one firm (that is - the firm itself) is located within 60 miles radius, the total sales should be same with the sales of that firm.
Don't you think that I should sum up the "total_sale" and "sale"? Should I calculate the market share as: sale/sum(total_sale, sale)?
I feel that the total_sale that you calculated is the total sales of firms within 60 miles radius from the firm (excluding the firm itself).
Another question:
You wrote a command as follows:
if _n_ eq 1 then call execute('proc sql;');
Does this mean that proc sql is run in the memory? I never think that I can do that!!!
Thank you.
Regards,
Mei Sen
"I notice that there are missing "total_sale". This is impossible."
Because the sale of your dataset has lots of missing value. My total_sale means for the same fyear and same sic3 , sum of sale of those company in 60 miles which base on zip. I notice for the same zip,there are lots of company,it is right ? and we calculated distance is base on zip ,that means all of company in the same zip is a single point, and we use this point to calculated distance ,it is right ?
"The sales value of the firm itself should be included in the calculation of total sales."
Yes. I did it so.
"Even if there is the only one firm (that is - the firm itself) is located within 60 miles radius, the total sales should be same with the sales of that firm."
But we use this company's zip to calculate distance ,and there are several different company in the same zip. is it true ?
"Don't you think that I should sum up the "total_sale" and "sale"? Should I calculate the market share as: sale/sum(total_sale, sale)?"
NO. I have already include sale into total_sale .
"I feel that the total_sale that you calculated is the total sales of firms within 60 miles radius from the firm (excluding the firm itself)."
NO.I included it Don't forget I use a zip to get distance,and a zip involved lots of different company.
"Does this mean that proc sql is run in the memory? I never think that I can do that!!!"
That is the same way with generating a macro, I just use call execute replace a macro .
Ksharp
消息编辑者为:xia keshan
I agree with Ksharp .. you have a number (specifically 330) of zip codes that don't match any that are included in the zipcitydistance function. You can identify them by running:
proc sort data=x.Geo_shr (keep=zip) out=test nodupkey;
by zip;
run;
data test (keep=zip);
set test;
test=zipcitydistance(zip , '15217');
if missing(test);
run;
You can probably quickly fix that by replacing those zip codes with neighboring zipcodes by doing Google searches for two strings, namely 'zipcode' and the offending 5 digit zipcode.
Hi,
Very much appreciated your helps.
In the output file 'want'. Some of the observations with SALE figures but missing TOTAL_SALE figures. This is the reason that I think the TOTAL_SALE does not include SALE for the firm itself. For example, there are 10 other firms located within 60 miles radius from the firm A. Then the TOTAL_SALE should be based on SALE of 11 firms (10 other firms plus the firm A itself).
If there is NO any match based on 60 miles radius, same SIC3 and same FYEAR, then TOTAL_SALE = SALE of firm A. Missing TOTAL_SALE is possible only when there is no other firm within 60 miles radius from that firm (ie. firm A) and that firm (ie. Firm A)'s SALE is missing.
We should use zip to calculate the distance. Firms in the same zip considered as the same point. But firms in the same zip should be considered as firms within 60 miles radius (ie. zero mile radius) as well. Let say, There are 5 other firms with the same zip, same sic3 and same fyear as per the firm (firm A) and 3 other firms with different zip, same sic3 and same fyear but within 60 miles radius, then: TOTAL_SALE = SALE of 9 firms (5+3+1).
The following code is also new to me. I’ve never know that I can use sashelp files as my input files!!!
data _null_;
set sashelp.vmember(where=(libname='WORK' and lowcase(memname) like 'new%')) end=last;
if _n_ eq 1 then call execute('proc sql;');
call execute('create table _'||strip(memname)||' as
select a.*,b.zip as _zip
from '||memname||' as a,'||memname||' as b
where .< zipcitydistance(a.zip , b.zip) <=60;');
if last then call execute('quit;');
run;
Even though hash object is an efficient method, but I feel a bit difficult to code it.
Thank you.
Regards,
mspak
If you aren't comfortable with the hash object than I would suggest not using it until you are sufficiently comfortable with using it.
However, that said, you have at least one other problem to deal with in addition to the unusable zip code problem. Some of your records have missing sales values for all firms within a particular grouping.
E.g., try the following and take a look at the records that result at the very beginning of the resulting file, 'want'.
Before anyone spends time trying to suggest a non-hash approach, I would suggest reposting the data file, but with the replacement zip codes and instructions on how you want to deal with missing sales for a particular grouping:
libname x v9 'd:\art' ;
proc sort data=limited out=zip(keep= fyear sic3 zip) nodupkey;
by fyear sic3 zip ;
run;
data _null_;
if _n_ eq 1 then do;
if 0 then set zip;
declare hash h(multidata:'Y') ;
h.definekey('fyear','sic3');
h.definedata('fyear','sic3','zip');
h.definedone();
end;
do until(last.sic3);
set zip;
by fyear sic3;
h.add();
end;
h.output(dataset:cats('new',_n_));
h.clear();
run;
data _null_;
set sashelp.vmember(where=(libname='WORK' and lowcase(memname) like 'new%')) end=last;
if _n_ eq 1 then call execute('proc sql;');
call execute('create table _'||strip(memname)||' as
select a.*,b.zip as _zip
from '||memname||' as a,'||memname||' as b
where .< zipcitydistance(a.zip , b.zip) <=60;');
if last then call execute('quit;');
run;
data distance;
set _: ;
run;
proc sql;
create table sum as
select fyear,sic3,zip,sum(sale) as sale_sum
from limited /*x.Geo_shr*/
group by fyear,sic3,zip;
quit;
data distance;
if _n_ eq 1 then do;
if 0 then set sum(rename=(zip=_zip));
declare hash h(dataset:'sum(rename=(zip=_zip))');
h.definekey('fyear','sic3','_zip');
h.definedata('sale_sum');
h.definedone();
end;
set distance;
h.find();
run;
proc sql;
create table total_sale as
select fyear,sic3,zip,sum(sale_sum) as total_sale
from distance
group by fyear,sic3,zip;
quit;
data want;
if _n_ eq 1 then do;
if 0 then set total_sale;
declare hash h(dataset:'total_sale');
h.definekey('fyear','sic3','zip');
h.definedata('total_sale');
h.definedone();
end;
/*set x.Geo_shr;*/
set limited;
call missing(total_sale);
rc=h.find();
percent=divide(sale,total_sale) ;
drop rc;
run;
proc sort data=want;
by total_sale;
run;
Hi Art,
I am curious. How long it took you to come up the solution?
Linlin, That is Ksharp's proposed code. I only modified the input data to identify an example of the locus of the problems that the OP was questioning.
Hi mspak,
Try this solution
libname comm "C:\projects\SAS_Community"; *Location containing the Sample Geo_shr data set;
/* Macro to calculate the Market share for a specific Year/Industry classification code/radius combination */
%macro getMarketShare(p_inDsName= ,p_year=, p_SIC3=, p_radius=60, p_outDsName=);
%local l_count
l_zips
l_maxFreq
l_maxFreq2;
/* ------------------------------------------------------- */
/* Get unique list of Firms Zip Codes for the */
/* specified Year/industry classification code combination */
/* ------------------------------------------------------- */
proc sort data=&p_inDsName out=work.zips(keep=zip) nodupkey;
BY zip;
where (zipcity(zip) is not null
%if(%superq(p_year) NE ) %then
%do;
AND fyear EQ &p_year
%end;
%if(%superq(p_SIC3) NE ) %then
%do;
AND SIC3 EQ "&p_SIC3"
%end;
);
run;
/* ------------------------------------------------ */
/* Workout the available Zip City Distance clusters */
/* ------------------------------------------------ */
Proc sql noprint;
Select quote(put(zip,z5.))
into :l_zips separated by ' '
from work.zips
;
quit;
%let l_count=&sqlobs;
%put l_count=&l_count;
%put l_zips = &l_zips;
data work.zips(keep=pointA pointB distance);
Set work.zips;
LENGTH pointA pointB $5;
array zips [&l_count] $5 (&l_zips);
pointA = put(zip,z5.);
do i=1 to dim(zips);
pointB= zips;
distance = zipcitydistance(pointA, pointB);
/* Ignore zero distances */
if &p_radius >= distance > 0 then
output;
end;
run;
/* ------------------------------------------ */
/* Declare internal macro for data processing */
/* ------------------------------------------ */
%MACRO shrinkCombos(
p_inDsName= /* Input Source Data Source */
, p_outDsName= /* Output Data Set */
, p_byClause= /* Classification Variables Set */
, p_classVar= /* Class Variable to Transpose */
, p_classVar2= /* Second Class Variable to Transpose (Optional) */
, p_prntClassVar= /* Preceding Class Variable */
, p_aggrVarLen=10 /* Variable length of the new transposed values */
);
/* Sort */
PROC SORT DATA=&p_inDsName;
BY &p_byClause;
RUN;
/* Process the data and generate the transposed value */
DATA &p_outDsName(DROP=&p_classVar RENAME=(aggr=&p_classVar));
SET &p_inDsName;
BY &p_byClause;
LENGTH aggr $&p_aggrVarLen ;
RETAIN aggr ;
IF (FIRST.&p_prntClassVar) THEN
aggr='';
%if (%superq(p_classVar2) NE ) %then
%do;
if (INDEX(STRIP(aggr), STRIP(&p_classVar2)) = 0) then
aggr = catx(' ',aggr,&p_classVar2);
if (INDEX(STRIP(aggr), STRIP(&p_classVar)) = 0) then
aggr = catx(' ',aggr,&p_classVar);
DROP &p_classVar2;
%end;
%else
%do;
aggr = catx(' ',aggr,&p_classVar);
%end;
IF (LAST.&p_prntClassVar) THEN
OUTPUT;
RUN;
%MEND shrinkCombos;
/* Get record count for calculating variable length */
proc sql noprint;
select max(_freq_)
into :l_maxFreq
from ( select distance ,pointA ,count(pointB) as _freq_
from work.zips
Group by distance ,pointA)
;
quit;
%put l_maxFreq=&l_maxFreq;
/* Work out Zips Connection Points */
%shrinkCombos( p_inDsName=work.zips, p_outDsName=work.zips_connctions
, p_byClause=%str(distance pointA pointB), p_classVar=pointB, p_prntClassVar=pointA, p_classVar2=, p_aggrVarLen=%eval(&l_maxFreq*6));
/* Get record count for calculating variable length */
proc sql noprint;
select max(_freq_)
into :l_maxFreq2
from ( select distance ,count(pointA) as _freq_
from work.zips_connctions
Group by distance)
;
quit;
%put l_maxFreq2=&l_maxFreq2;
/* Work out the Distance Clusters and their contributing members */
%shrinkCombos( p_inDsName=work.zips_connctions, p_outDsName=work.dist_clusters
, p_byClause=%str(distance pointA), p_classVar=pointA, p_prntClassVar=distance, p_classVar2=pointB, p_aggrVarLen=%eval((&l_maxFreq*6) + 1 + (&l_maxFreq2*6)));
proc sql;
create table work.combined AS
select a.*
, b.distance
, b.pointA AS cluster
from &p_inDsName a, work.dist_clusters b
where INDEX(STRIP(b.pointA), put(a.zip,z5.)) GT 0
%if(%superq(p_year) NE ) %then
%do;
AND fyear EQ &p_year
%end;
%if(%superq(p_SIC3) NE ) %then
%do;
AND SIC3 EQ "&p_SIC3"
%end;
ORDER BY b.distance, a.zip;
quit;
data &p_outDsName;
total=0;
do until (last.distance);
SET work.combined;
BY distance zip;
total+sale;
end;
do until (last.distance);
SET work.combined;
BY distance zip;
ratio = (sale/total);
output;
end;
format ratio percent8.3;
run;
proc datasets lib=work nolist;
delete zips zips_connctions dist_clusters combined;
run; quit;
%mend getMarketShare;
options mprint;
/* Usage Example */
%getMarketShare(p_inDsName=comm.geo_shr ,p_year=2009, p_SIC3=641, p_radius=250, p_outDsName=work.marketShares);
Dear AhmedAl-Attar,
In order to execute the macro suggested, I think I need to repeat the steps for each combination of year and SIC3, Is it correct?
Thank you.
Regards,
mspak
Yes, The macro can work that way, or you can do all the SIC3s for a particular Year.
When I tried to do all the combinations of Year/SIC3/ZIP , I got 14M+ observations!!
It's somewhat easy to construct a program that would derive all the combinations and write out the macro calls into a file and execute it.
Regards,
Ahmed
Hi Ahmed,
How to execute it?
%getMarketShare(p_inDsName=comm.geo_shr ,p_year=2009, p_SIC3=641, p_radius=250, p_outDsName=work.marketShares);
The above function was suggested. What does this mean? Input data=geo_shr, fyear=2009, sic3={all 641 categories of industry} and p_radius={within 250 miles radius}, output data=marketshares. Is this correct?
Thank you.
mspak
Hi mspak,
Please find included a revised solution. This new revised solution will perform Market/Sales Share calculations on two levels
1. Ratio/Percent at the Industry Segment for a given fyear and radius
2. Ratio/Percent at the different distances amongst the zips within the given Industry Segment/Fyear/Radius combination (Drill Down on Radius to see how the Zips/companys are connected to each other across the different distance bins/clusters)
%macro shrinkCombos(
p_inDsName= /* Input Source Data Source */
, p_outDsName= /* Output Data Set */
, p_byClause= /* Classification Variables Set */
, p_classVar= /* Class Variable to Transpose */
, p_classVar2= /* Second Class Variable to Transpose (Optional) */
, p_prntClassVar= /* Preceding Class Variable */
, p_aggrVarLen=10 /* Variable length of the new transposed values */
);
/* Sort */
PROC SORT DATA=&p_inDsName;
BY &p_byClause;
RUN;
/* Process the data and generate the transposed value */
DATA &p_outDsName(DROP=&p_classVar RENAME=(aggr=&p_classVar));
SET &p_inDsName;
BY &p_byClause;
LENGTH aggr $&p_aggrVarLen ;
RETAIN aggr ;
IF (FIRST.&p_prntClassVar) THEN
aggr='';
%if (%superq(p_classVar2) NE ) %then
%do;
if (INDEX(STRIP(aggr), STRIP(&p_classVar2)) = 0) then
aggr = catx(' ',aggr,&p_classVar2);
if (INDEX(STRIP(aggr), STRIP(&p_classVar)) = 0) then
aggr = catx(' ',aggr,&p_classVar);
DROP &p_classVar2;
%end;
%else
%do;
aggr = catx(' ',aggr,&p_classVar);
%end;
IF (LAST.&p_prntClassVar) THEN
OUTPUT;
RUN;
%mend shrinkCombos;
%macro getMarketShareByDistance(p_inDsName= ,p_year=, p_SIC3=, p_radius=60, p_outDsName=);
%local l_count
l_zips
l_maxFreq
l_maxFreq2;
/* ------------------------------------------------------- */
/* Get unique list of Firms Zip Codes for the */
/* specified Year/industry classification code combination */
/* ------------------------------------------------------- */
proc sort data=&p_inDsName out=work.zips(keep=zip) nodupkey;
BY zip;
where (zipcity(zip) is not null
%if(%superq(p_year) NE ) %then
%do;
AND fyear EQ &p_year
%end;
%if(%superq(p_SIC3) NE ) %then
%do;
AND SIC3 EQ "&p_SIC3"
%end;
);
run;
/* ------------------------------------------------ */
/* Workout the available Zip City Distance clusters */
/* ------------------------------------------------ */
Proc sql noprint;
Select quote(put(zip,z5.))
into :l_zips separated by ' '
from work.zips
;
quit;
%let l_count=&sqlobs;
/*
%put l_count=&l_count;
%put l_zips = &l_zips;
*/
data work.zips(keep=pointA pointB distance);
Set work.zips;
LENGTH pointA pointB $5;
array zips [&l_count] $5 (&l_zips);
pointA = put(zip,z5.);
do i=1 to dim(zips);
pointB= zips;
distance = zipcitydistance(pointA, pointB);
/* Ignore zero distances */
if &p_radius >= distance > 0 then
output;
end;
run;
/* Get record count for calculating variable length */
proc sql noprint;
select max(_freq_)
into :l_maxFreq
from ( select distance ,pointA ,count(pointB) as _freq_
from work.zips
Group by distance ,pointA)
;
quit;
/*%put l_maxFreq=&l_maxFreq;*/
/* Work out Zips Connection Points */
%shrinkCombos( p_inDsName=work.zips, p_outDsName=work.zips_connctions
, p_byClause=%str(distance pointA pointB), p_classVar=pointB, p_prntClassVar=pointA, p_classVar2=, p_aggrVarLen=%eval(&l_maxFreq*6));
/* Get record count for calculating variable length */
proc sql noprint;
select max(_freq_)
into :l_maxFreq2
from ( select distance ,count(pointA) as _freq_
from work.zips_connctions
Group by distance)
;
quit;
/*%put l_maxFreq2=&l_maxFreq2;*/
/* Work out the Distance Clusters and their contributing members */
%shrinkCombos( p_inDsName=work.zips_connctions, p_outDsName=work.dist_clusters
, p_byClause=%str(distance pointA), p_classVar=pointA, p_prntClassVar=distance, p_classVar2=pointB, p_aggrVarLen=%eval((&l_maxFreq*6) + 1 + (&l_maxFreq2*6)));
proc sql;
create table work.combined AS
select a.*
, b.distance
, b.pointA AS cluster
from &p_inDsName a, work.dist_clusters b
where INDEX(STRIP(b.pointA), put(a.zip,z5.)) GT 0
%if(%superq(p_year) NE ) %then
%do;
AND fyear EQ &p_year
%end;
%if(%superq(p_SIC3) NE ) %then
%do;
AND SIC3 EQ "&p_SIC3"
%end;
ORDER BY b.distance, a.zip;
quit;
data &p_outDsName;
total=0;
do until (last.distance);
SET work.combined;
BY distance zip;
total+sale;
end;
do until (last.distance);
SET work.combined;
BY distance zip;
/* Avoid Division by Zero*/
IF (total > 0) then ratio = sale/total;
else ratio = 0;
output;
end;
format ratio percent8.3;
run;
/* Clean up */
proc datasets lib=work nolist;
delete zips zips_connctions dist_clusters combined;
run; quit;
%mend getMarketShareByDistance;
%macro getMarketShareByRadius(
p_inDsName=
, p_year=
, p_sic3=
, p_radius=
, p_outRadiusSharesDsName=
, p_outDistanceSharesDsName=
);
%local l_sTime
l_eTime
l_rTime;
/** Record Starting Time **/
%let l_sTime = %sysfunc(time());
/* --------------------------------- */
/* Get a list of unique combinations */
/* of Fyear/Industry Section */
/* --------------------------------- */
proc summary data=&p_inDsName nway;
class fyear sic3;
where (zipcity(zip) is not null
%if(%superq(p_year) NE ) %then
%do;
AND fyear EQ &p_year
%end;
%if(%superq(p_SIC3) NE ) %then
%do;
AND SIC3 EQ "&p_SIC3"
%end;
);
output out=work.combos(drop=_:);
run;
FILENAME codes TEMP;
DATA _NULL_;
FILE codes;
SET work.combos end=eof;
put '%getMarketShareByDistance(p_inDsName='"&p_inDsName ,p_year=" fyear +(-1)", p_SIC3=" sic3 +(-1)", p_radius=&p_radius, p_outDsName=work._"_n_ +(-1)");";
put "PROC APPEND BASE=&p_outDistanceSharesDsName DATA=WORK._" _n_ +(-1)" FORCE; RUN;";
PUT "PROC DATASETS LIB=WORK NOLIST; DELETE _:; RUN; QUIT;";
if (eof) then
do;
put ;
put "proc sql;";
put +3 "create table &p_outRadiusSharesDsName AS";
put +3 "select distinct fyear, sic3, zip, tic, sale";
put +3 "from &p_outDistanceSharesDsName";
put +3 "group by fyear, sic3, zip, tic";
put +3 "order by fyear, sic3, zip, tic;";
put "quit;";
put ;
put "data &p_outRadiusSharesDsName;";
put +3 "total=0;";
put +3 "radius=&p_radius;";
put +3 "do until (last.sic3);";
put +6 "SET &p_outRadiusSharesDsName;";
put +6 "BY fyear sic3;";
put +6 "total+sale;";
put +3 "end;";
put ;
put +3 "do until (last.sic3);";
put +6 "SET &p_outRadiusSharesDsName;";
put +6 "BY fyear sic3;";
put +6 "IF (total > 0) then ratio = sale/total;";
put +6 "else ratio = 0;";
put +6 "output;";
put +3 "end;";
put +3 "format ratio percent8.2;";
put "run;";
end;
RUN;
%include codes;
/** Record Finish Time **/
%let l_eTime = %sysfunc(time());
%if (%superq(l_sTime) NE ) %then
%do;
/* Calculate Run Time and display it */
%let l_rTime = %sysfunc(putn(%sysevalf(&l_eTime - &l_sTime),time12.2));
%put >>> getMarketShareByRadius :>>> Total RunTime = &l_rTime;
%end;
%mend getMarketShareByRadius;
/* ----------------------- */
/* Usage Example */
/* ----------------------- */
libname comm "C:\projects\SAS_Community";
%getMarketShareByRadius(
p_inDsName=comm.geo_shr /* Input Data */
, p_year= /* Fyear value */
, p_sic3= /* Industry Segment */
, p_radius=60 /* Within 60 miles radius */
, p_outRadiusSharesDsName=work.marketShares /* Output Market Shares within 60 miles Radius Data set */
, p_outDistanceSharesDsName=work.marketSharesByDistance /* Output Market Shares for the various distance bins within 60 miles radius Data Set */
);
Just another way of analyzing the data.
Regards,
Ahmed
: Here is a modified version of Ksharp's code, but without using the hash method, initially filtering the data to remove zip codes that aren't captured by the zipcitydistance function, and not creating all of those subfiles. I get the same results as Ksharp did, thus you still have to validate whether the code is doing what you want/need. The good news is that it runs faster than Ksharp's code:
libname x v9 'd:\art' ;
data without_bad_zipcodes (drop=distance);
set x.Geo_shr;
distance=zipcitydistance(zip , '00646');
if not missing(distance);
run;
proc sort data=without_bad_zipcodes;
by fyear sic3 zip ;
run;
proc sort data=without_bad_zipcodes out=zip(keep= fyear sic3 zip) nodupkey;
by fyear sic3 zip ;
run;
data within60miles (keep=fyear sic3 zip _zip);
array zips(12000);
retain zips:;
set zip;
by fyear sic3;
if first.sic3 then do;
call missing(of zips(*));
counter=0;
end;
counter+1;
zips(counter)=zip;
if last.sic3 then do i=1 to counter;
do j=i to counter;
if zipcitydistance(zips(i) , zips(j)) le 60 then do;
zip=zips(i);
_zip=zips(j);
output;
if i ne j then do;
zip=zips(j);
_zip=zips(i);
output;
end;
end;
end;
end;
run;
proc sort data=within60miles;
by fyear sic3 zip;
run;
proc sql;
create table sum as
select fyear,sic3,zip,sum(sale) as sale_sum
from without_bad_zipcodes /*x.Geo_shr*/
group by fyear,sic3,zip
;
quit;
data distance;
merge within60miles sum;
by fyear sic3 zip;
run;
proc sql;
create table total_sale as
select fyear,sic3,_zip as zip,sum(sale_sum) as total_sale
from distance
group by fyear,sic3,_zip;
quit;
data want;
merge without_bad_zipcodes total_sale;
by fyear sic3 zip;
percent=divide(sale,total_sale) ;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.