<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do I determine the 4 Closest Peers by Distance? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-determine-the-4-Closest-Peers-by-Distance/m-p/749665#M235635</link>
    <description>&lt;P&gt;Can you post the desired result given this sample data?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Makes it much easier to provide usable code.&lt;/P&gt;</description>
    <pubDate>Tue, 22 Jun 2021 18:38:10 GMT</pubDate>
    <dc:creator>PeterClemmensen</dc:creator>
    <dc:date>2021-06-22T18:38:10Z</dc:date>
    <item>
      <title>How do I determine the 4 Closest Peers by Distance?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-determine-the-4-Closest-Peers-by-Distance/m-p/749605#M235593</link>
      <description>&lt;P&gt;I am attempting to create 12 additional variables containing the following three columns repeated 4x: peer ID, peer zip code, peer distance. The first three columns would be the closest peer (cls1gvk, cls1zip, cls1dist) and each subsequent iteration finds the&amp;nbsp;&lt;STRONG&gt;next closest&lt;/STRONG&gt; peer, and so on until the 4th closest peer. Peers are established based upon industry classification, 'gind' in Compustat, and year, 'fyear' in Compustat.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Current plan of action:&lt;/P&gt;&lt;P&gt;The only path I can see forward is to create a unique DB for every unique gind-fyear (of which I have 888). They would each begin as a 2x&lt;EM&gt;n,&amp;nbsp;&lt;/EM&gt;where&amp;nbsp;the variables are gvkey and zip code. I would then perform a pairwise combination of each gvkey (3rd column), add the matched zip to each paired gvkey (4th column), compute the distance (zipcitydistance function) between zip codes for each (5th), rank them (6th), and finally remove all of those ranks greater than four.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The data I have looks like:&lt;/P&gt;&lt;TABLE border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;gvkey&lt;/TD&gt;&lt;TD&gt;fyear&lt;/TD&gt;&lt;TD&gt;gind&lt;/TD&gt;&lt;TD&gt;zip code&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100001&lt;/TD&gt;&lt;TD&gt;2018&lt;/TD&gt;&lt;TD&gt;200555&lt;/TD&gt;&lt;TD&gt;10021&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100001&lt;/TD&gt;&lt;TD&gt;2019&lt;/TD&gt;&lt;TD&gt;200555&lt;/TD&gt;&lt;TD&gt;10021&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100002&lt;/TD&gt;&lt;TD&gt;2018&lt;/TD&gt;&lt;TD&gt;200555&lt;/TD&gt;&lt;TD&gt;10021&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100002&lt;/TD&gt;&lt;TD&gt;2019&lt;/TD&gt;&lt;TD&gt;200555&lt;/TD&gt;&lt;TD&gt;10021&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100003&lt;/TD&gt;&lt;TD&gt;2018&lt;/TD&gt;&lt;TD&gt;200555&lt;/TD&gt;&lt;TD&gt;10021&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100003&lt;/TD&gt;&lt;TD&gt;2019&lt;/TD&gt;&lt;TD&gt;200555&lt;/TD&gt;&lt;TD&gt;10021&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100004&lt;/TD&gt;&lt;TD&gt;2018&lt;/TD&gt;&lt;TD&gt;200555&lt;/TD&gt;&lt;TD&gt;10021&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100004&lt;/TD&gt;&lt;TD&gt;2019&lt;/TD&gt;&lt;TD&gt;200555&lt;/TD&gt;&lt;TD&gt;10021&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100005&lt;/TD&gt;&lt;TD&gt;2018&lt;/TD&gt;&lt;TD&gt;200555&lt;/TD&gt;&lt;TD&gt;10021&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100005&lt;/TD&gt;&lt;TD&gt;2019&lt;/TD&gt;&lt;TD&gt;200555&lt;/TD&gt;&lt;TD&gt;10021&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100006&lt;/TD&gt;&lt;TD&gt;2018&lt;/TD&gt;&lt;TD&gt;200555&lt;/TD&gt;&lt;TD&gt;10021&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100006&lt;/TD&gt;&lt;TD&gt;2019&lt;/TD&gt;&lt;TD&gt;200555&lt;/TD&gt;&lt;TD&gt;10021&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100007&lt;/TD&gt;&lt;TD&gt;2018&lt;/TD&gt;&lt;TD&gt;312448&lt;/TD&gt;&lt;TD&gt;10022&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100007&lt;/TD&gt;&lt;TD&gt;2019&lt;/TD&gt;&lt;TD&gt;312448&lt;/TD&gt;&lt;TD&gt;10022&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100008&lt;/TD&gt;&lt;TD&gt;2018&lt;/TD&gt;&lt;TD&gt;312448&lt;/TD&gt;&lt;TD&gt;10022&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100008&lt;/TD&gt;&lt;TD&gt;2019&lt;/TD&gt;&lt;TD&gt;312448&lt;/TD&gt;&lt;TD&gt;10022&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100009&lt;/TD&gt;&lt;TD&gt;2018&lt;/TD&gt;&lt;TD&gt;312448&lt;/TD&gt;&lt;TD&gt;10022&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100009&lt;/TD&gt;&lt;TD&gt;2019&lt;/TD&gt;&lt;TD&gt;312448&lt;/TD&gt;&lt;TD&gt;10022&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100010&lt;/TD&gt;&lt;TD&gt;2018&lt;/TD&gt;&lt;TD&gt;312448&lt;/TD&gt;&lt;TD&gt;10022&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100010&lt;/TD&gt;&lt;TD&gt;2019&lt;/TD&gt;&lt;TD&gt;312448&lt;/TD&gt;&lt;TD&gt;10022&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100011&lt;/TD&gt;&lt;TD&gt;2018&lt;/TD&gt;&lt;TD&gt;312448&lt;/TD&gt;&lt;TD&gt;10022&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100011&lt;/TD&gt;&lt;TD&gt;2019&lt;/TD&gt;&lt;TD&gt;312448&lt;/TD&gt;&lt;TD&gt;10022&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100012&lt;/TD&gt;&lt;TD&gt;2018&lt;/TD&gt;&lt;TD&gt;312448&lt;/TD&gt;&lt;TD&gt;10022&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100012&lt;/TD&gt;&lt;TD&gt;2019&lt;/TD&gt;&lt;TD&gt;312448&lt;/TD&gt;&lt;TD&gt;10022&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help is appreciated.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jun 2021 15:42:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-determine-the-4-Closest-Peers-by-Distance/m-p/749605#M235593</guid>
      <dc:creator>ericdrosano</dc:creator>
      <dc:date>2021-06-22T15:42:37Z</dc:date>
    </item>
    <item>
      <title>Re: How do I determine the 4 Closest Peers by Distance?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-determine-the-4-Closest-Peers-by-Distance/m-p/749665#M235635</link>
      <description>&lt;P&gt;Can you post the desired result given this sample data?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Makes it much easier to provide usable code.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jun 2021 18:38:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-determine-the-4-Closest-Peers-by-Distance/m-p/749665#M235635</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2021-06-22T18:38:10Z</dc:date>
    </item>
    <item>
      <title>Re: How do I determine the 4 Closest Peers by Distance?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-determine-the-4-Closest-Peers-by-Distance/m-p/749683#M235646</link>
      <description>&lt;P&gt;I found a solution to my problem. Since I wanted to match each firm (gvkey) with its 10 nearest industry-year peers, I first created the gindfyear variable which is a combination of industry (gind) and year (fyear). This allowed me to create the pairwise combinations based upon matched industry-year. Since I began with 72,693 observations, I did not want to create&amp;nbsp;5,284,199,556 firm-firm pairwise combinations!! This first batch of code did this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  create table want
    as select distinct a.gvkey, a.zip_code as main_zip, b.gvkey as peer, b.zip_code as peer_zip, a.gindfyear
    from have a, have b
    where a.gindfyear eq b.gindfyear and a.gvkey ne b.gvkey; 
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;As you can see above, it creates 5 variables: gvkey, main zip, peer, peer zip, and industry-year. This resulted in 9,861,622 obs, which is far fewer than the potential 5 billion I was reticent to create.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I then use the zipcitydistance function in SAS to determine the distances between each main and peer. Finally, I ranked them, keeping only the four closest. The final code is below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want2;
	set want;
	dist= zipcitydistance(main_zip, peer_zip);
run;
proc sort
	data= want2;
	by gvkey gindfyear dist;
quit;
data want3;
	set want2;
	by gvkey gindfyear dist;

	retain closest;
	if first.gindfyear then do;
		closest=1;
		output;
	end;
	else if closest lt 4 then do;
		closest = closest + 1;
		output;
	end;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This correctly yields 726,930 observations since I began with 72,693 observations (10x for each).&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jun 2021 19:26:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-determine-the-4-Closest-Peers-by-Distance/m-p/749683#M235646</guid>
      <dc:creator>ericdrosano</dc:creator>
      <dc:date>2021-06-22T19:26:12Z</dc:date>
    </item>
    <item>
      <title>Re: How do I determine the 4 Closest Peers by Distance?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-determine-the-4-Closest-Peers-by-Distance/m-p/749684#M235647</link>
      <description>I saw your reply after I had already found the solution. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;</description>
      <pubDate>Tue, 22 Jun 2021 19:25:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-determine-the-4-Closest-Peers-by-Distance/m-p/749684#M235647</guid>
      <dc:creator>ericdrosano</dc:creator>
      <dc:date>2021-06-22T19:25:38Z</dc:date>
    </item>
  </channel>
</rss>

