BookmarkSubscribeRSS Feed
KoVa
Obsidian | Level 7

Hi all,

I've been asked to make a dataset where I select 10 nearest stores for clients in another dataset. 

I've managed to select the top 10 (the distance is calculated on the latitude and longitude on postal code level), but I'm struggling to get the random part involved. because the distance is calculated on postal code level, so I get a lot of stores at distance = 0 km in bigger cities. so for those cases, I need to randomize the ten closest stores, so that client A gets a different list of 10 stores than client B (even though A and B live in the same postal code (and that postal code having more than 10 stores). in a way, I need to resort the dataset of stores on distance + rand('uniform') for each line in the dataset, add the info of the 10 stores to that line, and proceed to the next line. am I making any sense here? 

 

2 REPLIES 2
ballardw
Super User

Example data might be nice.

 

When you say "o that client A gets a different list of 10 stores than client B" do you mean that none of the stores for A can appear with B? Or would it be okay to have some overlap?

 

If some overlap is acceptable the Proc  SURVEYSELECT with the City as a strata variable might work. But really would kind of need to see what you data file looks like.

 

Some pseudocode to process only the 0 distance cities might look like:

Proc surveyselect data=have (where=(distance=0))
    out=want  sampsize=10 selectall;
   strata clientid;
run;

The data set would have to be sorted by the strata variable prior to this. The "selectall" will select all the records if there are fewer than the sampsize available for a strata. If you have multiple clients in a city with fewer than 10 "nearest" stores at distance 0 you need to spend some more time determining your rules of what to do.

 

I am assuming that your data looks something like:

Clientid CityId Distance

 

If you can't have any overlap in the list of cities then you have a bit more complicated, and possibly not cleanly solvable situation. Consider that you have 3 clients and a total of 19 zero distance stores. How do you allocate them??? and which ones get selected from the next nearest distance?

KoVa
Obsidian | Level 7

the client data set looks like
CLient_ID zip_code
Adjid33 9000
dhfki45 1000

I made another dataset calculating the distance from each store's zip code to every zip code available .
store-id zip_Code distance
dkks3 1000 0
djlslsd 1000 0
dssdk3 1200 1.4
in that dataset I used a proc sort on a random variable in case distance = 0 km, to get random stores for that distance. btw the distance at which they have to be randomized can also be further than 0, for example say 14 km. (imagine the case when you live outside of a great city, and the nearest stores (50 or so) are all at 14 km, those will need to be randomized as well)

I then used a datastep to select the top 10 + a transpose to get the 10 nearest stores for each postal code,

which looks like 

zip_code store1 store2 store3  store4 store5 store6 store7 store8  store9   store10

1000      dkks3  djlslsd  dssdk3 kdlli2  llkoo5  dkki4  dkiu8   dkkdk7 duusy6 kubg6.

 

and then I  merged that with the first dataset. that of course gives me the same list of stores for each client in a certain zip code. whereas i need to loop through the process again for each line of the client dataset, to get a new 'randomized' list for each client.
I was wondering if I could do this using call execute, and running a macro for each line that way.

 

edit: forgot to answer your question: yes, there can be overlaps. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 310 views
  • 3 likes
  • 2 in conversation