BookmarkSubscribeRSS Feed
mraza12
Calcite | Level 5

Hi,

 

I have a large dataset containing distances of different stores to a customer. My dataset looks something like this:

Customer_ID     Retail_Store_1       Retail_Store_2          Retail_Store_3         Closest_Store

1234                     0.56                           0.78                         3.1                        Retail_Store_1       

 

I am trying to split this dataset into 3 datasets for each retail store. I want to split by closest store and Retail_Store_# < 10 miles.

I used the obs=1000 to limit it but it is affecting my where clause. There are scenarios where the dataset is less than 1000 and I want to add more records to it. I want to check the remaining records in the original dataset and check again for the closest store and less than 10 miles until count reaches 1000 or there is no more records left.

 

Below is my code to search for the closest store:

%macro find_closest_store(county, numrows);

%let county = &county;

%let numrows = &numrows;

 

data closest_store;

set &county.data

array Retail_Store(&numrows) Retail_Store_1-Retail_Store_&numrows;

&county._min_value=min(of Retail_Event(*));

closest_store=vname(Retail_Store(whichn(&county._min_value, of Retail_Store(*))));

run;

%mend find_closest_store;

%find_closest_store(county1, 10);

 

My code for split (I had a macro but it was a long code so I just took a piece of it)

 

proc sql;

create table county_Retail_Store_1 as

select

from county_data (obs=1000)

where closest_store = 'Retail_Store_1' and Retail_Store_1 < 10;

quit;

 

However, this obs statement is throwing off the where clause and I am getting results for distances > 10 as well.

 

These are the steps that I want to take but I am running into issues.

 

  1. Assign the closest store to each member. (I did that successfully)
  2. Split the event data per the closest store but limit the count to 1000 (need help with this - not getting good results)
  3. If less than 1000, search for more members if available and calculate the closest store again and if within 10 miles of the desired store, add those members until it reaches 1000 or no more records left.   (Need help with this as well).

Any help will be appreciated.

 

Many Thanks!

2 REPLIES 2
mkeintz
PROC Star

You report that the "obs=1000" option in this code

 

proc sql;
  create table county_Retail_Store_1 as
  select
  from county_data (obs=1000)
  where closest_store = 'Retail_Store_1' and Retail_Store_1 < 10;
quit;

is "throwing off the where clause and I am getting results for distances > 10 as well".

 

Are you saying that in the absence of the "obs=1000", you only get observations with Retail_Store_1<10, but in using the obs= parameter you start getting instances of Retail_Store_1>=10?  I'll believe it when I see it.

 

In your particular example code, you don't have an variables listed in the select clause (or even "select *").  SQL would throw an error and stop processing the request.  If you already had a table named County_retail_store_1, it would not be replaced.  And if that prior table had distances over 10, they would still be there, wholly unrelated to any "obs=1000" parameter.

 

By the way, you could create all three datasets in one process, using a DATA step, as in:

 

data ret1 ret2 ret3;
  set county_data;

  if closest_store='Retail_Store_1' and retail_store_1<10 then output ret1; else
  if closest_store='Retail_Store_2' and retail_store_1<10 then output ret2; else
  if closest_store='Retail_Store_3' and retail_store_1<10 then output ret3; 
run;

To replicate this via PROC SQL, you'd have to read the data three times, one for each result table.  But the code above reads it only once to produce the same 3 "tables".

 

Edited correction - the 2nd and 3rd IF statement used the wrong variables in the "<10" comparison.  It should be:

 

data ret1 ret2 ret3;
  set county_data;

  if closest_store='Retail_Store_1' and retail_store_1<10 then output ret1; else
  if closest_store='Retail_Store_2' and retail_store_2<10 then output ret2; else
  if closest_store='Retail_Store_3' and retail_store_3<10 then output ret3; 
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mraza12
Calcite | Level 5

Thanks for your response. I actually wrote that piece of code in a hurry. I had a * after select. I did see results for more than 10. I am not sure why it would give me that because when I remove the (obs=1000), I do get proper results.

 

Thanks a lot for your suggestion on data step instead proc sql, I will use that instead.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 445 views
  • 0 likes
  • 2 in conversation