BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sdang
Quartz | Level 8

I want to have two datasets split evenly from a much larger dataset but before the split, SAS must not split obersvations that are in same zipcodes (criteria).  We want to mail an food offer from a dataset we created but  with two different valid dates.  The first valid date is assigned to the first splited dataset and second valid dates assigned to the second splited dataset.   However, we don't want to separate married couple who want to dine together.  Our manual logic is to sort by zip codes and split the datasets evenly and right before the split make sure no two address are the same.  Below is what our rows of data look likes. 

 

ID FN LN Address City State state ZIP
1111111 Wayne Smith 111 Left St BROOKLYN NY 11234
1111112 Mary Smith 111 left BROOKLYN NY 11234

 

A dataset with 10,000 rows split evenly:

Dataset 1 = 5,000

Dataset 2 = 5,000

   But don't separate same zip codes.

 

 Please help!
 
1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

If you're only looking for a rough split, you could just test to find the midway point, and then split the file at the first zipcode change after that.  Something like:

 

data have;
  do id=1 to 10000;
    zip=ceil(ranuni(0)*20);
    output;
  end;
run;

proc sort data=have;
  by zip;
run;


data file1 file2;
  set have nobs=n;
  by zip;

  retain mid 0;
  if _n_ > (n/2) and first.zip then mid=1;

  if mid=0 then output file1;
  else output file2;

drop mid; run;
BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

5 REPLIES 5
Quentin
Super User

If you're only looking for a rough split, you could just test to find the midway point, and then split the file at the first zipcode change after that.  Something like:

 

data have;
  do id=1 to 10000;
    zip=ceil(ranuni(0)*20);
    output;
  end;
run;

proc sort data=have;
  by zip;
run;


data file1 file2;
  set have nobs=n;
  by zip;

  retain mid 0;
  if _n_ > (n/2) and first.zip then mid=1;

  if mid=0 then output file1;
  else output file2;

drop mid; run;
BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
kiranv_
Rhodochrosite | Level 12

This might work. i have not tested code completly


/*gets into your half be careful here you can get decimals if count is odd*/
proc sql;
select count(*)/ 2 into :halfnum from intial_table;
quit;

/*gets you same zipcode  be careful zipcode is not right one probably you should pick addres*/
proc sql;
create table samezipcode_firsthalf
select * from intial_table
where zipped in
(select zipped from intialtable
group by name
having count(*) gt 1) ;
quit;

/*create another table without same zipcode records as id seems unqique*/

proc sql;

create newintialtable as 

select * from intialtable where 

id not in (select id from samezipcode_firsthalf);

quit;

 

/* picking up remaining records for first dataset*/
proc sql;
select count(*) into :newtot
from amezipcode_firsthalf;
quit;

proc sql;
insert into samezipcode_firsthalf
select * from newintialtable /* pick up new records from new intial table */
where monotonic() le %eval( %left(&halfnum)-%left(&newtot));
quit;

/* id is unique as per your data*/
proc sql;
create table second as
select * from intial_table
where id not in (select id from samezipcode_firsthalf);
quit;

PGStats
Opal | Level 21

Another way to do a rough split:

 

proc sort data=have; by zip; run;

data set1 set2;
retain set1;
set have; by zip;
if first.zip then set1 = not set1;
if set1 then output set1; else output set2;
drop set1;
run;
PG
Ksharp
Super User

@PGStats

 

You need 

retain set1;  

?

PGStats
Opal | Level 21

Right! Thanks @Ksharp.

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1638 views
  • 3 likes
  • 5 in conversation