BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jojo
Obsidian | Level 7

I need to split one dataset to two. If the value for each of the two variables (A, B) is zero for all obs in the by group, then output to one dataset, otherwise output to another dataset. So all obs with x=2, y= 12 or x=2, y=13 or x=3, y=9 should be in one dataset and other obs should be in another dataset.

I can do it by getting the sum of A and B within the group and output the last.m if the sum of A and B are zero, and then merge back with the original data by n and m to get all obs. I am thinking if there is another way to do this as the data are huge (over several millions obs), I really want to avoid sorting or merging as much as possible.

                                                                            x       y           A          B  

                                          2    12     0     0

                                          2    12     0     0

                                          2    12     0     0

                                          2    12     0     0

                                          2    12     0     0

                                          2    13     0     0

                                          2    13     0     0

                                          2    13     0     0

                                          2    13     0     0

                                          2    14     0     0

                                          2    14     1     0

                                          2    14     0     0

                                          2    14     0     1

                                          2    15     0     0

                                          2    15     0     0

                                          2    15     0     1

                                          2    15     0     1

                                          2    16     0     0

                                          2    16     0     1

                                          3     9     0     0

                                          3     9     0     0

                                          3     9     0     0


1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

How about using Merge statement?

data data wantAllZero wantNotAllZero;

merge have (where=(a ne 0 or b ne 0) in=no ) have;

by x y;

if no then output wantnotallzero;

else output wantallzero;

run;

Haikuo

View solution in original post

6 REPLIES 6
TomKari
Onyx | Level 15

Here's a non-optimized version using SQL. If you're running this on a server, I don't think it'll be slow with millions of observations. On your PC it might be slow. I know it could be combined into one SQL statement, but I don't know that it'll make it any faster.

It's a start.

Tom

data have;
input x y A B;
cards;
2 12 0 0
2 12 0 0
2 12 0 0
2 12 0 0
2 12 0 0
2 13 0 0
2 13 0 0
2 13 0 0
2 13 0 0
2 14 0 0
2 14 1 0
2 14 0 0
2 14 0 1
2 15 0 0
2 15 0 0
2 15 0 1
2 15 0 1
2 16 0 0
2 16 0 1
3 9 0 0
3 9 0 0
3 9 0 0
run;

proc sql;
create table have_totals as
select x, y, sum(a) as sum_A, sum(b) as sum_B
from have
group by x, y;

create table want_0 as
select h.x, h.y, h.A, h.B
from have h join have_totals t on h.x = t.x and h.y = t.y
where t.sum_A = 0 and t.sum_B = 0;

create table want_gt_0 as
select h.x, h.y, h.A, h.B
from have h join have_totals t on h.x = t.x and h.y = t.y
where t.sum_A > 0 or t.sum_B > 0;
quit;

jojo
Obsidian | Level 7

Thank you!

Patrick
Opal | Level 21

If the set of distinct keys fits into memory then below code could be an option:

data wantAllZero wantNotAllZero;

  set have;

  drop _: ;

  if _n_=1 then

  do;

    dcl hash h1 (dataset:'have(where=(a ne 0 or b ne 0))',hashexp:9);

    _rc=h1.defineKey('x','y');

    _rc=h1.defineDone();

  end;

  if h1.check()=0 then output wantNOtAllZero;

  else output wantAllZero;

run;

Message was edited by: Patrick Matter Code Fix: added closing bracket in where clause

jojo
Obsidian | Level 7

Thank you very much!

Tom
Super User Tom
Super User

data wantAllZero wantNotAllZero;

  nonzero=0;

  do until (last.y);

    set have;

    by x y;

    if a or b then nonzero=1;

  end;

  do until(last.y);

    set have;

     by x y;

     if nonzero then output wantNotAllZero;

     else output wantAllZero;

   end;

  drop nonzero;

run;

Haikuo
Onyx | Level 15

How about using Merge statement?

data data wantAllZero wantNotAllZero;

merge have (where=(a ne 0 or b ne 0) in=no ) have;

by x y;

if no then output wantnotallzero;

else output wantallzero;

run;

Haikuo

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1149 views
  • 2 likes
  • 5 in conversation