BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jojo
Obsidian | Level 7

I need to split one dataset to two. If the value for each of the two variables (A, B) is zero for all obs in the by group, then output to one dataset, otherwise output to another dataset. So all obs with x=2, y= 12 or x=2, y=13 or x=3, y=9 should be in one dataset and other obs should be in another dataset.

I can do it by getting the sum of A and B within the group and output the last.m if the sum of A and B are zero, and then merge back with the original data by n and m to get all obs. I am thinking if there is another way to do this as the data are huge (over several millions obs), I really want to avoid sorting or merging as much as possible.

                                                                            x       y           A          B  

                                          2    12     0     0

                                          2    12     0     0

                                          2    12     0     0

                                          2    12     0     0

                                          2    12     0     0

                                          2    13     0     0

                                          2    13     0     0

                                          2    13     0     0

                                          2    13     0     0

                                          2    14     0     0

                                          2    14     1     0

                                          2    14     0     0

                                          2    14     0     1

                                          2    15     0     0

                                          2    15     0     0

                                          2    15     0     1

                                          2    15     0     1

                                          2    16     0     0

                                          2    16     0     1

                                          3     9     0     0

                                          3     9     0     0

                                          3     9     0     0


1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

How about using Merge statement?

data data wantAllZero wantNotAllZero;

merge have (where=(a ne 0 or b ne 0) in=no ) have;

by x y;

if no then output wantnotallzero;

else output wantallzero;

run;

Haikuo

View solution in original post

6 REPLIES 6
TomKari
Onyx | Level 15

Here's a non-optimized version using SQL. If you're running this on a server, I don't think it'll be slow with millions of observations. On your PC it might be slow. I know it could be combined into one SQL statement, but I don't know that it'll make it any faster.

It's a start.

Tom

data have;
input x y A B;
cards;
2 12 0 0
2 12 0 0
2 12 0 0
2 12 0 0
2 12 0 0
2 13 0 0
2 13 0 0
2 13 0 0
2 13 0 0
2 14 0 0
2 14 1 0
2 14 0 0
2 14 0 1
2 15 0 0
2 15 0 0
2 15 0 1
2 15 0 1
2 16 0 0
2 16 0 1
3 9 0 0
3 9 0 0
3 9 0 0
run;

proc sql;
create table have_totals as
select x, y, sum(a) as sum_A, sum(b) as sum_B
from have
group by x, y;

create table want_0 as
select h.x, h.y, h.A, h.B
from have h join have_totals t on h.x = t.x and h.y = t.y
where t.sum_A = 0 and t.sum_B = 0;

create table want_gt_0 as
select h.x, h.y, h.A, h.B
from have h join have_totals t on h.x = t.x and h.y = t.y
where t.sum_A > 0 or t.sum_B > 0;
quit;

jojo
Obsidian | Level 7

Thank you!

Patrick
Opal | Level 21

If the set of distinct keys fits into memory then below code could be an option:

data wantAllZero wantNotAllZero;

  set have;

  drop _: ;

  if _n_=1 then

  do;

    dcl hash h1 (dataset:'have(where=(a ne 0 or b ne 0))',hashexp:9);

    _rc=h1.defineKey('x','y');

    _rc=h1.defineDone();

  end;

  if h1.check()=0 then output wantNOtAllZero;

  else output wantAllZero;

run;

Message was edited by: Patrick Matter Code Fix: added closing bracket in where clause

jojo
Obsidian | Level 7

Thank you very much!

Tom
Super User Tom
Super User

data wantAllZero wantNotAllZero;

  nonzero=0;

  do until (last.y);

    set have;

    by x y;

    if a or b then nonzero=1;

  end;

  do until(last.y);

    set have;

     by x y;

     if nonzero then output wantNotAllZero;

     else output wantAllZero;

   end;

  drop nonzero;

run;

Haikuo
Onyx | Level 15

How about using Merge statement?

data data wantAllZero wantNotAllZero;

merge have (where=(a ne 0 or b ne 0) in=no ) have;

by x y;

if no then output wantnotallzero;

else output wantallzero;

run;

Haikuo

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1182 views
  • 2 likes
  • 5 in conversation