Solved: spliting a dataset based on conditions

d0816 · Posted 07-25-2019 04:04 PM

Hello,

I have a dataset which I want to spit into two dataset where One dataset will contain all rows for an IDs if one of the rows for an ID have " Counted" as a value for a variable COUNT. The second dataset would contain all the rest of the IDs.

For e.g Dataset: Have

ID	Count	Group
1	NA	j
2	NA	F2
2	NOT COUNTED	j
2	NOT COUNTED	j
2	NOT COUNTED	j
3	NA	C
4	NOT COUNTED	F
4	NA	j
5	NA	j
5	NOT COUNTED	j
5	NOT COUNTED	j
6	NA	C
6	NOT COUNTED	j
6	NOT COUNTED	j
6	NOT COUNTED	j
6	NOT COUNTED	j
7	NA	C
7	NOT COUNTED	j
8	NA	C
8	NOT COUNTED	F
9	NA	C
9	NOT COUNTED	j
9	NOT COUNTED	j
9	COUNTED	j

Final dataset would be two datasets:

Dataset : Want 1

ID	Count	Group
9	NA	C
9	NOT COUNTED	j
9	NOT COUNTED	j
9	COUNTED	j

Dataset : Want 2

ID	Count	Group
1	NA	j
2	NA	F2
2	NOT COUNTED	j
2	NOT COUNTED	j
2	NOT COUNTED	j
3	NA	C
4	NOT COUNTED	F
4	NA	j
5	NA	j
5	NOT COUNTED	j
5	NOT COUNTED	j
6	NA	C
6	NOT COUNTED	j
6	NOT COUNTED	j
6	NOT COUNTED	j
6	NOT COUNTED	j
7	NA	C
7	NOT COUNTED	j
8	NA	C
8	NOT COUNTED	F

Thank you.

novinosrin · Posted 07-25-2019 04:29 PM

Hello @d0816

data one two;
merge have(in=a) have(in=b keep=id count rename=count=_count where=(_Count='COUNTED'));
by id;
if a and not b then output two;
else if a and b then output one;
drop _count;
run;

View solution in original post

novinosrin · Posted 07-25-2019 04:29 PM

Hello @d0816

data one two;
merge have(in=a) have(in=b keep=id count rename=count=_count where=(_Count='COUNTED'));
by id;
if a and not b then output two;
else if a and b then output one;
drop _count;
run;

hashman · Posted 07-25-2019 04:29 PM

@d0816:

This is a typical double DoW-loop job (assuming HAVE is ordered/grouped by ID):

data have ;                               
  input ID @4 Count $11. @17 Group $2. ;  
  cards ;                                 
1  NA           j                         
2  NA           F2                        
2  NOT COUNTED  j                         
2  NOT COUNTED  j                         
2  NOT COUNTED  j                         
3  NA           C                         
4  NOT COUNTED  F                         
4  NA           j                         
5  NA           j                         
5  NOT COUNTED  j                         
5  NOT COUNTED  j                         
6  NA           C                         
6  NOT COUNTED  j                         
6  NOT COUNTED  j                         
6  NOT COUNTED  j                         
6  NOT COUNTED  j                         
7  NA           C                         
7  NOT COUNTED  j                         
8  NA           C                         
8  NOT COUNTED  F                         
9  NA           C                         
9  NOT COUNTED  j                         
9  NOT COUNTED  j                         
9  COUNTED      j                         
run ;                                     
                                          
data want1 want2 ;                        
  do _n_ = 1 by 1 until (last.id) ;       
    set have ;                            
    by ID ;                               
    if count = "COUNTED" then _want1 = 1 ;
  end ;                                   
  do _n_ = 1 to _n_ ;                     
    set have ;                            
    if _want1 then output want1 ;         
    else           output want2 ;         
  end ;   
  drop _: ;                                
run ;

Kind regards

Paul D.

novinosrin · Posted 07-25-2019 04:37 PM

Guru @hashman I would have thought you would have posted this one

data one two;
if _n_=1 then do;
   dcl hash H (dataset:'have(where=(count="COUNTED"))') ;
   h.definekey  ("id") ;
   h.definedata ("id") ;
   h.definedone () ;
end;
set have;
by id;
if first.id then _iorc_=h.check();
if _iorc_=0 then output one;
else output two;
run;

d0816 · Posted 07-25-2019 05:22 PM

@novinosrin @hashman

Thank you for the solutions. All three solutions worked. I am going to accept novinosrin's code as a solution as his came first.

Novinosrin, Would appreciate if you could explain how hash is working, or resources related to hash.

Thank you again. You guys rock.

novinosrin · Posted 07-25-2019 05:31 PM

Hi @d0816 I am humbled by your words. Well to tell you the honest truth, my hash skills are exclusively gained by reading @hashman (Guru I forgot Don's Pseudonym, otherwise I would mention him too. Sorry about that but do pass on my regards) authored book.

Hashman aka Paul D is lord's incarnation with a hash boon and a gift to us(SAS community). I hope to become as good as him one day.

Anyways regarding explaining the entire concept will not help much other than the fact what's happening here is the ID that has counted is basically parked in a memory table called Hash . On a pass of a dataset, we basically check for matches with the counted ID in Hash table. The true ones are populated in one and the rest in two which is clearly noticeable in the code.

PS

Please do get hold of the Book

Data Management Solutions Using SAS^® Hash Table Operations: A Business Intelligence Case Study

By Paul Dorfman and Don Henderson

at the soonest if you want to speed up your learning. All the best!

hashman · Posted 07-25-2019 05:39 PM

@novinosrin: Needless to say, thanks for the plug!

spliting a dataset based on conditions

Re: spliting a dataset based on conditions

Re: spliting a dataset based on conditions

Re: spliting a dataset based on conditions

Re: spliting a dataset based on conditions

Re: spliting a dataset based on conditions

Re: spliting a dataset based on conditions

Data Management Solutions Using SAS^® Hash Table Operations: A Business Intelligence Case Study

Re: spliting a dataset based on conditions

Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study

Classroom Training Available!

Data Management Solutions Using SAS^® Hash Table Operations: A Business Intelligence Case Study