Hello,
I have a dataset which I want to spit into two dataset where One dataset will contain all rows for an IDs if one of the rows for an ID have " Counted" as a value for a variable COUNT. The second dataset would contain all the rest of the IDs.
For e.g Dataset: Have
ID | Count | Group |
1 | NA | j |
2 | NA | F2 |
2 | NOT COUNTED | j |
2 | NOT COUNTED | j |
2 | NOT COUNTED | j |
3 | NA | C |
4 | NOT COUNTED | F |
4 | NA | j |
5 | NA | j |
5 | NOT COUNTED | j |
5 | NOT COUNTED | j |
6 | NA | C |
6 | NOT COUNTED | j |
6 | NOT COUNTED | j |
6 | NOT COUNTED | j |
6 | NOT COUNTED | j |
7 | NA | C |
7 | NOT COUNTED | j |
8 | NA | C |
8 | NOT COUNTED | F |
9 | NA | C |
9 | NOT COUNTED | j |
9 | NOT COUNTED | j |
9 | COUNTED | j |
Final dataset would be two datasets:
Dataset : Want 1
ID | Count | Group |
9 | NA | C |
9 | NOT COUNTED | j |
9 | NOT COUNTED | j |
9 | COUNTED | j |
Dataset : Want 2
ID | Count | Group |
1 | NA | j |
2 | NA | F2 |
2 | NOT COUNTED | j |
2 | NOT COUNTED | j |
2 | NOT COUNTED | j |
3 | NA | C |
4 | NOT COUNTED | F |
4 | NA | j |
5 | NA | j |
5 | NOT COUNTED | j |
5 | NOT COUNTED | j |
6 | NA | C |
6 | NOT COUNTED | j |
6 | NOT COUNTED | j |
6 | NOT COUNTED | j |
6 | NOT COUNTED | j |
7 | NA | C |
7 | NOT COUNTED | j |
8 | NA | C |
8 | NOT COUNTED | F |
Thank you.
Hello @d0816
data one two;
merge have(in=a) have(in=b keep=id count rename=count=_count where=(_Count='COUNTED'));
by id;
if a and not b then output two;
else if a and b then output one;
drop _count;
run;
Hello @d0816
data one two;
merge have(in=a) have(in=b keep=id count rename=count=_count where=(_Count='COUNTED'));
by id;
if a and not b then output two;
else if a and b then output one;
drop _count;
run;
This is a typical double DoW-loop job (assuming HAVE is ordered/grouped by ID):
data have ;
input ID @4 Count $11. @17 Group $2. ;
cards ;
1 NA j
2 NA F2
2 NOT COUNTED j
2 NOT COUNTED j
2 NOT COUNTED j
3 NA C
4 NOT COUNTED F
4 NA j
5 NA j
5 NOT COUNTED j
5 NOT COUNTED j
6 NA C
6 NOT COUNTED j
6 NOT COUNTED j
6 NOT COUNTED j
6 NOT COUNTED j
7 NA C
7 NOT COUNTED j
8 NA C
8 NOT COUNTED F
9 NA C
9 NOT COUNTED j
9 NOT COUNTED j
9 COUNTED j
run ;
data want1 want2 ;
do _n_ = 1 by 1 until (last.id) ;
set have ;
by ID ;
if count = "COUNTED" then _want1 = 1 ;
end ;
do _n_ = 1 to _n_ ;
set have ;
if _want1 then output want1 ;
else output want2 ;
end ;
drop _: ;
run ;
Kind regards
Paul D.
Guru @hashman I would have thought you would have posted this one
data one two;
if _n_=1 then do;
dcl hash H (dataset:'have(where=(count="COUNTED"))') ;
h.definekey ("id") ;
h.definedata ("id") ;
h.definedone () ;
end;
set have;
by id;
if first.id then _iorc_=h.check();
if _iorc_=0 then output one;
else output two;
run;
Thank you for the solutions. All three solutions worked. I am going to accept novinosrin's code as a solution as his came first.
Novinosrin, Would appreciate if you could explain how hash is working, or resources related to hash.
Thank you again. You guys rock.
Hi @d0816 I am humbled by your words. Well to tell you the honest truth, my hash skills are exclusively gained by reading @hashman (Guru I forgot Don's Pseudonym, otherwise I would mention him too. Sorry about that but do pass on my regards) authored book.
Hashman aka Paul D is lord's incarnation with a hash boon and a gift to us(SAS community). I hope to become as good as him one day.
Anyways regarding explaining the entire concept will not help much other than the fact what's happening here is the ID that has counted is basically parked in a memory table called Hash . On a pass of a dataset, we basically check for matches with the counted ID in Hash table. The true ones are populated in one and the rest in two which is clearly noticeable in the code.
PS
Please do get hold of the Book
By Paul Dorfman and Don Henderson
at the soonest if you want to speed up your learning. All the best!
@novinosrin: Needless to say, thanks for the plug!
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.