- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data test;
input accounts;
datalines;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
;
run;
data test2;
input managers$ count;
datalines;
a 48
b 45
c 49
;
run;
i have these two data sets i want an output of
1 a
2 b
3 c
4 a
5 b
6 b
7 b
8 b
after every merge step the value for count should increase for each variable and if the count reaches 50 then
it should not merge for that step.
plz help me with any logic if possible.
Free to use macros or sql.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data test3;
set tr_test2;
/*keep managers;*/
array a_mngrs {*} _NUMERIC_;
if _N_=1 then index=1;
do i=1 to &n_accounts.;
if min(of a_mngrs{*}) lt 50 then do;
do j=1 to dim(a_mngrs);
if a_mngrs{index} lt 50 then do;
a_mngrs{index}=a_mngrs{index}+1;
managers=vname(a_mngrs{index});
j=dim(a_mngrs); /* Exit loop on j*/
output;
end;
/* Probably a better way to do this */
index=mod(index+1,dim(a_mngrs));
if index=0 then index=dim(a_mngrs);
end;
end;
else do;
i=&naccounts.; /* Exit loop on i */
end;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Looks like the perfect example for using a hash object:
data want (keep=accounts managers);
set test;
length managers $8 count 8;
retain managers;
if _n_ = 1
then do;
declare hash test2(dataset:'test2', ordered: 'yes');
declare hiter iter('test2');
test2.definekey('managers');
test2.definedata('managers','count');
test2.definedone();
call missing(managers,count);
iter.first();
end;
put 'Start of step';
put managers=;
put count=;
runover = 0;
do while (count >= 50);
put 'while iteration';
rc = iter.next();
put managers=;
put count=;
if rc ne 0
then do;
if runover then leave;
runover = 1;
rc = iter.first();
end;
end;
if runover and rc
then do;
put 'No managers left';
stop;
end;
put 'after while iteration';
put managers=;
put count=;
count + 1;
test2.replace();
output;
rc = iter.next();
if rc then rc = iter.first();
put 'end of step';
put managers=;
put count=;
run;
Note that the many PUTs in there are for debugging and clarification how it works.
The RETAIN for managers is necessary, or the key value will be set to missing at the start of each data step iteration.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well, your problem is not a simple one, so don't expect simple solutions.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Addendum:
if it were simple, you wouldn't have needed to come here.
And you just have a fine opportunity to learn about one of the more arcane, but very useful features of the SAS data step language (hash objects).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the Solution Sir.
Could you also point me in the direction regarding what and where to read about hash objects so that i could understand the code better.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@wizkid2050 wrote:
Thanks for the Solution Sir.
Could you also point me in the direction regarding what and where to read about hash objects so that i could understand the code better.
A very good (and quite extensive) paper can be found here:
http://www2.sas.com/proceedings/sugi31/241-31.pdf
This is the relevant part of the Data Step Concepts part of the SAS documentation:
Using DATA Step Component Objects
This is the reference for the language elements (statements and methods):
Hash and Hash Iterator Object Language Elements
You can find all this, and more, with a google search for "sas hash objects basics"
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To help you a little in understanding what happens, I've added some comments:
data want (keep=accounts managers);
set test;
length managers $8 count 8;
* necessary so we don't get into trouble because of wrong variable types;
retain managers;
* because of the retain statement, the key of the hash object
is kept from one data step iteration to the next;
if _n_ = 1
then do; * set up the hash object in the very first iteration of the data step;
* declare the hash;
declare hash test2(dataset:'test2', ordered: 'yes');
* from the hash, declare the iterative hash object for sequential search;
declare hiter iter('test2');
* define hash object members;
test2.definekey('managers');
test2.definedata('managers','count');
test2.definedone();
* set to missing to avoid notes;
call missing(managers,count);
* intialize;
iter.first();
end;
runover = 0; * this will flag if we already ran through the whole hash once;
do while (count >= 50); * find a manager with count < 50;
rc = iter.next(); * next hash object;
if rc ne 0 /* we have reached the "end" of the hash */
then do;
if runover then leave; * we've been here already, so exit the loop;
runover = 1;
rc = iter.first(); * reset the hash to its beginning;
end;
end;
if runover and rc
then do; * obviously we've run out of managers with count <= 50;
put 'No managers left';
stop; * end the data step execution;
end;
count + 1; * increment;
test2.replace(); * and replace in the hash;
output; * output before stepping to the next hash object;
* if we don't do this now, the elements of the next hash iteration
will end up in the output observation;
rc = iter.next(); * get next object;
if rc then rc = iter.first(); * if end, reset;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I think this should work :
proc transpose data=test2 out=tr_test2;
id managers;
idlabel managers;
run;
data want;
set tr_test2 (drop=_NAME_);
array a_mngrs {*}_NUMERIC_;
if _N_=1 then index=1;
do accounts=1 to 15;
if a_mngrs{index} lt 50 then do;
put index=;
a_mngrs{index}=a_mngrs{index}+1;
managers=vname(a_mngrs{index});
output;
end;
/* Probably a better way to do this */
index=mod(index+1,3);
if index=0 then index=3;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It doesn't, sorry 😕
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The code seems to run fine but the issue here is you are not making use of first dataset named test.
although i have 1 to 15 there but what if i have that i some random order.
and i dont want to sort it. it has to be merged as it is
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The code seems to work correctly.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There was a proble with the accounts column. Here is a corrected version :
data want;
set tr_test2 (drop=_NAME_);
array a_mngrs {*}_NUMERIC_;
keep accounts managers;
if _N_=1 then index=1;
accounts=0;
do i=1 to 15;
if a_mngrs{index} lt 50 then do;
a_mngrs{index}=a_mngrs{index}+1;
managers=vname(a_mngrs{index});
accounts+1;
output;
end;
/* Probably a better way to do this */
index=mod(index+1,dim(a_mngrs));
if index=0 then index=dim(a_mngrs);
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data test;
input accounts;
datalines;
12
2
32
4
5
6
7
8
9
50
11
46
13
14
15
;
run;
data test2;
input managers$ count;
datalines;
a 48
b 45
c 49
;
run;
proc transpose data=test2 out=tr_test2;
id managers;
idlabel managers;
run;
proc sql;
SELECT count(*)
INTO :n_accounts
FROM test;
quit;
data test3;
set tr_test2;
keep managers;
array a_mngrs {*} a--c;
if _N_=1 then index=1;
do i=1 to &n_accounts.;
if a_mngrs{index} lt 50 then do;
a_mngrs{index}=a_mngrs{index}+1;
managers=vname(a_mngrs{index});
output;
end;
/* Probably a better way to do this */
index=mod(index+1,dim(a_mngrs));
if index=0 then index=dim(a_mngrs);
end;
run;
data want;
merge test test3 (in=a);
if a;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Great work Gamotte. adding a little more to the problem. this sticks for only a b and c what if we have n no. of managers in dataset with different counts. Just to make it dynamic.