DATA Step, Macro, Functions and more

Merging Issue

Accepted Solution Solved
Reply
Contributor
Posts: 23
Accepted Solution

Merging Issue

[ Edited ]

data test;
input accounts;
datalines;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
;
run;

data test2;
input managers$ count;
datalines;
a 48
b 45
c 49
;
run;

 

 

i have these two data sets i want an output of 

 

1 a

2 b

3 c

4 a

5 b

6 b

7 b

8 b

 

after every merge step the value for count should increase for each variable and if the count reaches 50 then

it should not merge for that step.

plz help me with any logic if possible.

Free to use macros or sql.


Accepted Solutions
Solution
‎06-30-2016 06:16 AM
Regular Contributor
Posts: 233

Re: Merging Issue

Posted in reply to wizkid2050
data test3;
	set tr_test2;
	/*keep managers;*/
	array a_mngrs {*} _NUMERIC_;

	if _N_=1 then index=1;

	do i=1 to &n_accounts.;
		if min(of a_mngrs{*}) lt 50 then do;
			do j=1 to dim(a_mngrs);
				if a_mngrs{index} lt 50 then do;
					a_mngrs{index}=a_mngrs{index}+1;
					managers=vname(a_mngrs{index});
					j=dim(a_mngrs); /* Exit loop on j*/
					output;
				end;

				/* Probably a better way to do this */
				index=mod(index+1,dim(a_mngrs));
				if index=0 then index=dim(a_mngrs);
			end;
		end;
		else do;
			i=&naccounts.; /* Exit loop on i */
		end;
	end;
run;

View solution in original post


All Replies
Super User
Posts: 7,762

Re: Merging Issue

Posted in reply to wizkid2050

Looks like the perfect example for using a hash object:

data want (keep=accounts managers);
set test;
length managers $8 count 8;
retain managers;
if _n_ = 1
then do;
  declare hash test2(dataset:'test2', ordered: 'yes');
  declare hiter iter('test2');
  test2.definekey('managers');
  test2.definedata('managers','count');
  test2.definedone();
  call missing(managers,count);
  iter.first();
end;
put 'Start of step';
put managers=;
put count=;
runover = 0;
do while (count >= 50);
  put 'while iteration';
  rc = iter.next();
put managers=;
put count=;
  if rc ne 0
  then do;
    if runover then leave;
    runover = 1;
    rc = iter.first();
  end;
end;
if runover and rc
then do;
  put 'No managers left';
  stop;
end;
put 'after while iteration';
put managers=;
put count=;
count + 1;
test2.replace();
output;
rc = iter.next();
if rc then rc = iter.first();
put 'end of step';
put managers=;
put count=;
run;

Note that the many PUTs in there are for debugging and clarification how it works.

The RETAIN for managers is necessary, or the key value will be set to missing at the start of each data step iteration.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 23

Re: Merging Issue

Posted in reply to KurtBremser
Can't it be done in more simplified manner than this?
Super User
Posts: 7,762

Re: Merging Issue

Posted in reply to wizkid2050

Well, your problem is not a simple one, so don't expect simple solutions.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 7,762

Re: Merging Issue

Posted in reply to wizkid2050

Addendum:

if it were simple, you wouldn't have needed to come here.

And you just have a fine opportunity to learn about one of the more arcane, but very useful features of the SAS data step language (hash objects).

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 23

Re: Merging Issue

Posted in reply to KurtBremser

Thanks for the Solution Sir.

Could you also point me in the direction regarding what and where to read about hash objects so that i could understand the code better.

Super User
Posts: 7,762

Re: Merging Issue

Posted in reply to wizkid2050

wizkid2050 wrote:

Thanks for the Solution Sir.

Could you also point me in the direction regarding what and where to read about hash objects so that i could understand the code better.


A very good (and quite extensive) paper can be found here:

http://www2.sas.com/proceedings/sugi31/241-31.pdf

This is the relevant part of the Data Step Concepts part of the SAS documentation:

Using DATA Step Component Objects

This is the reference for the language elements (statements and methods):

Hash and Hash Iterator Object Language Elements

 

You can find all this, and more, with a google search for "sas hash objects basics"

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 7,762

Re: Merging Issue

[ Edited ]
Posted in reply to wizkid2050

To help you a little in understanding what happens, I've added some comments:

data want (keep=accounts managers);
set test;
length managers $8 count 8;
* necessary so we don't get into trouble because of wrong variable types;
retain managers;
* because of the retain statement, the key of the hash object
  is kept from one data step iteration to the next;
if _n_ = 1
then do; * set up the hash object in the very first iteration of the data step;
  * declare the hash;
  declare hash test2(dataset:'test2', ordered: 'yes');
  * from the hash, declare the iterative hash object for sequential search;
  declare hiter iter('test2');
  * define hash object members;
  test2.definekey('managers');
  test2.definedata('managers','count');
  test2.definedone();
  * set to missing to avoid notes;
  call missing(managers,count);
  * intialize;
  iter.first();
end;
runover = 0; * this will flag if we already ran through the whole hash once;
do while (count >= 50); * find a manager with count < 50;
  rc = iter.next(); * next hash object;
  if rc ne 0 /* we have reached the "end" of the hash */
  then do;
    if runover then leave; * we've been here already, so exit the loop;
    runover = 1;
    rc = iter.first(); * reset the hash to its beginning;
  end;
end;
if runover and rc
then do; * obviously we've run out of managers with count <= 50;
  put 'No managers left';
  stop; * end the data step execution;
end;
count + 1; * increment;
test2.replace(); * and replace in the hash;
output; * output before stepping to the next hash object;
* if we don't do this now, the elements of the next hash iteration
  will end up in the output observation;
rc = iter.next(); * get next object;
if rc then rc = iter.first(); * if end, reset;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Regular Contributor
Posts: 233

Re: Merging Issue

Posted in reply to KurtBremser

Hello,

 

I think this should work :

 


proc transpose data=test2 out=tr_test2;
id managers;
idlabel managers;
run;

	data want;
		set tr_test2 (drop=_NAME_);
		array a_mngrs {*}_NUMERIC_;
		
		if _N_=1 then index=1;

		do accounts=1 to 15;
			if a_mngrs{index} lt 50 then do;
				put index=;
				a_mngrs{index}=a_mngrs{index}+1;
				managers=vname(a_mngrs{index});
				output;
			end;
			
			/* Probably a better way to do this */
			index=mod(index+1,3);
			if index=0 then index=3;
		end;
	run;
Regular Contributor
Posts: 233

Re: Merging Issue

It doesn't, sorry :-/

Contributor
Posts: 23

Re: Merging Issue

The code seems to run fine but the issue here is you are not making use of first dataset named test.

although i have 1 to 15 there but what if i have that i some random order.

and i dont want to sort it. it has to be merged as it is

 

Contributor
Posts: 23

Re: Merging Issue

The code seems to work correctly.

Regular Contributor
Posts: 233

Re: Merging Issue

Posted in reply to wizkid2050

There was a proble with the accounts column. Here is a corrected version :

 

	data want;
		set tr_test2 (drop=_NAME_);
		array a_mngrs {*}_NUMERIC_;
		keep accounts managers;
		
		if _N_=1 then index=1;

		accounts=0;

		do i=1 to 15;
			if a_mngrs{index} lt 50 then do;
				a_mngrs{index}=a_mngrs{index}+1;
				managers=vname(a_mngrs{index});
				accounts+1;
				output;
			end;
			
			/* Probably a better way to do this */
			index=mod(index+1,dim(a_mngrs));
			if index=0 then index=dim(a_mngrs);
		end;
	run;
Regular Contributor
Posts: 233

Re: Merging Issue

Posted in reply to wizkid2050
data test;
input accounts;
datalines;
12
2
32
4
5
6
7
8
9
50
11
46
13
14
15
;
run;

data test2;
input managers$ count;
datalines;
a 48
b 45
c 49
;
run;

proc transpose data=test2 out=tr_test2;
id managers;
idlabel managers;
run;

proc sql;
	SELECT count(*)
	INTO :n_accounts
	FROM test;
quit;

data test3;
	set tr_test2;
	keep managers;
	array a_mngrs {*} a--c;

	if _N_=1 then index=1;

	do i=1 to &n_accounts.;
		if a_mngrs{index} lt 50 then do;
			a_mngrs{index}=a_mngrs{index}+1;
			managers=vname(a_mngrs{index});
			output;
		end;

		/* Probably a better way to do this */
		index=mod(index+1,dim(a_mngrs));
		if index=0 then index=dim(a_mngrs);
	end;
run;

data want;
	merge test test3 (in=a);
	if a;
run;
Contributor
Posts: 23

Re: Merging Issue

[ Edited ]

Great work Gamotte. adding a little more to the problem. this sticks for only a b and c what if we have n no. of managers in dataset with different counts. Just to make it dynamic.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 23 replies
  • 450 views
  • 4 likes
  • 4 in conversation