DATA Step, Macro, Functions and more

modify dataset with index

Accepted Solution Solved
Reply
Contributor
Posts: 33
Accepted Solution

modify dataset with index

[ Edited ]

 

I would like to modify a dataset (master) with the data of another dataset (transaction) and, if the key matches go in replacement, otherwise append in the new data. The result, unexpected is that for cod_cli = 2 not only replaces data (correctly) but appends them also under:

 

data master (index=(cod_cli));
input cod_cli Quantity flag_iorc ;
datalines;
1 8 .
2 10 .
2 12 .
2 13 .
3 21 .
4 13 .
run;

data transaction;
input cod_cli   AddQuantity;
datalines;
2 7
5 12
run;

data master;  
set transaction;
do until (_iorc_=%sysrc(_dsenom)); /*ciclo*/
	modify master key=cod_cli;  
	select(_iorc_); 
	      when(%sysrc(_sok)) do;
		     Quantity = Quantity + AddQuantity;
			 flag_iorc = _iorc_;
	         replace;
	      end;
	      when(%sysrc(_dsenom)) do;
		      Quantity = AddQuantity;
	         output;
			 _error_=0;
	      end;
	      otherwise do;
	         put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
	         put 'Program terminating. DATA step iteration # ' _n_;
	         put _all_;
	         stop;
	      end;
	   end;
end;
run;

from the log results:

 

There were 3 observations rewritten, 2

      observations added and 0 observations deleted.

 

But I would expect 1 observations added..

 

thanks for help


Accepted Solutions
Solution
‎04-01-2016 02:25 PM
Super Contributor
Posts: 408

Re: modify dataset with index

Trap for young players. Here is what I think is the issue. The code does not distinguish between "no keyes at all" and "no more keys" in the master table.

 

For key cod_cli=2 the master has three rows. The DO loop cycles through the master looking for potentially multiple occurences of the key. There are three rows in the master for that value. But the loop continues until a fourth lookup yield a DSENOM. This indicates there are no more rows with that key value. Your logic however has a case for that as well: it outputs a new row. Hence the extra row for cod_cli=2.

 

To further illustrate the issue add a row with cod_cli=3 in the transaction set. You will get an additional new row for that key as well.

 

I would set a flag if a key exists in the master and use that to prevent outputting new rows. See code below. But as usual there are probably many alternative approaches.

 

data master;
set transaction;
matched_flag='N';
do until (_iorc_=%sysrc(_dsenom));
        modify master key=cod_cli;
        select(_iorc_);
              when(%sysrc(_sok)) do;
                     Quantity = Quantity + AddQuantity;
                     matched_flag='Y';
                      _error_=0;
                     replace;
              end;
              when(%sysrc(_dsenom)) do;
                         _error_=0;
                         if matched_flag='N' then output;
              end;
              otherwise do;
                 put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
                 put 'Program terminating. DATA step iteration # ' _n_;                 
                 stop;
              end;
           end;
end;
run;

 

Regards,

- Jan.

 

View solution in original post


All Replies
Super User
Posts: 10,533

Re: modify dataset with index

You should provide what you expect the result of your modification to be.

Contributor
Posts: 33

Re: modify dataset with index

I aspected this result:


Immagine.jpg
Contributor
Posts: 33

Re: modify dataset with index

I want to obtain with one data step the end result of these two data step (3 observations rewritten,  1 observations added)

 

data master (index=(cod_cli));
input cod_cli Quantity;
datalines;
1 8
2 10
2 12
2 13
3 21
4 13
run;

data transaction;
input cod_cli   AddQuantity;
datalines;
2 7
5 12
run;

data master;  
set transaction;
do until (_iorc_=%sysrc(_dsenom));  
	modify master key=cod_cli;  
	select(_iorc_); 
	      when(%sysrc(_sok)) do;
		     Quantity = Quantity + AddQuantity;
			 _error_=0;
	         replace;
	      end;
		  when(%sysrc(_dsenom)) do;
			 _error_=0;
	      end;
	      otherwise do;
	         put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
	         put 'Program terminating. DATA step iteration # ' _n_;
	         put _all_;
	         stop;
	      end;
	   end;
end;
run;
data master; set transaction; modify master key=cod_cli; select(_iorc_); when(%sysrc(_sok)) do; end; when(%sysrc(_dsenom)) do; Quantity = AddQuantity; output; _error_=0; end; otherwise do; put 'ERROR: Unexpected value for _IORC_= ' _iorc_; put 'Program terminating. DATA step iteration # ' _n_; put _all_; stop; end; end; run;
Solution
‎04-01-2016 02:25 PM
Super Contributor
Posts: 408

Re: modify dataset with index

Trap for young players. Here is what I think is the issue. The code does not distinguish between "no keyes at all" and "no more keys" in the master table.

 

For key cod_cli=2 the master has three rows. The DO loop cycles through the master looking for potentially multiple occurences of the key. There are three rows in the master for that value. But the loop continues until a fourth lookup yield a DSENOM. This indicates there are no more rows with that key value. Your logic however has a case for that as well: it outputs a new row. Hence the extra row for cod_cli=2.

 

To further illustrate the issue add a row with cod_cli=3 in the transaction set. You will get an additional new row for that key as well.

 

I would set a flag if a key exists in the master and use that to prevent outputting new rows. See code below. But as usual there are probably many alternative approaches.

 

data master;
set transaction;
matched_flag='N';
do until (_iorc_=%sysrc(_dsenom));
        modify master key=cod_cli;
        select(_iorc_);
              when(%sysrc(_sok)) do;
                     Quantity = Quantity + AddQuantity;
                     matched_flag='Y';
                      _error_=0;
                     replace;
              end;
              when(%sysrc(_dsenom)) do;
                         _error_=0;
                         if matched_flag='N' then output;
              end;
              otherwise do;
                 put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
                 put 'Program terminating. DATA step iteration # ' _n_;                 
                 stop;
              end;
           end;
end;
run;

 

Regards,

- Jan.

 

Super Contributor
Posts: 408

Re: modify dataset with index

[ Edited ]

Come to think of it: if you have multiple keys in the master data set, how is one to tell that an existing key in the transaction dataset represents an update and not a new value? It occurred to me that this consideration is at the root of the issue. Only if the purpose of the transaction set is not to add duplicate key values my solution applies.

 

In real life I would expect an additional column in the data to make the key unique. I have learned that every table should have a unique key or the data model is flawed. I'm just saying.

Contributor
Posts: 33

Re: modify dataset with index

[ Edited ]

I got an unexpected result with this code:

 

 

data BHS;
input class1	class2:$50. class3:$40. value;
datalines;
1 xxx name1 0
1 xxx name2 0
run;
	
data WTG;
input class1	class2:$50.	class3:$40. value;
datalines;	
1 xxx name1 1
1 xxx name2 2
run;

proc datasets library = work nolist;
	 modify BHS;
	 index create ts_di2 = ( class1 class2 class3) / nomiss;
quit;

		
			data BHS;  
			matched_flag='N';
			set WTG (rename=(value=value_mese class3 = class3_mese));
			do until (_iorc_=%sysrc(_dsenom)); 
			modify BHS key=ts_di2;  
			select(_iorc_); 
				  when(%sysrc(_sok)) do;
					 value = value_mese;
					 class3 = class3_mese;
					 matched_flag='Y'; put "match"  _iorc_= _n_=;
					 _error_=0;
					 replace;
				  end;
				  when(%sysrc(_dsenom)) do;
					 value = value_mese;
					 class3 = class3_mese;
					 _error_ = 0;  
					 if matched_flag='N' then do;  
                         put "no_match"  _iorc_= _n_=;
                         output; 
                     end;
				  end;
				  otherwise do;
					 put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
					 put 'Program terminating. DATA step iteration # ' _n_;
					 put _all_;
					 stop;
				  end;
			   end;
			end;
			run;			

The result was:

 

"There were 3 observations rewritten, 1
observations added and 0 observations deleted."

Instead I expected  2 observations rewritten and 0 observations added, as there are no duplicates. I do not understand why in the first lap of the cycle enters in the "no match", as all match records

 

I expected this outcome for the table BHS:

data BHS;;
input class1 class2:$50. class3:$40. value;
datalines;
1 xxx name1 1
1 xxx name2 2
run;

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 286 views
  • 1 like
  • 3 in conversation