DATA Step, Macro, Functions and more

efficiently change variable values selectively?

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 133
Accepted Solution

efficiently change variable values selectively?

How do I dynamically rename/update the port_id in port data, based on the bench data, any port data is prefixed with p, bench with b.

port data has about 10k rows while bench has less than 30 rows.

data port;

input port_id;

cards;

10

10

10

33

33

33

;

data bench;

input

port bench;

cards;

10 33

;

/*make data like this

p_10

p_10

p_10

b_33

b_33

b-33

*/


Accepted Solutions
Solution
‎08-15-2012 11:31 AM
Super User
Posts: 5,516

Re: efficiently change variable values selectively?

Given that you have a very small number of bench_id values, this approach should be tried as well:

proc sql noprint;

   select distinct bench_id into : bench_list separated by ',' from bench;

quit;

data want;

   set port;

   if port_id in (&bench_list) then prefix='b_';

   else prefix='p_';

   newvar = prefix || put(port_id, z2.);

run;

Good luck.

View solution in original post


All Replies
Respected Advisor
Posts: 4,173

Re: efficiently change variable values selectively?

Instead of actually changing the values you could simply create a format and then apply this format on port_id as done in code below using the data you've posted:

data Vprep /view=Vprep;
  set bench;
  retain fmtname 'port_id';
  length label $ 20;
  start=port;  label=cats('p_',port);  output;
  start=bench; label=cats('b_',bench); output;
run;

proc format cntlin=Vprep;
run;

proc datasets lib=work nolist;
  modify port;
    format port_id port_id.;
  run;
quit;

If you really need the physical values then once you've created the format just add below data step:

data port;

  set port;

  newvar=put(port_id,port_id.);

run;

Frequent Contributor
Posts: 133

Re: efficiently change variable values selectively?

I really need to change the value, NOT format them.

Valued Guide
Posts: 765

Re: efficiently change variable values selectively?

data port;

input port_id @@;

cards;

10 10 10 33 33 33

;

data bench;

input port bench;

cards;

10 33

;

data new (keep=port_id newvar);

set bench;

array pb(*) _numeric_;

do until(done);

   set port end=done;

   newvar = catx('_',char(vname(pb(whichn(port_id, of pb(*)))),1),port_id);

   output;

end;

run;


port_id    newvar

   10       p_10

   10       p_10

   10       p_10

   33       b_33

   33       b_33

   33       b_33

Frequent Contributor
Posts: 133

Re: efficiently change variable values selectively?

the bench data is, variable is not port, bench, sorry about that, I test your code, it doesn't look like working

data bench;

input port_id bench_id;

cards;

10 33

33 44

;


Solution
‎08-15-2012 11:31 AM
Super User
Posts: 5,516

Re: efficiently change variable values selectively?

Given that you have a very small number of bench_id values, this approach should be tried as well:

proc sql noprint;

   select distinct bench_id into : bench_list separated by ',' from bench;

quit;

data want;

   set port;

   if port_id in (&bench_list) then prefix='b_';

   else prefix='p_';

   newvar = prefix || put(port_id, z2.);

run;

Good luck.

Valued Guide
Posts: 765

Re: efficiently change variable values selectively?

hi ... if all you have to do is first look at all the values of BENCH_ID in order to make a decision ...

data port;

input port_id @@;

cards;

10 10 10 33 33 33 44 55 33 10 44

;

data bench;

input port_id bench_id;

cards;

10 33

33 44

;

data new (keep=port_id newvar);

length its_a_bench $100;

do j=1 by 1 until (done1);

  set bench (keep=bench_id) end=done1;

  its_a_bench = catx('/', its_a_bench , bench_id);

end;

do until(done2);

   set port end=done2;

   newvar = ifc(find(its_a_bench,catt(port_id)) , catt('b_',port_id) , catt('p_',port_id));

   output;

end;

run;

port_id    newvar

   10       p_10

   10       p_10

   10       p_10

   33       b_33

   33       b_33

   33       b_33

   44       b_44

   55       p_55

   33       b_33

   10       p_10

   44       b_44

Respected Advisor
Posts: 3,156

Re: efficiently change variable values selectively?

Hi,

Under the assumption that your 'port' and your 'bench' are mutually exclusive, the following Hash() approach seems working:

data port;

input port_id;

cards;

10

10

10

33

33

33

;

data bench;

input

port bench;

cards;

10 33

;

data want;

  if _n_=1 then do;

    if 0 then set port;

      dcl hash h(dataset: 'port', multidata:'y');

      h.definekey('port_id');

      h.definedata('port_id');

      h.definedone();

  end;

  set bench;

  do rc=h.find(keySmiley Tongueort) by 0 while (rc=0);

     new_id=cats('p_',port_id);

     output;

     rc=h.find_next(keySmiley Tongueort);

  end;

  do rc=h.find(key:bench) by 0 while (rc=0);

      new_id=cats('b_',port_id);

      output;

      rc=h.find_next(key:bench);

  end;

  keep port_id new_id;

  run;

  proc print;run;

Haikuo

Frequent Contributor
Posts: 133

Re: efficiently change variable values selectively?

They are NOT mutual exclusive, that is the reason I have to prefix it.

So, how do I do that to loose that assumption?

Respected Advisor
Posts: 4,173

Re: efficiently change variable values selectively?

If they are not mutually exclusive then how do you know if eg. Port_ID =40 is "Port" or "Bench" - and if it can be both: Do you have some rule or additional attributes to decide what to choose?

Frequent Contributor
Posts: 133

Re: efficiently change variable values selectively?

Let me clarify a little bit further, the true issue is in port data, 33 belongs to both port_id and bench_id.

But the rule is: look at bench data, if we find the data in bench_id, we prefix it as "b_", all others are named "p_"

Because 33 in bench data appeared in bench_id, we name it "b_33", rather than p_33, but 10 is not appearing in bench_id, so it is prefixed as p_10.

At the same time dimensions for port data is 10k rows over 50 vars, while bench has less than 30 rows and 10 vars, I need a efficient way to prefix this variable.

data port;

input port_id @@;

cards;

10 10 10 33 33

;

data bench;

input port_id bench_id;

cards;

10 33

33 44

;

/*result i want:

p_10

P_10

p_10

b_33

b_33

*/

Respected Advisor
Posts: 4,173

Re: efficiently change variable values selectively?

"At the same time dimensions for port data is 10k rows over 50 vars, while bench has less than 30 rows and 10 vars"

Could you please provide sample data which better reflects how your real data structure looks like? That would help a lot to come up with an appropriate solution.

Frequent Contributor
Posts: 133

Re: efficiently change variable values selectively?

Really appreciate this great discussion.

I think astounding's code is much simpler and worked.

MikeZdeb's code also worked, but personally I don't want to use the looping, I think it could cause the performance issue potentially when processing large volume data.

Valued Guide
Posts: 765

Re: efficiently change variable values selectively?

Hi ... the "looping" you refer to is to just read the data sets.  Astounding's code read the data sets first in SQL, then using the implied looping of the data step rather than  DO/SET/END as I did.  So, just because you don't see "DO" in Astounding's code doesn't mean that loops are not being used, yes/no?

Respected Advisor
Posts: 3,156

Re: efficiently change variable values selectively?

I concur, Mike. There is always implicit loop in data step, DOW does not add in one.

Haikuo

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 20 replies
  • 430 views
  • 3 likes
  • 6 in conversation