BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ZRick
Obsidian | Level 7

How do I dynamically rename/update the port_id in port data, based on the bench data, any port data is prefixed with p, bench with b.

port data has about 10k rows while bench has less than 30 rows.

data port;

input port_id;

cards;

10

10

10

33

33

33

;

data bench;

input

port bench;

cards;

10 33

;

/*make data like this

p_10

p_10

p_10

b_33

b_33

b-33

*/

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Given that you have a very small number of bench_id values, this approach should be tried as well:

proc sql noprint;

   select distinct bench_id into : bench_list separated by ',' from bench;

quit;

data want;

   set port;

   if port_id in (&bench_list) then prefix='b_';

   else prefix='p_';

   newvar = prefix || put(port_id, z2.);

run;

Good luck.

View solution in original post

20 REPLIES 20
Patrick
Opal | Level 21

Instead of actually changing the values you could simply create a format and then apply this format on port_id as done in code below using the data you've posted:

data Vprep /view=Vprep;
  set bench;
  retain fmtname 'port_id';
  length label $ 20;
  start=port;  label=cats('p_',port);  output;
  start=bench; label=cats('b_',bench); output;
run;

proc format cntlin=Vprep;
run;

proc datasets lib=work nolist;
  modify port;
    format port_id port_id.;
  run;
quit;

If you really need the physical values then once you've created the format just add below data step:

data port;

  set port;

  newvar=put(port_id,port_id.);

run;

ZRick
Obsidian | Level 7

I really need to change the value, NOT format them.

MikeZdeb
Rhodochrosite | Level 12

data port;

input port_id @@;

cards;

10 10 10 33 33 33

;

data bench;

input port bench;

cards;

10 33

;

data new (keep=port_id newvar);

set bench;

array pb(*) _numeric_;

do until(done);

   set port end=done;

   newvar = catx('_',char(vname(pb(whichn(port_id, of pb(*)))),1),port_id);

   output;

end;

run;


port_id    newvar

   10       p_10

   10       p_10

   10       p_10

   33       b_33

   33       b_33

   33       b_33

ZRick
Obsidian | Level 7

the bench data is, variable is not port, bench, sorry about that, I test your code, it doesn't look like working

data bench;

input port_id bench_id;

cards;

10 33

33 44

;


Astounding
PROC Star

Given that you have a very small number of bench_id values, this approach should be tried as well:

proc sql noprint;

   select distinct bench_id into : bench_list separated by ',' from bench;

quit;

data want;

   set port;

   if port_id in (&bench_list) then prefix='b_';

   else prefix='p_';

   newvar = prefix || put(port_id, z2.);

run;

Good luck.

MikeZdeb
Rhodochrosite | Level 12

hi ... if all you have to do is first look at all the values of BENCH_ID in order to make a decision ...

data port;

input port_id @@;

cards;

10 10 10 33 33 33 44 55 33 10 44

;

data bench;

input port_id bench_id;

cards;

10 33

33 44

;

data new (keep=port_id newvar);

length its_a_bench $100;

do j=1 by 1 until (done1);

  set bench (keep=bench_id) end=done1;

  its_a_bench = catx('/', its_a_bench , bench_id);

end;

do until(done2);

   set port end=done2;

   newvar = ifc(find(its_a_bench,catt(port_id)) , catt('b_',port_id) , catt('p_',port_id));

   output;

end;

run;

port_id    newvar

   10       p_10

   10       p_10

   10       p_10

   33       b_33

   33       b_33

   33       b_33

   44       b_44

   55       p_55

   33       b_33

   10       p_10

   44       b_44

Haikuo
Onyx | Level 15

Hi,

Under the assumption that your 'port' and your 'bench' are mutually exclusive, the following Hash() approach seems working:

data port;

input port_id;

cards;

10

10

10

33

33

33

;

data bench;

input

port bench;

cards;

10 33

;

data want;

  if _n_=1 then do;

    if 0 then set port;

      dcl hash h(dataset: 'port', multidata:'y');

      h.definekey('port_id');

      h.definedata('port_id');

      h.definedone();

  end;

  set bench;

  do rc=h.find(key:port) by 0 while (rc=0);

     new_id=cats('p_',port_id);

     output;

     rc=h.find_next(key:port);

  end;

  do rc=h.find(key:bench) by 0 while (rc=0);

      new_id=cats('b_',port_id);

      output;

      rc=h.find_next(key:bench);

  end;

  keep port_id new_id;

  run;

  proc print;run;

Haikuo

ZRick
Obsidian | Level 7

They are NOT mutual exclusive, that is the reason I have to prefix it.

So, how do I do that to loose that assumption?

Patrick
Opal | Level 21

If they are not mutually exclusive then how do you know if eg. Port_ID =40 is "Port" or "Bench" - and if it can be both: Do you have some rule or additional attributes to decide what to choose?

ZRick
Obsidian | Level 7

Let me clarify a little bit further, the true issue is in port data, 33 belongs to both port_id and bench_id.

But the rule is: look at bench data, if we find the data in bench_id, we prefix it as "b_", all others are named "p_"

Because 33 in bench data appeared in bench_id, we name it "b_33", rather than p_33, but 10 is not appearing in bench_id, so it is prefixed as p_10.

At the same time dimensions for port data is 10k rows over 50 vars, while bench has less than 30 rows and 10 vars, I need a efficient way to prefix this variable.

data port;

input port_id @@;

cards;

10 10 10 33 33

;

data bench;

input port_id bench_id;

cards;

10 33

33 44

;

/*result i want:

p_10

P_10

p_10

b_33

b_33

*/

Patrick
Opal | Level 21

"At the same time dimensions for port data is 10k rows over 50 vars, while bench has less than 30 rows and 10 vars"

Could you please provide sample data which better reflects how your real data structure looks like? That would help a lot to come up with an appropriate solution.

ZRick
Obsidian | Level 7

Really appreciate this great discussion.

I think astounding's code is much simpler and worked.

MikeZdeb's code also worked, but personally I don't want to use the looping, I think it could cause the performance issue potentially when processing large volume data.

MikeZdeb
Rhodochrosite | Level 12

Hi ... the "looping" you refer to is to just read the data sets.  Astounding's code read the data sets first in SQL, then using the implied looping of the data step rather than  DO/SET/END as I did.  So, just because you don't see "DO" in Astounding's code doesn't mean that loops are not being used, yes/no?

Haikuo
Onyx | Level 15

I concur, Mike. There is always implicit loop in data step, DOW does not add in one.

Haikuo

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 1328 views
  • 3 likes
  • 6 in conversation