How do I dynamically rename/update the port_id in port data, based on the bench data, any port data is prefixed with p, bench with b.
port data has about 10k rows while bench has less than 30 rows.
data port;
input port_id;
cards;
10
10
10
33
33
33
;
data bench;
input
port bench;
cards;
10 33
;
/*make data like this
p_10
p_10
p_10
b_33
b_33
b-33
*/
Given that you have a very small number of bench_id values, this approach should be tried as well:
proc sql noprint;
select distinct bench_id into : bench_list separated by ',' from bench;
quit;
data want;
set port;
if port_id in (&bench_list) then prefix='b_';
else prefix='p_';
newvar = prefix || put(port_id, z2.);
run;
Good luck.
Instead of actually changing the values you could simply create a format and then apply this format on port_id as done in code below using the data you've posted:
data Vprep /view=Vprep;
set bench;
retain fmtname 'port_id';
length label $ 20;
start=port; label=cats('p_',port); output;
start=bench; label=cats('b_',bench); output;
run;
proc format cntlin=Vprep;
run;
proc datasets lib=work nolist;
modify port;
format port_id port_id.;
run;
quit;
If you really need the physical values then once you've created the format just add below data step:
data port;
set port;
newvar=put(port_id,port_id.);
run;
I really need to change the value, NOT format them.
data port;
input port_id @@;
cards;
10 10 10 33 33 33
;
data bench;
input port bench;
cards;
10 33
;
data new (keep=port_id newvar);
set bench;
array pb(*) _numeric_;
do until(done);
set port end=done;
newvar = catx('_',char(vname(pb(whichn(port_id, of pb(*)))),1),port_id);
output;
end;
run;
port_id newvar
10 p_10
10 p_10
10 p_10
33 b_33
33 b_33
33 b_33
the bench data is, variable is not port, bench, sorry about that, I test your code, it doesn't look like working
data bench;
input port_id bench_id;
cards;
10 33
33 44
;
Given that you have a very small number of bench_id values, this approach should be tried as well:
proc sql noprint;
select distinct bench_id into : bench_list separated by ',' from bench;
quit;
data want;
set port;
if port_id in (&bench_list) then prefix='b_';
else prefix='p_';
newvar = prefix || put(port_id, z2.);
run;
Good luck.
hi ... if all you have to do is first look at all the values of BENCH_ID in order to make a decision ...
data port;
input port_id @@;
cards;
10 10 10 33 33 33 44 55 33 10 44
;
data bench;
input port_id bench_id;
cards;
10 33
33 44
;
data new (keep=port_id newvar);
length its_a_bench $100;
do j=1 by 1 until (done1);
set bench (keep=bench_id) end=done1;
its_a_bench = catx('/', its_a_bench , bench_id);
end;
do until(done2);
set port end=done2;
newvar = ifc(find(its_a_bench,catt(port_id)) , catt('b_',port_id) , catt('p_',port_id));
output;
end;
run;
port_id newvar
10 p_10
10 p_10
10 p_10
33 b_33
33 b_33
33 b_33
44 b_44
55 p_55
33 b_33
10 p_10
44 b_44
Hi,
Under the assumption that your 'port' and your 'bench' are mutually exclusive, the following Hash() approach seems working:
data port;
input port_id;
cards;
10
10
10
33
33
33
;
data bench;
input
port bench;
cards;
10 33
;
data want;
if _n_=1 then do;
if 0 then set port;
dcl hash h(dataset: 'port', multidata:'y');
h.definekey('port_id');
h.definedata('port_id');
h.definedone();
end;
set bench;
do rc=h.find(key:port) by 0 while (rc=0);
new_id=cats('p_',port_id);
output;
rc=h.find_next(key:port);
end;
do rc=h.find(key:bench) by 0 while (rc=0);
new_id=cats('b_',port_id);
output;
rc=h.find_next(key:bench);
end;
keep port_id new_id;
run;
proc print;run;
Haikuo
They are NOT mutual exclusive, that is the reason I have to prefix it.
So, how do I do that to loose that assumption?
If they are not mutually exclusive then how do you know if eg. Port_ID =40 is "Port" or "Bench" - and if it can be both: Do you have some rule or additional attributes to decide what to choose?
Let me clarify a little bit further, the true issue is in port data, 33 belongs to both port_id and bench_id.
But the rule is: look at bench data, if we find the data in bench_id, we prefix it as "b_", all others are named "p_"
Because 33 in bench data appeared in bench_id, we name it "b_33", rather than p_33, but 10 is not appearing in bench_id, so it is prefixed as p_10.
At the same time dimensions for port data is 10k rows over 50 vars, while bench has less than 30 rows and 10 vars, I need a efficient way to prefix this variable.
data port;
input port_id @@;
cards;
10 10 10 33 33
;
data bench;
input port_id bench_id;
cards;
10 33
33 44
;
/*result i want:
p_10
P_10
p_10
b_33
b_33
*/
"At the same time dimensions for port data is 10k rows over 50 vars, while bench has less than 30 rows and 10 vars"
Could you please provide sample data which better reflects how your real data structure looks like? That would help a lot to come up with an appropriate solution.
Really appreciate this great discussion.
I think astounding's code is much simpler and worked.
MikeZdeb's code also worked, but personally I don't want to use the looping, I think it could cause the performance issue potentially when processing large volume data.
I may test the efficiency and hardware resources to compare further.
Hi ... the "looping" you refer to is to just read the data sets. Astounding's code read the data sets first in SQL, then using the implied looping of the data step rather than DO/SET/END as I did. So, just because you don't see "DO" in Astounding's code doesn't mean that loops are not being used, yes/no?
I concur, Mike. There is always implicit loop in data step, DOW does not add in one.
Haikuo
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.