Renaming multiple variables

bshiferaw27 · Posted 07-09-2021 03:02 PM

Hi everyone,

I wanted to merge two separate data sets using proc SQL and order them by the person_id. However, the person_id on both data sets is named in different ways. For example, dataA person_id is named "BL-ABL-002-Xp" while dataB person_id is named "002" and so on. I was wondering if anyone knows an easier way I would be able to rename the dataA multiple person_id variables to only use the three-digit numbers "002"? I hope my question makes sense.

So far the code I am using is:

proc SQL;

create table baseline_comb as

select *

from dataA, dataB

where dataA.person_id = dataB.person_id
order by dataA.person_id;

quit;

I appreciate the help in advance!

Reeza · Posted 07-09-2021 03:15 PM

Are they all formatted exactly as shown, with the - and spaces?

If so, you can use SCAN(). I would also recommend not using a cartesian join, but a full join and change your condition to ON rather than a WHERE. And use aliases for your tables. See an example below.

proc SQL;
create table baseline_comb as
select *
from dataA as A
 full join dataB as B
on scan(a.person_id, 3, "-") = b.person_id
order by a.person_id;

quit;

@bshiferaw27 wrote:

Hi everyone,

I wanted to merge two separate data sets using proc SQL and order them by the person_id. However, the person_id on both data sets is named in different ways. For example, dataA person_id is named "BL-ABL-002-Xp" while dataB person_id is named "002" and so on. I was wondering if anyone knows an easier way I would be able to rename the dataA multiple person_id variables to only use the three-digit numbers "002"? I hope my question makes sense.

So far the code I am using is:

proc SQL;

create table baseline_comb as

select *

from dataA, dataB

where dataA.person_id = dataB.person_id
order by dataA.person_id;

quit;

I appreciate the help in advance!

PaigeMiller · Posted 07-09-2021 03:16 PM

Please clarify.

Is your question about renaming variables (which is what you wrote) or is your question about renaming/modifying the values of the variables?

--
Paige Miller

bshiferaw27 · Posted 07-10-2021 10:58 AM

Yes, my question is about renaming the variable name to match with dataB.

PaigeMiller · Posted 07-10-2021 11:08 AM

@bshiferaw27 wrote:

Yes, my question is about renaming the variable name to match with dataB.

In your original code, the variables seem to have the same name, both variables are named person_id, thus no renaming needed.

where dataA.person_id = dataB.person_id

--
Paige Miller

bshiferaw27 · Posted 07-10-2021 12:12 PM

Sorry, I guess I did not make it very clear. So the issue is, the way the person_id is named in dataA is different from dataB, therefore it is not letting me merge it by the person_id. What I am trying to do is to rename the entire person_ids included in dataA to having the three digits numbers to match with the person_ids in dataB. For example, how can I rename the entire person_id names in dataA from "BL_BLA_002_Xl" to "002" for all the name variables?

I hope this helps clarify.

Tom · Posted 07-10-2021 12:17 PM

@bshiferaw27 wrote:

Sorry, I guess I did not make it very clear. So the issue is, the way the person_id is named in dataA is different from dataB, therefore it is not letting me merge it by the person_id. What I am trying to do is to rename the entire person_ids included in dataA to having the three digits numbers to match with the person_ids in dataB. For example, how can I rename the entire person_id names in dataA from "BL_BLA_002_Xl" to "002" for all the name variables?

I hope this helps clarify.

The way the id values are STORED (or CODED) in the two datasets is different. The name of the variable where it is stored does not change so no renaming of the variables is needed. Instead you want to transform or recode the values stored in the variables.

If you want to transform the longer values like "BL_BLA_002_Xl" to the shorter values like "002" it looks from your limited example like the SCAN() function is what you want. So to find the next to the last word delimited by underscores you could use SCAN(PERSON_ID,-2,'_'). Or if you want the third word use SCAN(PERSON_ID,3,'_');

If you are using PROC SQL to combine the datasets you can even do this on the fly.

create table baseline_comb as
  select a.person_id as long_id
       , b.person_id as short_id
       , *
  from dataA a
  inner join dataB b
  on a.person_id = scan(b.person_id,-2,'_')
  order by long_id, short_id
;

Patrick · Posted 07-10-2021 11:12 PM

@bshiferaw27 What really would help clarify things is if you'd post representative sample data for your two source tables (SAS datasteps creating the sample data) and then show us the desired result.

You say variables but I still believe you're actually talking about the values stored in variables. Once you provide representative sample data all this ambiguity will just go away and we can concentrate on solving your actual problem.

Renaming multiple variables

Re: Renaming multiple variables

Re: Renaming multiple variables

Re: Renaming multiple variables

Re: Renaming multiple variables

Re: Renaming multiple variables

Re: Renaming multiple variables

Re: Renaming multiple variables

Ready to join fellow brilliant minds for the SAS Hackathon?