I have two datasets that I would like to merge together, but each datasets have a series of columns that need to be sorted, renamed or trimmed. I have a column "top" that repeats several times with additional numbers and I'm trying to crop out the following sequential characters or extract just "to" from all columns itself to later remove it.
My goal is to remove the character "to" from appearing in the final data, but I tried doing that directly and it did not work.
For example,
Dataset 1 has "
Top1
Top2
Top4....
Dataset 2 has:
Tops23
Tops45
Topsi46...
Here are some codes I was thinking of using:
Proc sort data = first_dataset ;
by id;
run;
data first_dataset_clean;
set first_dataset ;
by id;
to = substr(ex:, 1, 3);
run;
data first_dataset_clean second_dataset_clean;
merge first_dataset(rename = ("top:" = "to")) second_dataset (rename = ("top:" = "to"))
by id;
if first.id = 1 and last.id =1 then output first_dataset_clean;
else output second_dataset_clean;
run;
I've been considering using "where ne", "scan", "tranwrd", "substr", "rename" but couldn't figure out how to pull out just "to" from both datasets.
Any thoughts?