Hi
I will start with creating some test data based on your description and examples. As mentioned a few times in this community, it would be a great help, if the person asking a question also provided the code to make test data, so it could be copy-pasted into SAS.
/* Simulate input data table */
data have;
string = 'xyz0123; xyz0435; xyz0987;'; output;
string = 'wcz0453; wcz0657; wcz0987;'; output;
string = 'abc0123; abcsdfsdf0435;'; output;
string = 'erfsdf0453; fgh0657; hij1111;'; output;
run;
/* Simulate existing code table */
data codes;
length Code $4 Description $20 Category $10;
Code = '0123'; Description = 'M'; Category = 'Gender'; output;
Code = '0657'; Description = 'F'; Category = 'Gender'; output;
Code = '0435'; Description = 'California'; Category = 'State'; output;
Code = '0453'; Description = 'Arizona'; Category = 'State'; output;
Code = '0987'; Description = 'US'; Category = 'Country'; output;
run;
Then we should find a way to use code descriptions and categories in a recoding program. In this case hash objects would be the right choice, but it has a steep learning curve and requires a deep understanding af what's going on in a data step, so I will - as suggested by Reeza - use formats. Wi will create two formats (= translation tables) out of the code table. SAS Proc format can create a format from a table with a specified structure, so we build the tables and then use them as input to proc format.
/* Create format to translate from Code to Description */
data descfmt; set codes end=eof;
keep fmtname type start label hlo;
retain fmtname 'descfmt' type 'C';
start = Code;
label = Description;
output;
run;
proc format cntlin=descfmt;
run;
/* Create format to translate from Code to Category */
data catfmt; set codes end=eof;
keep fmtname type start label hlo;
retain fmtname 'catfmt' type 'C';
start = Code;
label = Category;
run;
proc format cntlin=catfmt;
run;
Now we are ready to do the recoding String in a data step. Other variables besides String in the input data set are transferred "as is" to the output data set.
The idea is to process each record and loop over the elements in the input string. In the loop, we extract the element and split it in Prefix and Code. We then use the Category-format to find out which variable should hold the description, and use the Description-format to get the value. All variables are initiated with a not-ascertainable-value, that will be kept if a code is not found in the current string to match a given category.
If there is anything in the code you don't understand, like the functions used, then try googling it, just remember always to precede your search term (like scan function) with SAS. The built-in help is actually pretty good, too.
* Use formats to recode input;
data want; set have;
length Code $4 wpref GenderPrefix Gender StatePrefix State CountryPrefix Country $40;
drop String Code i wstr strl wpref;
* Initiate variables with n.a;
Gender = 'n.a.'; State = 'n.a.'; Country = 'n.a.';
GenderPrefix = 'n.a.'; StatePrefix = 'n.a.'; CountryPrefix = 'n.a.';
* Loop over elements in String;
do i = 1 to count(string,';');
* Split in single results and set result length;
wstr = left(scan(string,i,';'));
strl = length(trim(wstr));
* Extract prefix as everything except last four characters, and code as rest;
wpref = substr(wstr,1,strl-4);
Code = substr(wstr,strl-3);
* Set Description in proper variable according to Category;
if put(Code,$catfmt.) = 'Gender' then do;
GenderPrefix = wpref;
Gender = put(Code,$descfmt.);
end;
else if put(Code,$catfmt.) = 'State' then do;
StatePrefix = wpref;
State = put(Code,$descfmt.);
end;
else if put(Code,$catfmt.) = 'Country' then do;
CountryPrefix = wpref;
Country = put(Code,$descfmt.);
end;
end;
output;
run;
And what do we get as output:
I hope this helps.
... View more