I want to build a function that removes duplicate words from a string. The logic works well in a data step, but when I move it into a function it does not work. Can someone help me figure out why? Thanks.
Below is the data step:
*******************************************************************
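data a;
  string = "spanner,span,spaniel,span,abc,span,bcc";
  string2 = string;   /* working copy; inherits the length of STRING */
  length word $100;
  i = 2;
  /* walk the comma-separated words, starting from the second one */
  do while(scan(string2, i, ',') ^= '');
    word = scan(string2, i, ',');
    /* compare word i with every word before it */
    do j = 1 to i - 1;
      if word = scan(string2, j, ',') then do;
        /* find the second occurrence of WORD and cut it out with its leading comma */
        start = findw(string2, word, ',', findw(string2, word, ',', 't') + 1, 't');
        string2 = cats(substr(string2, 1, start - 2), substr(string2, start + length(word)));
      end;
    end;
    i = i + 1;
  end;
  keep string string2;
run;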
**************************************************************************
Below is the function I want to build:
***************************************************
proc fcmp outlib=work.funcs.rem_dup;
  function rem_dup(string $) $;
    string2=string;
    length word $100;
    i = 2;
    do while(scan(string2, i, ',') ^= '');
      word = scan(string2, i, ',');
      do j = 1 to i - 1;
        if word = scan(string2, j, ',') then do;
          start = findw(string2, word, ',', findw(string2, word, ',', 't') + 1, 't');
          string2 = cats(substr(string2, 1, start - 2), substr(string2, start + length(word)));
        end;
      end;
      i = i + 1;
    end;
    return(string2);
  endsub;
run;
**************************************************
Call the function:
**************************************
options cmplib=(work work.funcs);
data B;
string = "spanner,span,spaniel,span,abc,span,bcc";
string2=rem_dup(string);
run;
Your log would have
30 options cmplib=(work work.funcs);
31 data B;
32 string = "spanner,span,spaniel,span,abc,span,bcc";
33 string2=rem_dup(string);
NOTE: No CMP or C functions found in library work.
34 run;
Try instead
options cmplib=work.funcs;
data B;
string = "spanner,span,spaniel,span,abc,span,bcc";
string2=rem_dup(string);
run;
You use APPEND or INSERT to add to an existing cmplib path I believe.
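A rough sketch of both routes, assuming your SAS release accepts CMPLIB in the APPEND= and INSERT= system options (check the documentation for your version). The data set name work.funcs is the one from the PROC FCMP step above. Use one of the three OPTIONS statements, not all of them.

```sas
/* Option 1: replace the whole search path with the function data set */
options cmplib=work.funcs;

/* Option 2: keep the current CMPLIB value and add work.funcs at the end */
options append=(cmplib=work.funcs);

/* Option 3: keep the current CMPLIB value and add work.funcs at the start */
options insert=(cmplib=work.funcs);

/* Write the current CMPLIB value to the log to confirm what is in effect */
proc options option=cmplib;
run;
```

Whichever form you use, each entry has to name a data set that holds compiled functions (libref.dataset), not just a library.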
Thanks. I tried. Unfortunately, it still does not work.
@Niugg2010 wrote:
Thanks. I tried. Unfortunately, it still does not work.
"does not work" is a very vague statement.
Are you getting error messages? Wrong results?
Show your log.
Show your data.
@Niugg2010 wrote:
Thanks. I tried. Unfortunately, it still does not work.
"Doesn't work" is awfully vague.
Are there errors in the log? Post the code and log in a code box opened with the {i} icon to maintain the formatting of error messages.
No output? Post any log in a code box.
Unexpected output? Provide input data in the form of data step code pasted into a code box, the actual results and the expected results. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
I ran your code with my suggested change and it "worked" sort of.
string                                    string2
spanner,span,spaniel,span,abc,span,bcc    spanner,span,spaniel,abc,spa
If that is the output you get, then you need to look into your function definition: check the length of the character value declared on the FUNCTION statement, and make sure that the STRING2 variable has a matching length.
Below is what I want.
Below is the result with Function.
I am not sure how to define the variable length in a function. Can you please help me revise the code and post the correct function?
Thanks.
Your original function code was defining the length of WORD.
Did you try doing the same thing for STRING2?
After defining the length, the result is now correct.
At the beginning, I used a FORMAT statement to define its length, and I did not get correct results.
Why did the format not work? Please advise.
Thanks
Why would you use a FORMAT statement to define the length? A FORMAT statement is for attaching special instructions for how to display the value.
SAS will guess what length to define a variable based on how it is first used. So if it is first used in the FORMAT statement then SAS will guess that the length should match the width of the format that is being attached to the variable.
But if the length of the variable is already defined then a FORMAT statement will have no impact. SAS will happily let you use the $100. format with a variable that is only 5 characters long. Or the reverse.
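A small data step sketch of that difference, using two made-up variables A and B. It covers the data step only, and FCMP may not behave the same way. VLENGTH returns the defined length of a variable.

```sas
data _null_;
  length a $5;        /* the length of A is fixed here */
  format a $100.;     /* display instruction only; A already has its length */
  format b $20.;      /* B is first seen here, so SAS bases its length on the format width */
  a = 'abcdefghij';   /* the stored value is cut to the defined length of A */
  b = 'x';
  len_a = vlength(a); /* defined length of A */
  len_b = vlength(b); /* defined length of B */
  put a= len_a= len_b=;
run;
```

If SAS behaves as described above, the log should report a length of 5 for A and 20 for B.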
Sorry, I still do not understand.
In my case, please see the code below. If I define the length with a FORMAT statement in the function, why is the actual length of string2 in dataset B not 100?
proc fcmp outlib=work.funcs.rem_dup;
function rem_dup(string $) $;
format string2 $100.;
string2=string;
length word $100;
.....
run;
options cmplib=work.funcs;
data B;
string = "spanner,span,spaniel,span,abc,span,bcc";
string2=rem_dup(string);
run;
Let's look at your actual data step.
data B;
string = "spanner,span,spaniel,span,abc,span,bcc";
string2=rem_dup(string);
run;
Nowhere in there did you tell SAS what length to use for STRING2. So it will guess how to define it based on the fact that you are assigning it the value from your function.
What happens if you define the length of STRING2 in the data step?
data B;
string = "spanner,span,spaniel,span,abc,span,bcc";
length string2 $200;
string2=rem_dup(string);
run;
Also what does SAS know about REM_DUP() that can help it figure out what length to use to define STRING2? You didn't tell it what length of string it returns, just that it returns a string.
Did you try adding the length into the FUNCTION statement?
function rem_dup(string $) $ 200;
Tom,
Thanks. I kind of understand.
If I define the length inside the function as below, the length of string2 in dataset B is also 200.
proc fcmp outlib=work.funcs.rem_dup;
function rem_dup(string $) $;
length string2 $200;
But if I code it the way shown below, without defining the length of string2 inside the function, the length of string2 in dataset B is 200 but the result is wrong.
proc fcmp outlib=work.funcs.rem_dup;
function rem_dup(string $) $ 200;
length word $100;
It looks like it is necessary to define the length of the local variable inside the function; otherwise it may give wrong results.
This looks odd, but it is good to learn.
George
In a normal SAS data step, if you do not define the length of a character variable and SAS has nothing else to base it on, it will set the length to $8.
Experimenting with your program, we can see that in an FCMP function, if you do not define a length for a local character variable, SAS will set it to $33.
So your original 38-character value was getting truncated to 'spanner,span,spaniel,span,abc,spa' when you assigned STRING to STRING2.
After that, your function's logic worked. The one extra copy of 'span' was removed, but the last word had been truncated to just 'spa', so it was unique and stayed.
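Putting the two suggestions from this thread together gives roughly the following sketch. It keeps your original logic unchanged and only adds the length declarations. The 200 is an arbitrary choice and just needs to be at least as long as the longest input you expect.

```sas
proc fcmp outlib=work.funcs.rem_dup;
  /* declare the length of the returned string on the FUNCTION statement */
  function rem_dup(string $) $ 200;
    /* declare the local variables so STRING2 is not cut to the FCMP default */
    length string2 $200 word $100;
    string2 = string;
    i = 2;
    do while(scan(string2, i, ',') ^= '');
      word = scan(string2, i, ',');
      do j = 1 to i - 1;
        if word = scan(string2, j, ',') then do;
          /* same removal logic as in the original data step */
          start = findw(string2, word, ',', findw(string2, word, ',', 't') + 1, 't');
          string2 = cats(substr(string2, 1, start - 2), substr(string2, start + length(word)));
        end;
      end;
      i = i + 1;
    end;
    return(string2);
  endsub;
run;

options cmplib=work.funcs;

data B;
  length string2 $200;   /* match the length the function returns */
  string = "spanner,span,spaniel,span,abc,span,bcc";
  string2 = rem_dup(string);
run;
```

Declaring the length in all three places (the FUNCTION statement, the local variable, and the calling data step) means none of them is left for SAS to guess.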
Tom,
Great. Understand now.
Thanks
George