Niugg2010
Obsidian | Level 7

I want to build a function to remove the duplicates in a string. In a data step the logic works well, but when I moved it into a function it did not work. Can someone help me figure out why? Thanks.

 

Below is the data step:

*******************************************************************

data a;
string = "spanner,span,spaniel,span,abc,span,bcc";
string2=string;

length word $100;
i = 2;
do while(scan(string2, i, ',') ^= '');
word = scan(string2, i, ',');
do j = 1 to i - 1;
if word = scan(string2, j, ',') then do;
start = findw(string2, word, ',', findw(string2, word, ',', 't') + 1, 't');
string2 = cats(substr(string2, 1, start - 2), substr(string2, start + length(word)));
end;
end;
i = i + 1;
end;


keep string string2;
run;

**************************************************************************

 

Below is the function I want to build:

***************************************************

proc fcmp outlib=work.funcs.rem_dup;
function rem_dup(string $) $;
string2=string;


length word $100;
i = 2;
do while(scan(string2, i, ',') ^= '');
word = scan(string2, i, ',');
do j = 1 to i - 1;
if word = scan(string2, j, ',') then do;
start = findw(string2, word, ',', findw(string2, word, ',', 't') + 1, 't');
string2 = cats(substr(string2, 1, start - 2), substr(string2, start + length(word)));
end;
end;
i = i + 1;
end;


return(string2);
endsub;
run;

 **************************************************

 

Call the function:

 

**************************************
options cmplib=(work work.funcs);
data B;
string = "spanner,span,spaniel,span,abc,span,bcc";
string2=rem_dup(string);
run;

ballardw
Super User

Your log would have

30   options cmplib=(work work.funcs);
31   data B;
32   string = "spanner,span,spaniel,span,abc,span,bcc";
33   string2=rem_dup(string);
NOTE: No CMP or C functions found in library work.
34   run;

Try instead

 

options cmplib=work.funcs;
data B;
string = "spanner,span,spaniel,span,abc,span,bcc";
string2=rem_dup(string);
run;

You use APPEND or INSERT to add to an existing cmplib path I believe.
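
As a rough sketch of the difference, assuming the function was compiled into WORK.FUNCS as in the original post: assigning CMPLIB= directly replaces the whole search path, while the APPEND= system option adds a library to whatever path is already set.

```sas
/* Replace the function search path outright */
options cmplib=work.funcs;

/* Or add WORK.FUNCS to whatever CMPLIB already holds */
options append=(cmplib=work.funcs);

/* Write the current CMPLIB setting to the log to confirm it */
proc options option=cmplib;
run;
```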

 

Niugg2010
Obsidian | Level 7

Thanks. I tried. Unfortunately, it still does not work. 

Tom
Super User

@Niugg2010 wrote:

Thanks. I tried. Unfortunately, it still does not work. 


"does not work" is a very vague statement.

Are you getting error messages?  Wrong results?

Show your log.

Show your data.

 

ballardw
Super User

@Niugg2010 wrote:

Thanks. I tried. Unfortunately, it still does not work. 


"Doesn't work" is awfully vague.

Are there errors in the log? Post the code and log in a code box opened with the {i} icon to maintain the formatting of error messages.

No output? Post any log in a code box.

Unexpected output? Provide input data in the form of data step code pasted into a code box, the actual results and the expected results. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

I ran your code with my suggested change and it "worked" sort of.

                string                              string2

spanner,span,spaniel,span,abc,span,bcc    spanner,span,spaniel,abc,spa



If that is the output you get, then you need to look at your function definition: set the length of the character value on the FUNCTION statement, and make sure that the STRING2 variable has a matching length.

 

Niugg2010
Obsidian | Level 7

Below is what I want.

1.JPG

 

Below is the result with Function.

2.JPG

 

I am not sure how to define the variable length in a function. Can you please help me revise the code and post the correct function?

 

Thanks. 

 

 

 

Tom
Super User

Your original function code was defining the length of WORD.

Did you try doing the same thing for STRING2?

 

Niugg2010
Obsidian | Level 7

After I defined the length, the result is correct.

At the beginning, I used a FORMAT statement to define its length, and I did not get correct results.

Why did the format not work? Please advise.

 

Thanks

 

 

Tom
Super User

Why would you use a FORMAT statement to define the length?  A FORMAT statement is for attaching special instructions for how to display the value.

 

SAS will guess what length to define a variable based on how it is first used. So if it is first used in the FORMAT statement then SAS will guess that the length should match the width of the format that is being attached to the variable. 

 

But if the length of the variable is already defined then a FORMAT statement will have no impact. SAS will happily let you use the $100. format with a variable that is only 5 characters long. Or the reverse.
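
A small data step sketch of the two cases described above. The dataset name DEMO and the variables A and B are made up for illustration; PROC CONTENTS should list the stored length of each variable next to its format.

```sas
data demo;
  format a $100.;   /* first mention of A: its length is taken from the format width */
  length b $5;      /* B's length is set explicitly ...                              */
  format b $100.;   /* ... so this only attaches a display format; B stays 5 bytes   */
  a = 'x';
  b = 'abcdefgh';   /* stored as 'abcde' because B is only 5 characters long */
run;

/* Compare the Len and Format columns for A and B */
proc contents data=demo;
run;
```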

 

Niugg2010
Obsidian | Level 7

Sorry, I still do not understand.

In my case, please see the code below. If I define the length with a FORMAT statement in the function, why is the actual length of string2 in dataset B not 100?

 

proc fcmp outlib=work.funcs.rem_dup;
function rem_dup(string $) $;
format string2 $100.;

string2=string;
length word $100;
.....
run;


options cmplib=work.funcs;
data B;
string = "spanner,span,spaniel,span,abc,span,bcc";
string2=rem_dup(string);
run;

Tom
Super User

Let's look at your actual data step.

data B;
  string = "spanner,span,spaniel,span,abc,span,bcc";
  string2=rem_dup(string);
run;

Nowhere in there did you tell SAS what length to use for STRING2.  So it will guess how to define it based on the fact that you are assigning it the value from your function.

What happens if you define the length of STRING2 in the data step?

data B;
  string = "spanner,span,spaniel,span,abc,span,bcc";
  length string2 $200;
  string2=rem_dup(string);
run;

Also what does SAS know about REM_DUP() that can help it figure out what length to use to define STRING2?  You didn't tell it what length of string it returns, just that it returns a string.

 

Did you try adding the length into the FUNCTION statement?

function rem_dup(string $) $ 200;
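
Putting those suggestions together, one possible version of the whole program looks like this. It is only a sketch: the de-duplication logic is copied unchanged from the original post, and the lengths of 200 and 100 are arbitrary choices that assume the input never exceeds them.

```sas
proc fcmp outlib=work.funcs.rem_dup;
  function rem_dup(string $) $ 200;    /* declare the length of the returned string */
    length string2 $200 word $100;     /* declare the local character variables too */
    string2 = string;
    i = 2;
    do while (scan(string2, i, ',') ^= '');
      word = scan(string2, i, ',');
      do j = 1 to i - 1;
        if word = scan(string2, j, ',') then do;
          /* locate the second occurrence of WORD and cut it out with its leading comma */
          start = findw(string2, word, ',', findw(string2, word, ',', 't') + 1, 't');
          string2 = cats(substr(string2, 1, start - 2), substr(string2, start + length(word)));
        end;
      end;
      i = i + 1;
    end;
    return (string2);
  endsub;
run;

options cmplib=work.funcs;

data B;
  string = "spanner,span,spaniel,span,abc,span,bcc";
  length string2 $200;                 /* match the length the function returns */
  string2 = rem_dup(string);
run;
```

The length is stated in three places on purpose: the FUNCTION statement, the local variable inside the function, and the receiving variable in the data step.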

 

Niugg2010
Obsidian | Level 7

Tom,

 

Thanks. I kind of understand.

If I define the length in the function as below, the length of string2 in dataset B is also 200.

 

proc fcmp outlib=work.funcs.rem_dup;
function rem_dup(string $) $;

length string2 $200;

 

But if I code it as below, without defining the length of string2, the length of string2 in dataset B is 200, but the result is wrong.

proc fcmp outlib=work.funcs.rem_dup;
function rem_dup(string $) $ 200;
length word $100;

 

It looks like it is necessary to define the length inside the function; otherwise it may give wrong results.

This looks funny. Good to learn.

 

George

Tom
Super User

In a normal SAS data step, if nothing else tells SAS what length to use for a character variable, it will set its length to $8.

Experimenting with your program we can see that in an FCMP function if you do not define a length for a local character variable SAS will set it to $33.  

 

So your original value was getting truncated when you assigned STRING into STRING2.

 

After that your function's logic worked. The one extra copy of 'span' was removed, but the last word was truncated to just 'spa' so it was unique and stayed.
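
The truncation can be reproduced in an ordinary data step by forcing the same length. This is only an illustration: the $33 figure is the value observed in the experiment above, and the names TRUNC_DEMO and LAST_WORD are made up.

```sas
data trunc_demo;
  length string $200 string2 $33;    /* 33 mimics the length FCMP appeared to pick */
  string  = "spanner,span,spaniel,span,abc,span,bcc";
  string2 = string;                  /* only the first 33 characters are kept */
  last_word = scan(string2, -1, ',');
  put string2= last_word=;           /* the last word is expected to be cut to 'spa' */
run;
```

The source string is 38 characters long, so the cut lands inside the third 'span'. That leaves 'spa' as the final word, which no longer matches any earlier word.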

Niugg2010
Obsidian | Level 7

Tom,

 

      Great. Understand now.

 

Thanks

George
