DATA Step, Macro, Functions and more

how to remove duplicate values with different lengths

Reply
Super Contributor
Posts: 272

how to remove duplicate values with different lengths

Dear,

 

In my raw data (data1)the following values present. I used this code1 to get ouput I need (data2)

 

code1:

if length(term) < 200 then TERM1=strip(term);
Else if length(term) >= 200 then do;
TERM 1= strip(prxChange("s/(.{1,200})\s.*/\1/os", 1, term));
end;

 

Then I need to remove the duplicate vales:

code2 :

 

Proc sort data=data2 out=data3 nodupkey;
by id term1;
run;

 

The data3 has both obs, eventhough the OBS have same id and term1 values.

 

I think this is due to different lengths.

In data1 the term length s are 193 and 209

In data2, the term1 lengths are 193 and 194.

 

How to get same lengths for the values to remove the duplicate values. Please help. Thank you.

 

 

 

data data1;
input id term $628;
datalines;
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
Duplicate data
;

 

data data2:
input id term1 $200;
datalines;
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
;

 

Trusted Advisor
Posts: 1,401

Re: how to remove duplicate values with different lengths

Try using function compbl instead strip:

code1:
if length(term) < 200 then TERM1=compbl(term);
Else
TERM1= compbl(prxChange("s/(.{1,200})\s.*/\1/os", 1, term));

 

 

Super Contributor
Posts: 272

Re: how to remove duplicate values with different lengths

Thank you for the support. But it did not work. The lengths of term1 still shows 193 and194.

 

In my program, one time I used "options varlenchk=nowarn;". Does this effect compbl function. Thank you 

Trusted Advisor
Posts: 1,401

Re: how to remove duplicate values with different lengths

I'm short in time now to run it myself. Run next code and check in log.

Especialy check proc compare output ?

data data1;
input id term $628;
datalines;
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
;
run;
 
data data2:
input id term1 $200;
datalines;
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
;
run;

proc compare base=data1 compare=data2; run;

%macro ex;
len=length(term);
put _N_= ' Before ' len=;
if length(term) < 200 then TERM1=compbl(term);
Else
TERM1= compbl(prxChange("s/(.{1,200})\s.*/\1/os", 1, term));
len=length(term);
put _N_= ' After ' len=;
%mend ex;

data1x;
 set data1;
     %ex;
run;

data2x;
 set data2;
     %ex;
run;

Proc sort data=data1x out=data1y nodupkey;
by id term1;
run;

Proc sort data=data2x out=data2y nodupkey; by id term1; run;

 

Trusted Advisor
Posts: 1,401

Re: how to remove duplicate values with different lengths

I have run the code I sent to you. The code is mostly copy/paste of code you posted.

There were errors - listed below:

1) You missed a dot after the informat - should be:  $628.  $200.

2) The length of TERM is 193 charcters, less than length defined 628.

    To avoid the note: "NOTE: SAS went to a new line when INPUT statement reached past the end of a line"

    I added code line:  infile datalines truncover; 

3) I fixed some more mine typos.

 

Run next fixed code: the data is identical and duplicaes were filtered:

data data1;
infile datalines truncover;
input id term $628.;
datalines;
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
;
run;
 
data data2;
infile datalines truncover;
input id term1 $200.;
datalines;
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
1 Subject 00000000's week 1 OO sample was drawn at 0000 post study dose at 0000 which was not within the required 0-0 hours after study dose. Site retrained on following protocol for OO sampling.
;
run;

proc compare base=data1 compare=data2(rename=(term1=term)); run;

%macro ex(var);
len=length(&var);
put _N_= ' Before ' len=;
if length(&var) < 200 then TERM1=compbl(&var);
Else
TERM1= compbl(prxChange("s/(.{1,200})\s.*/\1/os", 1, &var));
len=length(&var);
put _N_= ' After ' len=;
%mend ex;

data data1x;
 set data1;
     %ex(term);
run;

data data2x;
 set data2;
     %ex(term1);
run;

data data3;
  set data1x data2x;
run;
Proc sort data=data3 out=data4 nodupkey;
by id term1;
run;
Trusted Advisor
Posts: 1,401

Re: how to remove duplicate values with different lengths

beyond there is an extra blank between TERM and 1 - in line:
TERM 1= strip(prxChange("s/(.{1,200})\s.*/\1/os", 1, term));
Ask a Question
Discussion stats
  • 5 replies
  • 116 views
  • 0 likes
  • 2 in conversation