Solved: Re: How to create one ID with a specific form from three messy ID colu...

triunk · Posted 05-11-2019 10:10 AM

Hello

I have a problem that cannot wrap my head around it... :

I have this table:

HAVE			WANT
ID1	ID2	ID3	ID
		K02563087	K02563087
0.035501
.0V0339435	K03394375		K03394375
KO3499516	K03499516		K03499516
K03508960	K03339796	K03339796	K03339796
000030164219NCZUTL02	K03415332		K03415332
1E+09	K02748355		K02748355
EWOLFENWAFEN	K03584884		K03584884
EBILHOITE
ERICHARSON115	K03656588		K03656588
K03516138	K01972934		K01972934
K03516140			K03516140

The first three columns are what I have and the fourth column is the ideal scenario.

How would you approach this?

Takis

P.S. The correct format of ID is letter K and then 8 numbers

Shmuel · Posted 05-11-2019 12:37 PM

I have added some validity code:

data want;
 set have;
       array idx $ id1-id3;  /* original data */

     /* assuming ID is the last of valid IDs */
      do i=1 to 3;
           idx(i) = upcase(idx(i));    /* ensure capital K */
           if  substr(idx(i),1,1) = 'K' and
               length(compress(idx(i),'K',kd) = 8  
            then id = idx(i);
     end;
run;

as space is non valid ID then the last valid is choosed.

View solution in original post

Shmuel · Posted 05-11-2019 11:38 AM

Try next logic/code:

data want;
 set have;
       array idx $ id1-id3;  /* original data */

     /* assuming ID is the last of valid IDs */
      do i=1 to 3;
           if  substr(idx(i),1,1) = 'K' then id = idx(i);
     end;
run;

Check:

- can an IDx start with 'K' but have more alphabetic characters ?

- can an IDx have less then 8 digits after the 'K' ?

- Is it always capital 'K' ( never 'k') ?

- Is there any case you will prefer the first IDx instead the last valid ?

triunk · Posted 05-11-2019 12:18 PM

Hi Shmuel,

Thank you for these input and great questions:

Check:

- can an IDx start with 'K' but have more alphabetic characters ?

No it has to start with K and then have numbers if you see in ID1 one of the IDs has letter O instead of 0

- can an IDx have less then 8 digits after the 'K' ?

Nope

- Is it always capital 'K' ( never 'k') ?

yes it should be K although people may type it as k

- Is there any case you will prefer the first IDx instead the last valid ?

yes if it is empty the ID3 I would prefere ID1 or ID2 that has this format

I hope now it is more clear. Thank you again very much!

Shmuel · Posted 05-11-2019 12:37 PM

I have added some validity code:

data want;
 set have;
       array idx $ id1-id3;  /* original data */

     /* assuming ID is the last of valid IDs */
      do i=1 to 3;
           idx(i) = upcase(idx(i));    /* ensure capital K */
           if  substr(idx(i),1,1) = 'K' and
               length(compress(idx(i),'K',kd) = 8  
            then id = idx(i);
     end;
run;

as space is non valid ID then the last valid is choosed.

cgprog · Posted 05-21-2019 09:21 AM

Hi Shmuel I was so excited and accepted your solution too soon.

When I tried to run it it gives me this error:

12813 data want3;
12814 set have;
12815 array idx $ id1-id3; /* original data */
12816
12817 /* assuming ID is the last of valid IDs */
12818 do i=1 to 3;
12819 idx(i) = upcase(idx(i)); /* ensure capital K */
12820 if substr(idx(i),1,1) = 'K' and length(compress(idx(i),'K',kd) = 8

------
72
12821 then id = idx(i);
---- -
388 79
202
ERROR 72-185: The LENGTH function call has too many arguments.

ERROR 388-185: Expecting an arithmetic operator.

ERROR 79-322: Expecting a ).

ERROR 202-322: The option or parameter is not recognized and will be ignored.

12822 end;
12823 run;

Now as you see that I have one more comment that I did not make clear...

If I have in ID1: KO3524846 (letter O instead of number 0) but nothing on ID2 and ID3 how I can include in the code a correction of ID1 to K03524846 (all numbers)

If I have ID1:K03508960 (good format) and different ID2:K03339796 (good also format) how I can include in the code to take ID1 as correct?

I attach the table here again with those two revised examples:

Shmuel · Posted 05-27-2019 07:37 AM

Sorry for my late answer, as I was on vacation.

Just in case you did not overcome yourself the error:

12820 if substr(idx(i),1,1) = 'K' and length(compress(idx(i),'K',kd) = 8

 


------
72
12821 then id = idx(i);

the code line should be:

if substr(idx(i),1,1) = 'K' and length(compress(idx(i)) , 'K' , kd ) = 8
then id = idx(i);

I have added a ) at the right place.

PeterClemmensen · Posted 05-11-2019 11:45 AM

data have;
input (id1-id3)(:$20.);
infile datalines dsd missover dlm=',';
datalines;
,,K02563087
0.035501,,     
.0V0339435,K03394375, 
KO3499516,K03499516, 
K03508960,K03339796,K03339796
000030164219NCZUTL02,K03415332,
1E+09,K02748355, 
EWOLFENWAFEN,K03584884, 
EBILHOITE,,
ERICHARSON115,K03656588, 
K03516138,K01972934, 
K03516140,,
;

data want(drop=i j);
   set have;
   array id{3} $ id1-id3;
   do i=1 to dim(id);
      if char(id[i], 1) ne 'K' then id[i]=" ";
   end;
   do j=dim(id) to 1 by -1;
      if not missing(id[j]) then do;
         newvar=id[j];
         leave;
      end;
   end;
run;

The DATA to DATA Step Macro
Blog: SASnrd

ghosh · Posted 05-11-2019 12:06 PM

WantID = COALESCEC(ID3, ID2, ID1);

Then test for the validity of WantID (ie starting with K followed by 8 numbers)

Docs here for the function: Coalescec function

triunk · Posted 05-11-2019 12:18 PM

a good way to check validity thank you!

ghosh · Posted 05-11-2019 12:22 PM

The validity test is for you to code.
But COALESCEC will give you the ID value you need based on the non blank values of ID3 to ID1. Please note the order of the variables

Shmuel · Posted 05-11-2019 12:22 PM

@ghosh , please pay attention that according to documentation - COALESCE:

"Returns the first nonmissing value from a list of character arguments."

and that it not what wanted according to given sample.

ghosh · Posted 05-11-2019 12:23 PM

@Shmuel
It's actually the COALESCEC fn

From his sample, I understood he wants to pick the last of the IDn values if valid

How to create one ID with a specific form from three messy ID columns

Re: How to create one ID with a specific form from three messy ID columns

Re: How to create one ID with a specific form from three messy ID columns

Re: How to create one ID with a specific form from three messy ID columns

Re: How to create one ID with a specific form from three messy ID columns

Re: How to create one ID with a specific form from three messy ID columns

Re: How to create one ID with a specific form from three messy ID columns

Re: How to create one ID with a specific form from three messy ID columns

Re: How to create one ID with a specific form from three messy ID columns

Re: How to create one ID with a specific form from three messy ID columns

Re: How to create one ID with a specific form from three messy ID columns

Re: How to create one ID with a specific form from three messy ID columns

Re: How to create one ID with a specific form from three messy ID columns

Registration is open

SAS Training: Just a Click Away