Hello
I have a problem that cannot wrap my head around it... :
I have this table:
HAVE | WANT | ||
ID1 | ID2 | ID3 | ID |
K02563087 | K02563087 | ||
0.035501 | |||
.0V0339435 | K03394375 | K03394375 | |
KO3499516 | K03499516 | K03499516 | |
K03508960 | K03339796 | K03339796 | K03339796 |
000030164219NCZUTL02 | K03415332 | K03415332 | |
1E+09 | K02748355 | K02748355 | |
EWOLFENWAFEN | K03584884 | K03584884 | |
EBILHOITE | |||
ERICHARSON115 | K03656588 | K03656588 | |
K03516138 | K01972934 | K01972934 | |
K03516140 | K03516140 |
The first three columns are what I have and the fourth column is the ideal scenario.
How would you approach this?
Takis
P.S. The correct format of ID is letter K and then 8 numbers
I have added some validity code:
data want;
set have;
array idx $ id1-id3; /* original data */
/* assuming ID is the last of valid IDs */
do i=1 to 3;
idx(i) = upcase(idx(i)); /* ensure capital K */
if substr(idx(i),1,1) = 'K' and
length(compress(idx(i),'K',kd) = 8
then id = idx(i);
end;
run;
as space is non valid ID then the last valid is choosed.
Try next logic/code:
data want;
set have;
array idx $ id1-id3; /* original data */
/* assuming ID is the last of valid IDs */
do i=1 to 3;
if substr(idx(i),1,1) = 'K' then id = idx(i);
end;
run;
Check:
- can an IDx start with 'K' but have more alphabetic characters ?
- can an IDx have less then 8 digits after the 'K' ?
- Is it always capital 'K' ( never 'k') ?
- Is there any case you will prefer the first IDx instead the last valid ?
Hi Shmuel,
Thank you for these input and great questions:
Check:
- can an IDx start with 'K' but have more alphabetic characters ?
No it has to start with K and then have numbers if you see in ID1 one of the IDs has letter O instead of 0
- can an IDx have less then 8 digits after the 'K' ?
Nope
- Is it always capital 'K' ( never 'k') ?
yes it should be K although people may type it as k
- Is there any case you will prefer the first IDx instead the last valid ?
yes if it is empty the ID3 I would prefere ID1 or ID2 that has this format
I hope now it is more clear. Thank you again very much!
I have added some validity code:
data want;
set have;
array idx $ id1-id3; /* original data */
/* assuming ID is the last of valid IDs */
do i=1 to 3;
idx(i) = upcase(idx(i)); /* ensure capital K */
if substr(idx(i),1,1) = 'K' and
length(compress(idx(i),'K',kd) = 8
then id = idx(i);
end;
run;
as space is non valid ID then the last valid is choosed.
Hi Shmuel I was so excited and accepted your solution too soon.
When I tried to run it it gives me this error:
12813 data want3;
12814 set have;
12815 array idx $ id1-id3; /* original data */
12816
12817 /* assuming ID is the last of valid IDs */
12818 do i=1 to 3;
12819 idx(i) = upcase(idx(i)); /* ensure capital K */
12820 if substr(idx(i),1,1) = 'K' and length(compress(idx(i),'K',kd) = 8
------
72
12821 then id = idx(i);
---- -
388 79
202
ERROR 72-185: The LENGTH function call has too many arguments.
ERROR 388-185: Expecting an arithmetic operator.
ERROR 79-322: Expecting a ).
ERROR 202-322: The option or parameter is not recognized and will be ignored.
12822 end;
12823 run;
Now as you see that I have one more comment that I did not make clear...
If I have in ID1: KO3524846 (letter O instead of number 0) but nothing on ID2 and ID3 how I can include in the code a correction of ID1 to K03524846 (all numbers)
If I have ID1:K03508960 (good format) and different ID2:K03339796 (good also format) how I can include in the code to take ID1 as correct?
I attach the table here again with those two revised examples:
Sorry for my late answer, as I was on vacation.
Just in case you did not overcome yourself the error:
12820 if substr(idx(i),1,1) = 'K' and length(compress(idx(i),'K',kd) = 8 ------ 72 12821 then id = idx(i);
the code line should be:
if substr(idx(i),1,1) = 'K' and length(compress(idx(i)) , 'K' , kd ) = 8
then id = idx(i);
I have added a ) at the right place.
data have;
input (id1-id3)(:$20.);
infile datalines dsd missover dlm=',';
datalines;
,,K02563087
0.035501,,
.0V0339435,K03394375,
KO3499516,K03499516,
K03508960,K03339796,K03339796
000030164219NCZUTL02,K03415332,
1E+09,K02748355,
EWOLFENWAFEN,K03584884,
EBILHOITE,,
ERICHARSON115,K03656588,
K03516138,K01972934,
K03516140,,
;
data want(drop=i j);
set have;
array id{3} $ id1-id3;
do i=1 to dim(id);
if char(id[i], 1) ne 'K' then id[i]=" ";
end;
do j=dim(id) to 1 by -1;
if not missing(id[j]) then do;
newvar=id[j];
leave;
end;
end;
run;
WantID = COALESCEC(ID3, ID2, ID1);
Then test for the validity of WantID (ie starting with K followed by 8 numbers)
Docs here for the function: Coalescec function
a good way to check validity thank you!
The validity test is for you to code.
But COALESCEC will give you the ID value you need based on the non blank values of ID3 to ID1. Please note the order of the variables
@ghosh , please pay attention that according to documentation - COALESCE:
"Returns the first nonmissing value from a list of character arguments."
and that it not what wanted according to given sample.
@Shmuel
It's actually the COALESCEC fn
From his sample, I understood he wants to pick the last of the IDn values if valid
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.