Hello,
I have a field that lists all duplicates a case number has (can be anything from 1 duplicate to 15 duplicates). These are in one variable (titled duplicates) separated by |. I want to try to split this duplicate field into 15 fields, with each new field having one duplicate case number.
I am new to SAS and have been working at this for a while and can't seem to figure it out. Is there any advice for getting this done? Below is an example of what one of the duplicate observations look like.
duplicate= MFR-1Q19-0039|MFR-2Q19-0081|MFR-2Q19-0082
Ideally id like: var1=MFR-1Q19-0039 var2=MFR-2Q19-0081 var3=MFR-1Q19-0082
Thanks in advance!
Use next code:
data want;
set have;
length var1-var15 $15; /* adapt to max length */
array varx $ var1-var15;
i =1;
do until (missing(vx));
vx = scan(duplicate,i,'|');
if not missing(vx) then varx(i) = vx;
i+1;
end;
drop vx i;
run;
@Shmuel Thank you for your help! I ran this code and it creates the variables, but does not split the duplicate values into the newly created variables/columns (new columns are blank). Any ideas? Some of the duplicate fields have more duplicates than others; would that affect the code? My current code looks like this:
data want;
set drop_dup;
length var1-var15 $20;
array varx $ var1-var15;
i=1;
do until (missing(vx));
vx= scan(duplicate2,i,'|');
if not missing(vx) then varx(i)= vx;
i+1;
end;
run;
Thanks again!!
edit: it did also give me this note in the log:
NOTE: Invalid numeric data, 'MFR-2Q19-0056' , at line 1803 column 13.
You should post the full log. I can't guess what is in line 1803 of the log;
I suggest to add some lines to the code for debug. Those lines should be removed at final run.
Please copy the full log, in any issue, using the </> icon window:
data want;
set drop_dup;
length vx var1-var15 $20;
array varx $ var1-var15;
i=1;
/*DBG*/ putlog duplicate2=;
do until (missing(vx));
vx= scan(duplicate2,i,'|');
/*DBG*/ putlog i= vx= ;
if not missing(vx) then varx(i)= vx;
i+1;
end;
run;
Ok sorry about that, attached is the log from running the second piece of code you attached. Thanks again for your help!
If you look at the icons menu you'll find the </> icon. Clicking on it will open a window.
Next time copy the log to this window, click ok and then is is ready to post.
anyhow, if a new variable should be char type, like VX which holds the part of the DUPLICAT2 string
then it should be defined in the length statement with $.
Change the line - the length statement to
length vx var1-var15 $20;
Or just simplify the code so the extra variable VX fis not needed.
data want;
set drop_dup;
length var1-var15 $20;
array varx var1-var15;
do i=1 to dim(varx);
varx[i]=scan(duplicate2,i,'|');
end;
drop i;
run;
@Tom , I have used VX because the "number has (can be anything from 1 duplicate to 15 duplicates)" and the first null scaned value should leave the loop.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.