BookmarkSubscribeRSS Feed
dgaribal
Fluorite | Level 6

Hello, 

 

I have a field that lists all duplicates a case number has (can be anything from 1 duplicate to 15 duplicates). These are in one variable (titled duplicates) separated by |. I want to try to split this duplicate field into 15 fields, with each new field having one duplicate case number.

 

I am new to SAS and have been working at this for a while and can't seem to figure it out. Is there any advice for getting this done? Below is an example of what one of the duplicate observations look like. 

 

duplicate= MFR-1Q19-0039|MFR-2Q19-0081|MFR-2Q19-0082

 

Ideally id like:   var1=MFR-1Q19-0039  var2=MFR-2Q19-0081   var3=MFR-1Q19-0082

 

Thanks in advance!

7 REPLIES 7
Shmuel
Garnet | Level 18

Use next code:

data want;
  set have;
      length var1-var15 $15;   /* adapt to max length */
      array varx  $   var1-var15;
       i =1;
       do until (missing(vx));
            vx = scan(duplicate,i,'|');
            if not missing(vx) then varx(i) = vx;
           i+1;
      end;
     drop vx i;
run;
dgaribal
Fluorite | Level 6

@Shmuel Thank you for your help! I ran this code and it creates the variables, but does not split the duplicate values into the newly created variables/columns (new columns are blank).  Any ideas? Some of the duplicate fields have more duplicates than others; would that affect the code? My current code looks like this: 

 

data want;
set drop_dup;
length var1-var15 $20;
array varx $ var1-var15;
i=1;
do until (missing(vx));
vx= scan(duplicate2,i,'|');
if not missing(vx) then varx(i)= vx;
i+1;
end;
run;

 

Thanks again!!

 

edit: it did also give me this note in the log:

NOTE: Invalid numeric data, 'MFR-2Q19-0056' , at line 1803 column 13.

Shmuel
Garnet | Level 18

You should post the full log. I can't guess what is in line 1803 of the log;

 

I suggest to add some lines to the code for debug. Those lines should be removed at final run.

Please copy the full log, in any issue, using the </> icon window:

data want;
 set drop_dup;
    length vx var1-var15 $20;
    array varx $ var1-var15;
    i=1;
/*DBG*/ putlog duplicate2=;
    do until (missing(vx));
         vx= scan(duplicate2,i,'|');
/*DBG*/ putlog i= vx= ;
         if not missing(vx) then varx(i)= vx;
         i+1;
      end;
run;
dgaribal
Fluorite | Level 6

Ok sorry about that, attached is the log from running the second piece of code you attached. Thanks again for your help!

Shmuel
Garnet | Level 18

If you look at the icons menu you'll find the </> icon. Clicking on it will open a window.

Next time copy the log to this window, click ok and then is is ready to post.

 

anyhow, if a new variable should be char type, like VX which holds the part of the DUPLICAT2 string

then it should be defined in the length statement with $.

 

Change  the line - the length statement to

length vx var1-var15 $20;
Tom
Super User Tom
Super User

Or just simplify the code so the extra variable VX fis not needed.

data want;
  set drop_dup;
  length var1-var15 $20;
  array varx var1-var15;
  do i=1 to dim(varx);
    varx[i]=scan(duplicate2,i,'|');
  end;
  drop i;
run;
Shmuel
Garnet | Level 18

@Tom , I have used VX because the "number has (can be anything from 1 duplicate to 15 duplicates)" and the first null scaned value should leave the loop.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1184 views
  • 1 like
  • 3 in conversation