BookmarkSubscribeRSS Feed
dgaribal
Fluorite | Level 6

Hello, 

 

I have a field that lists all duplicates a case number has (can be anything from 1 duplicate to 15 duplicates). These are in one variable (titled duplicates) separated by |. I want to try to split this duplicate field into 15 fields, with each new field having one duplicate case number.

 

I am new to SAS and have been working at this for a while and can't seem to figure it out. Is there any advice for getting this done? Below is an example of what one of the duplicate observations look like. 

 

duplicate= MFR-1Q19-0039|MFR-2Q19-0081|MFR-2Q19-0082

 

Ideally id like:   var1=MFR-1Q19-0039  var2=MFR-2Q19-0081   var3=MFR-1Q19-0082

 

Thanks in advance!

7 REPLIES 7
Shmuel
Garnet | Level 18

Use next code:

data want;
  set have;
      length var1-var15 $15;   /* adapt to max length */
      array varx  $   var1-var15;
       i =1;
       do until (missing(vx));
            vx = scan(duplicate,i,'|');
            if not missing(vx) then varx(i) = vx;
           i+1;
      end;
     drop vx i;
run;
dgaribal
Fluorite | Level 6

@Shmuel Thank you for your help! I ran this code and it creates the variables, but does not split the duplicate values into the newly created variables/columns (new columns are blank).  Any ideas? Some of the duplicate fields have more duplicates than others; would that affect the code? My current code looks like this: 

 

data want;
set drop_dup;
length var1-var15 $20;
array varx $ var1-var15;
i=1;
do until (missing(vx));
vx= scan(duplicate2,i,'|');
if not missing(vx) then varx(i)= vx;
i+1;
end;
run;

 

Thanks again!!

 

edit: it did also give me this note in the log:

NOTE: Invalid numeric data, 'MFR-2Q19-0056' , at line 1803 column 13.

Shmuel
Garnet | Level 18

You should post the full log. I can't guess what is in line 1803 of the log;

 

I suggest to add some lines to the code for debug. Those lines should be removed at final run.

Please copy the full log, in any issue, using the </> icon window:

data want;
 set drop_dup;
    length vx var1-var15 $20;
    array varx $ var1-var15;
    i=1;
/*DBG*/ putlog duplicate2=;
    do until (missing(vx));
         vx= scan(duplicate2,i,'|');
/*DBG*/ putlog i= vx= ;
         if not missing(vx) then varx(i)= vx;
         i+1;
      end;
run;
dgaribal
Fluorite | Level 6

Ok sorry about that, attached is the log from running the second piece of code you attached. Thanks again for your help!

Shmuel
Garnet | Level 18

If you look at the icons menu you'll find the </> icon. Clicking on it will open a window.

Next time copy the log to this window, click ok and then is is ready to post.

 

anyhow, if a new variable should be char type, like VX which holds the part of the DUPLICAT2 string

then it should be defined in the length statement with $.

 

Change  the line - the length statement to

length vx var1-var15 $20;
Tom
Super User Tom
Super User

Or just simplify the code so the extra variable VX fis not needed.

data want;
  set drop_dup;
  length var1-var15 $20;
  array varx var1-var15;
  do i=1 to dim(varx);
    varx[i]=scan(duplicate2,i,'|');
  end;
  drop i;
run;
Shmuel
Garnet | Level 18

@Tom , I have used VX because the "number has (can be anything from 1 duplicate to 15 duplicates)" and the first null scaned value should leave the loop.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 792 views
  • 1 like
  • 3 in conversation