Solved: Re: how to deal with these data

daisy6 · Posted 12-18-2017 02:07 PM

Hi, I have a data set like:

data test;
input name $10. rate 2. ;
datalines;
I12345 2.3
0.1 1.2
0.2 1.6
0.3 1.5
0.4 1.4
I23456 3.4
0.1 2.3
0.2 1.8
0.3 1.4
0.4 1.5
I34567 2.6
0.1 1.2
0.2 1.5
0.3 1.6
0.4 1.7
;
run;

I want the data set like:

name rate

I12345 2.3
I12345A    1.2
I12345B 1.6
I12345C 1.5
I12345D 1.4
I23456 3.4
I23456A 2.3
I23456B    1.8
I23456C 1.4
I23456D    1.5
I34567 2.6
I34567A 1.2
I34567B   1.5
I34567C 1.6
I34567D   1.7

I appreciate any idea.

antonbcristina · Posted 12-18-2017 02:27 PM

Try using a format and then keep any oldname that doesn't begin with a zero.

proc format;
   value $append '0.1' = 'A'
                 '0.2' = 'B'
                 '0.3' = 'C'
                 '0.4' = 'D';
run;

data test2;
   set test(rename=(name=oldname));
   retain last_name;

   if oldname^=:'0' then do;
      name=oldname;
      last_name=oldname;
   end;
   else name = strip(last_name)||put(oldname,$append.);

   drop last_name oldname;
run;

View solution in original post

mkeintz · Posted 12-18-2017 02:21 PM

Every time you encounter a name that begins with uppercase "I", you want to retain that value as a name root. For subsequent observations you want name to have that name root, with consecutive suffixes of "A", "B", "C", etc:

data test;
input name :$10. rate  ;
datalines;
I12345     2.3
0.1          1.2
0.2          1.6
0.3          1.5
0.4          1.4
I23456    3.4
0.1          2.3
0.2          1.8
0.3          1.4
0.4           1.5
I34567      2.6
0.1            1.2
0.2            1.5
0.3            1.6
0.4             1.7
;
run;

data want (drop=i name_root);
  set test;
  length name_root $10;
  retain name_root;
  array suffix {8} $1 _temporary_ (' ','A','B','C','D','E','F','G');
  i+1;
  if name=: 'I' then do;
     i=1;
    name_root=name;
  end;

  name=cats(name_root,suffix{i});
run;

Notes:

The "=:" comparison operator takes the shortest character operand, and determines whether that matches the beginning of the longer operant. So "if name=: 'I' asks whether name starts with an "I".
The temporary array suffix has initial values. Notes the first initial values is a blank, so when I=1 (when a new root is found) the cats functions appends a blank, for I=2 it's an "A", etc. If you expect more than 8 cases with the same root, make the suffix array bigger, with more initial values.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

daisy6 · Posted 12-18-2017 02:30 PM

Thansk for the reply. But the case is not always right. Those beginning with "I " are questions and not always begin with "I". Some with "I", some with "N", some with "M".

art297 · Posted 12-18-2017 02:46 PM

You already have an answer but, if the value (rather than order) of name determines the letter value assigned, I would suggest something like:

data test;
  input name :$10. rate  ;
  datalines;
I12345     2.3
0.1          1.2
0.2          1.6
0.3          1.5
0.4          1.4
M23456    3.4
0.1          2.3
0.2          1.8
0.3          1.4
0.4           1.5
N34567      2.6
0.1            1.2
0.2            1.5
0.3            1.6
0.4             1.7
;
run;

data want (drop=name_root);
  set test;
  length name_root $10;
  retain name_root;
  if anyalpha(name) then do;
    name_root=name;
    name=-3.2;
  end;
  name=cats(name_root,byte(name*10+64));
run;

Art, CEO, AnalystFinder.com

antonbcristina · Posted 12-18-2017 03:28 PM

Hi Art,

Your solution appends an '8' to the name when anyalpha(name)=TRUE. For these cases, I would suggest

    name=-3.2;

so that you're appending a space.

art297 · Posted 12-18-2017 04:06 PM

@antonbcristina: thanks for catching that. I've changed it in my suggested code.

Art, CEO, AnalystFinder.com

antonbcristina · Posted 12-18-2017 02:27 PM

Try using a format and then keep any oldname that doesn't begin with a zero.

proc format;
   value $append '0.1' = 'A'
                 '0.2' = 'B'
                 '0.3' = 'C'
                 '0.4' = 'D';
run;

data test2;
   set test(rename=(name=oldname));
   retain last_name;

   if oldname^=:'0' then do;
      name=oldname;
      last_name=oldname;
   end;
   else name = strip(last_name)||put(oldname,$append.);

   drop last_name oldname;
run;

daisy6 · Posted 12-18-2017 02:35 PM

Thanks for the idea. However, my data not always begin with "I'. It depends on the exam. Different exam has a different suffix. I want a general code to apply all kinds of exams

antonbcristina · Posted 12-18-2017 02:37 PM

I saw your comment and added in the:

if oldname^=:'0' then do;

daisy6 · Posted 12-18-2017 02:41 PM

I test you code and it works. Thank you very much, Merry Christmas!

antonbcristina · Posted 12-18-2017 02:56 PM

If you need something even fancier, you can try using Perl Regular Expressions to capture the pattern of your expected names. In this case, I just tried to match a non-digit (\D) followed by one or more digits (\d?). prxmatch returns the position where the pattern is found, and we would like it to start in position one.

   if prxmatch('/\D\d?/',oldname)=1 then do;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away