DATA Step, Macro, Functions and more

how to deal with these data

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 102
Accepted Solution

how to deal with these data

Hi, I have a data set like:

 


data test;
input name $10. rate 2. ;
datalines;
I12345     2.3
0.1          1.2
0.2          1.6
0.3          1.5
0.4          1.4
I23456    3.4
0.1          2.3
0.2          1.8
0.3          1.4
0.4           1.5
I34567      2.6
0.1            1.2
0.2            1.5
0.3            1.6
0.4             1.7
;
run;

 

I want the data set like:

name         rate

I12345       2.3
I12345A     1.2
I12345B     1.6
I12345C     1.5
I12345D     1.4
I23456        3.4
I23456A      2.3
I23456B     1.8
I23456C     1.4
I23456D     1.5
I34567        2.6
I34567A      1.2
I34567B      1.5
I34567C     1.6
I34567D      1.7

 

I appreciate any idea.

 

 


Accepted Solutions
Solution
‎12-18-2017 02:41 PM
SAS Employee
Posts: 24

Re: how to deal with these data

[ Edited ]

Try using a format and then keep any oldname that doesn't begin with a zero.

 

proc format;
   value $append '0.1' = 'A'
                 '0.2' = 'B'
                 '0.3' = 'C'
                 '0.4' = 'D';
run;

data test2;
   set test(rename=(name=oldname));
   retain last_name;

   if oldname^=:'0' then do;
      name=oldname;
      last_name=oldname;
   end;
   else name = strip(last_name)||put(oldname,$append.);

   drop last_name oldname;
run;

 

View solution in original post


All Replies
Trusted Advisor
Posts: 1,312

Re: how to deal with these data

Every time you encounter a name that begins with uppercase "I", you want to retain that value as a  name root.  For subsequent observations you want name to have that name root, with consecutive suffixes of "A", "B", "C", etc:

 

data test;
input name :$10. rate  ;
datalines;
I12345     2.3
0.1          1.2
0.2          1.6
0.3          1.5
0.4          1.4
I23456    3.4
0.1          2.3
0.2          1.8
0.3          1.4
0.4           1.5
I34567      2.6
0.1            1.2
0.2            1.5
0.3            1.6
0.4             1.7
;
run;

data want (drop=i name_root);
  set test;
  length name_root $10;
  retain name_root;
  array suffix {8} $1 _temporary_ (' ','A','B','C','D','E','F','G');
  i+1;
  if name=: 'I' then do;
     i=1;
    name_root=name;
  end;

  name=cats(name_root,suffix{i});
run;

 

Notes:

  1. The "=:" comparison operator takes the shortest character operand, and determines whether that matches the beginning of the longer operant.  So   "if name=: 'I' asks whether name starts with an "I".
  2. The temporary array suffix has initial values.  Notes the first initial values is a blank, so when I=1  (when a new root is found) the cats functions appends a blank,  for I=2 it's an "A", etc.  If you expect more than 8 cases with the same root, make the suffix array bigger, with more initial values.
Frequent Contributor
Posts: 102

Re: how to deal with these data

Thansk for the reply. But the  case is not always right.  Those beginning with "I " are questions and not always begin with "I". Some with "I", some with "N", some with "M".

 

 

PROC Star
Posts: 8,150

Re: how to deal with these data

[ Edited ]

You already have an answer but, if the value (rather than order) of name determines the letter value assigned, I would suggest something like:

data test;
  input name :$10. rate  ;
  datalines;
I12345     2.3
0.1          1.2
0.2          1.6
0.3          1.5
0.4          1.4
M23456    3.4
0.1          2.3
0.2          1.8
0.3          1.4
0.4           1.5
N34567      2.6
0.1            1.2
0.2            1.5
0.3            1.6
0.4             1.7
;
run;

data want (drop=name_root);
  set test;
  length name_root $10;
  retain name_root;
  if anyalpha(name) then do;
    name_root=name;
    name=-3.2;
  end;
  name=cats(name_root,byte(name*10+64));
run;

Art, CEO, AnalystFinder.com

 

SAS Employee
Posts: 24

Re: how to deal with these data

Hi Art,

 

Your solution appends an '8' to the name when anyalpha(name)=TRUE. For these cases, I would suggest

 

    name=-3.2;

 so that you're appending a space.

PROC Star
Posts: 8,150

Re: how to deal with these data

Posted in reply to antonbcristina

@antonbcristina: thanks for catching that. I've changed it in my suggested code.

 

Art, CEO, AnalystFinder.com

 

Solution
‎12-18-2017 02:41 PM
SAS Employee
Posts: 24

Re: how to deal with these data

[ Edited ]

Try using a format and then keep any oldname that doesn't begin with a zero.

 

proc format;
   value $append '0.1' = 'A'
                 '0.2' = 'B'
                 '0.3' = 'C'
                 '0.4' = 'D';
run;

data test2;
   set test(rename=(name=oldname));
   retain last_name;

   if oldname^=:'0' then do;
      name=oldname;
      last_name=oldname;
   end;
   else name = strip(last_name)||put(oldname,$append.);

   drop last_name oldname;
run;

 

Frequent Contributor
Posts: 102

Re: how to deal with these data

Posted in reply to antonbcristina

Thanks for the idea. However, my data not always begin with "I'. It depends on the exam. Different exam has a different suffix. I want a general code to apply all kinds of exams

SAS Employee
Posts: 24

Re: how to deal with these data

I saw your comment and added in the:

 

if oldname^=:'0' then do;
Frequent Contributor
Posts: 102

Re: how to deal with these data

Posted in reply to antonbcristina

I test you code and it works. Thank you very much, Merry Christmas!

SAS Employee
Posts: 24

Re: how to deal with these data

If you need something even fancier, you can try using Perl Regular Expressions to capture the pattern of your expected names. In this case, I just tried to match a non-digit (\D) followed by one or more digits (\d?). prxmatch returns the position where the pattern is found, and we would like it to start in position one.  

 

   if prxmatch('/\D\d?/',oldname)=1 then do;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 306 views
  • 2 likes
  • 4 in conversation