Help using Base SAS procedures

row sorting

Reply
Occasional Contributor
Posts: 11

row sorting

Hello all,

Could anyone help me to figure out this problem? Thanks in advance!

My original dataset is like this:

IDABC
1234
25.3
3951
45..
5...

I'd like to create a new variable that indicates the order pattern from the minimum to the maximum. So the new dataset should look like this:

IDABCPattern
1234A_B_C
25.3C_A
3951C_B_A
45..A
5....

Any suggestion is appreciated! Thanks!

Super User
Posts: 10,543

Re: row sorting

A couple of details may need some clarification. Is this for only 3 variables or do you need something that works for an abitrary number of values? Are the actual variable names A B C or are you using a placeholder names for example? This is somewhat important as the length of Pattern needs to be set large enough to hold the longest possible string.

Also, rules for tie breaking? What is the result if A and C are the same value? What if all variables have the same value? And do you want an actual period '.' or a missing for pattern when all values are missing?

Occasional Contributor
Posts: 11

Re: row sorting

Ballardw, thanks for bring these to my attention! You are absolutely right. I just made this as a simple example. In my real dataset, I have 8 variables and the actual names are 2-4 letters. To make the 'pattern' variable short, I could use numbers to replace the letters, say if pattern is A_B_C then I could use 1_2_3 instead.

And for the equal value, if A, B, and C are 1, 1, and 2 respectively, then the pattern could be AB_C.

Contributor
Posts: 62

Re: row sorting

ballardw is right; more info is helpful... sounds like you want to retain names of nonmissing variables; maybe  you can build on this...

data have;

input id a b c d;

cards ;

1 2 3 4 .

2 . 5 2 5

3 . . . .

4 . 3 2 .

5 . . 1 7

;

run;

data want;

     set have;

     array _vars a b c d;

     array _varN $ x1-x4;

     do over _varN;

       if _vars ne . then _varN= VNAME (_vars);    

     end;

     pattern=catx ("_",OF X1-x4);

run;

proc print ;

var id a b c d pattern;

run;

Occasional Contributor
Posts: 11

Re: row sorting

Hello Altal,

Thanks for your answer. It is helpful. The new variable 'pattern' you make indicates the pattern of present/absent. But how to make the 'pattern' variable reflect the order?

Contributor
Posts: 62

Re: row sorting

There may be a smarter way, but you can use something like this... still some coded is needed for tied values..

data have;

     input id a b c d;

cards ;

1 2 3 4 .

2 . 5 2 5

3 . . . .

4 . 3 2 .

5 . . 1 7

6 1 1 . .

7 6 5 4 1

; run;

data want;  

     set have;  

     array _vars a b c d;  

     array _varN $ x1-x4;  

     do over _varN;     

          if _vars ne . then _varN= VNAME (_vars);           

     end;

run;

proc transpose data=want out=temp1;

     var a b c d;

     by id;

run;

proc transpose data=want out=temp2;

     var X1-X4;

by id;

run;

data a;

     merge temp1 temp2 (rename=(col1=col2));

     by id;

run;

proc sort data=a;

     by id col1;

run;

proc transpose data=a out=temp3 (drop=_NAME_) prefix=col;

     var col2; by id;

run;

data temp3;

     set temp3; pattern=catx ("_",OF col1-col4);

run;

proc print ;run;

Occasional Contributor
Posts: 11

Re: row sorting

Hi Altal,

Your codes are very helpful! I still have one question about the equal value, for instance, id=2 the pattern should be 'c_bd' or 'c_db', because b=d=5. Is there an easier way to deal with that? Thanks!

Respected Advisor
Posts: 3,777

Re: row sorting

I would probably take the obvious approach.

data have;
  input id a b c d;
cards;
1 2 3 4 .
2 . 5 2 5
3 . . . .
4 . 3 2 .
5 . . 1 7
6 1 1 . .
7 6 5 4 1
;;;;
   run;
proc transpose data=have out=have2;
   by id;
   run;
proc rank data=have2 out=ranks ties=low;
   by id;
   var col1;
   run;
proc sort data=ranks;
   by id col1;
   where not missing(col1);
   run;
data pattern;
   do until(last.id);
      do until(last.col1);
         set ranks;
         by id col1;
         length p pattern $128;
         p = cats(p,_name_);
        
end;
     
pattern = catx('_',pattern,p);
      call missing(p);
      end;
  
drop col1 _name_ p;
   run;
data pattern;
   merge have pattern;
   by id;
   run;
proc print;
  
run;

1234.a_b_c
2.525c_bd
3....
4.32.c_b
5..17c_d
611..ab
76541d_c_b_a


Respected Advisor
Posts: 3,902

Re: row sorting

Just another approach with a single pass through the data:

data have;
  input id a b c d;
cards;
1 2 3 4 .
2 . 5 2 5
3 . . . .
4 . 3 2 .
5 . . 1 7
6 1 1 . .
7 6 5 4 1
;;;;
run;


data want(drop= _Smiley Happy;
  set have;

  length pattern $32.;
  array myarr {*} a b c d;

  if _n_=1 then
    do;
      length _value 8 _varname $32;
      dcl hash h (multidata:'y', ordered:'y', hashexp:3);
      dcl hiter hit ('h');
      _rc=h.defineKey('_value');
      _rc=h.defineData('_varname','_value');
      _rc=h.defineDone();
    end;

  _rc=h.clear();

  do _i=1 to dim(myarr);
    _value=myarr[_i];
    if missing(_value) then continue;
    _varname=vname(myarr[_i]);
    _rc=h.add();
  end;

  _rc = hit.first();
  _value_old=_value;

  do while (_rc = 0);
    if  _value_old=_value then
      do;
        pattern=cats(pattern,_varname);
      end;
    else 
      pattern=cats(pattern,'_',_varname);

    _value_old=_value;
    _rc = hit.next();
  end;

run;

Contributor
Posts: 70

Re: row sorting

Hi ..

Small update to the @data_null_; code..

data pattern;
   do until(last.id);
      do until(last.col1);
         set ranks;
         by id col1;
         length p pattern $128;
         p = catX('_',p,_name_);/*Updated cats(p,_name_) to
catX('_',p,_name_)*/


        
end;
     
pattern = catx('_',pattern,p);
      call missing(p);
      end;
  
drop col1 _name_ p;
   run;


Hope this will helps..

Thanks,Yash.

Ask a Question
Discussion stats
  • 9 replies
  • 444 views
  • 3 likes
  • 6 in conversation