DATA Step, Macro, Functions and more

Single Mining View

Accepted Solution Solved
Reply
Contributor
Posts: 57
Accepted Solution

Single Mining View

Hi All,

I am wondering if its possible to solve the following problem in SAS:

I have a dataset as follows -

id var1 var2
8250 1 23987
8250 45 13841
8250 79 13223
8250 24 10031
8250 54 8849
8250 88 8284
8250 51 4733
8250 71 4109
14984 80 1695
14984 55 195
14984 83 184
14984 26 166
14984 84 141
14984 40 127
14984 17 122
14984 96 120
14984 10 96



I just would like to have a unique record per column id which takes the max value from var 2 and gives -

id var1_2
8250 1 45 79 24 54
14984 80 55 83 26 84

 

 

Is thsi possible in Base SAS?

 

KR

SK


Accepted Solutions
Solution
‎07-25-2016 11:50 PM
Super User
Posts: 10,044

Re: Single Mining View

Posted in reply to Siddharth123
So you want the max five value for each ID ?



data have; 
infile cards expandtabs truncover;
input id	var1	var2;
datalines;
8250	1	23987
8250	51	4733
8250	45	13841
8250	71	4109
8250	24	10031
8250	54	8849
8250	88	8284
8250	79	13223
14984	10	96
14984	55	195
14984	26	166
14984	84	141
14984	40	127
14984	83	184
14984	17	122
14984	80	1695
14984	96	120
;
run;
proc sort data=have;
 by id descending var2;
run;
data temp;
 set have;
 by id;
 if first.id then n=0;
 n+1;
 if n le 5;
run;
data want;
 length var1_2 $ 2000;
 do until(last.id);
  set temp;
  by id;
  var1_2=catx(' ',var1_2,var1);
 end; 
 keep id var1_2;
run;


View solution in original post


All Replies
Super Contributor
Posts: 298

Re: Single Mining View

[ Edited ]
Posted in reply to Siddharth123

[1] Your input data seems to be sorted both ID and VAR2. Here is a solution which requires data to be sorted by ID alone.

[2] You seem to want the first 5 largest values of Var2, although it is not mentioned. The present solution can handle as many values as you want.

[3] An array approch is used. It requires the largest ID-Group size which is 9 in your data. Again this size can be done programmatically. Proc Sql may be used but I use a data step which will be faster than SQL when data set is large.

[4] Your input data set is slightly disordered by Var2 within ID.

Here is the program:

 

data have; 
input id	var1	var2;
datalines;
8250	1	23987
8250	51	4733
8250	45	13841
8250	71	4109
8250	24	10031
8250	54	8849
8250	88	8284
8250	79	13223
14984	10	96
14984	55	195
14984	26	166
14984	84	141
14984	40	127
14984	83	184
14984	17	122
14984	80	1695
14984	96	120

;
run;

data _null_;
   retain max  0;
   do i = 1 by 1 until(last.id);
      set have end = eof;
      by id;
      
   end;
   if i > max then max = i;
   if eof then call symputx('groupsize', max);
run;


%let h = 5;

data want;
   array k[&groupsize] _temporary_;
   array m[&groupsize] _temporary_;
   length var1_2 $50;
   do i = 1 by 1 until(last.id);
      set have;
      by id;
      k[i] = var2;
      m[i] = var1;
      if first.id then call missing(var1_2);
   end;
   call catx(' ', var1_2, id);
   do i = 1 to &h;
      b = largest(i, of k[*]);
      v = m[whichn(b, of k[*])];
      call catx(' ', var1_2, v);
   end;
keep var1_2;   
run;
 

I hope you will not have any problem to understand the program and this will be an acceptable solution to you.

 

Edited . . . . .

 

In the above program ID goes as part of new variable, VAR1_2. If you don't want it, delete

call catx(' ', var1_2, id);

and change the last statement to

 

keep ID var1_2;

 

If you don't want a general program but just a simple program for the given data(where ID and VAR2 are sorted), few statements will do as in:

data want1;
   do i = 1 by 1 until(last.id);
      set have;
      by id;
   length var1_2 $50;
   if i < 6 then call catx(' ', var1_2, var1);
   else continue;
   end;
keep id var1_2;
run;

 

Solution
‎07-25-2016 11:50 PM
Super User
Posts: 10,044

Re: Single Mining View

Posted in reply to Siddharth123
So you want the max five value for each ID ?



data have; 
infile cards expandtabs truncover;
input id	var1	var2;
datalines;
8250	1	23987
8250	51	4733
8250	45	13841
8250	71	4109
8250	24	10031
8250	54	8849
8250	88	8284
8250	79	13223
14984	10	96
14984	55	195
14984	26	166
14984	84	141
14984	40	127
14984	83	184
14984	17	122
14984	80	1695
14984	96	120
;
run;
proc sort data=have;
 by id descending var2;
run;
data temp;
 set have;
 by id;
 if first.id then n=0;
 n+1;
 if n le 5;
run;
data want;
 length var1_2 $ 2000;
 do until(last.id);
  set temp;
  by id;
  var1_2=catx(' ',var1_2,var1);
 end; 
 keep id var1_2;
run;


Super Contributor
Posts: 298

Re: Single Mining View

Posted in reply to Siddharth123

You seemed to have not gone through my answers. Just see the end of my answer. 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 249 views
  • 0 likes
  • 3 in conversation