Solved: Re: Result by group

sasuser_8

Hi,

I have this table:

DATA HAVE;
INPUT CATEGORY TYPE$ INV_M1	INV_M2	INV_M3 ;
CARDS;

1 A 5 5 9 
1 B 10 2 2  
2 A 2 2 3 
2 B 6 6 5 
3 A 8 7 9 
3 B 4 9 8 
4 A 10 8 9 
4 B 5 2 1 
;
RUN;

For each CATEGORY and for each month (M1,M2,M3), I have to calculate the following result (TYPE B / TYPE A)

The result should be this:

What function can I use ?

ballardw

One way given that data set:

data want;
   set have;
   by category;
   array in (*) INV_M1	INV_M2	INV_M3;
   array l (*) LINV_M1	LINV_M2	LINV_M3;
   Array r (*) Result_M1 Result_M2 Result_M3;
   LINV_M1 = Lag(INV_M1);
   LINV_M2 = Lag(INV_M2);
   LINV_M3 = Lag(INV_M3);
   if last.category then do;
      do i=1 to dim(In);
         r[i]= in[i]/l[i];
      end;
      output;
   end;
  drop i Inv_: Linv: type;
run;

However the data is poorly structured it would be better to be

Category Month B A. and likely the Months should actually be a date instead of some random number like 1, 2 or 3 Or 25...

Code like this can reshape the data to that nicer form for calculating the result:

proc transpose data=have out=trans (rename=(_name_=month));
   by category type;
   var Inv_m1 Inv_m2 Inv_m3;
run;

proc sort data=trans;
   by category month type;
run;

proc transpose data=trans out=need (drop=_name_);
   by category month;
   id type;
   var col1;
run;

then

data calc;
  set need;
  result= B / A;
run;

Normally I would stop manipulating data and make a report for that set if you really need separate columns for the result such as one of these:

proc report data=calc;
   columns category month,result;
   define category /group;
   define month/across "";
   define result/'';
run;
proc tabulate data=calc;
   class category month;
   var result;
   tables category='',
          month=''*result=''*sum=''
          /box='Category'
   ;
run;

if you really need a poorly structured data set:

proc transpose data=calc out=want2 (drop=_name_)
    prefix=Result_;
  by category;
  id month;
  var result;
run;

View solution in original post

ballardw

One way given that data set:

data want;
   set have;
   by category;
   array in (*) INV_M1	INV_M2	INV_M3;
   array l (*) LINV_M1	LINV_M2	LINV_M3;
   Array r (*) Result_M1 Result_M2 Result_M3;
   LINV_M1 = Lag(INV_M1);
   LINV_M2 = Lag(INV_M2);
   LINV_M3 = Lag(INV_M3);
   if last.category then do;
      do i=1 to dim(In);
         r[i]= in[i]/l[i];
      end;
      output;
   end;
  drop i Inv_: Linv: type;
run;

However the data is poorly structured it would be better to be

Category Month B A. and likely the Months should actually be a date instead of some random number like 1, 2 or 3 Or 25...

Code like this can reshape the data to that nicer form for calculating the result:

proc transpose data=have out=trans (rename=(_name_=month));
   by category type;
   var Inv_m1 Inv_m2 Inv_m3;
run;

proc sort data=trans;
   by category month type;
run;

proc transpose data=trans out=need (drop=_name_);
   by category month;
   id type;
   var col1;
run;

then

data calc;
  set need;
  result= B / A;
run;

Normally I would stop manipulating data and make a report for that set if you really need separate columns for the result such as one of these:

proc report data=calc;
   columns category month,result;
   define category /group;
   define month/across "";
   define result/'';
run;
proc tabulate data=calc;
   class category month;
   var result;
   tables category='',
          month=''*result=''*sum=''
          /box='Category'
   ;
run;

if you really need a poorly structured data set:

proc transpose data=calc out=want2 (drop=_name_)
    prefix=Result_;
  by category;
  id month;
  var result;
run;

sasuser_8

I'm gonna go with the proc transpose solution... Thanks !

Kurt_Bremser

Transpose the INV_Μ: variables to long layout by category and type.

Then sort by category, _NAME_ and type, after which it is easy to compare the two successive values in a DATA step.

proc transpose data=have out=long;
by category type;
var inv_m:;
run;

proc sort data=long;
by category _name_ type;
run;

data want;
set long;
by category _name_;
l_col1= lag(col1);
if last._name_;
result = col1 / l_col1;
drop l_col1 col1;
run;

You can transpose the want dataset to wide,but I recommend to keep the long layout, as it is much more flexible for further analysis.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller

Adding:

I agree with @ballardw the data is poorly structured and would be much easier to handle with a better data structure, which he describes.

--
Paige Miller

sasuser_8

I know the data is structured in a strange way but there is a reason behind it. Can you elaborate on the idea SQL-macro variables because I have to do the calculation for the 12 months... It could be useful.

PaigeMiller

@sasuser_8 wrote:

I know the data is structured in a strange way but there is a reason behind it. Can you elaborate on the idea SQL-macro variables because I have to do the calculation for the 12 months... It could be useful.

Just because the data may arrive in that wide format does not mean you have to do analysis in that format. Doing analysis in the long format makes programming easier, as shown above by @Tom and others. The solution from Tom works without modification even if you have 100 INV_M variables. A word to the wise: work with long data sets.

--
Paige Miller

Tom

To divide two numbers they need to be on the same observation.

Simplest way is to transpose your dataset so that the TYPE values, A and B, are the variable names then MONTH values are stored into a variable.

proc transpose data=have name=month out=have_t;
  by category;
  id type;
  var inv_m1-inv_m3;
run;

data want;
 set have_t;
 result=b/a;
 format result 5.2 ;
run;

You can then easily create a REPORT that looks like your request.

proc report;
  column category month,(a b result);
  define category / group;
  define month / across ' ';
run;

Results

Classroom Training Available!