BookmarkSubscribeRSS Feed
lmyers2
Obsidian | Level 7

Hello,

 

I'm using SAS 9.4 and need to create a variable (mean payment) using information from two different arrays. The example data below will hopefully highlight what I mean. There are 2 arrays: Defendant and Payment. Payment 1 refers to payment on behalf of Defendant 1; Payment 2 refers to payment on behalf of Defendant 2, etc. I'm interested in the mean payment when one was specifically made on behalf of a Student. The correct answer in this example data would be ($100+$200)/2=$150. How do I write SAS code extracting this information?  Thanks for the help!

 

Example data

Case  Defendant1 Defendant2 Defendant3  Payment1 Payment2 Payment3

1        Student        .                  .                    $100          .                .

2        Physician     Student       .                     0               $200        .

3        Physician     Nurse          Student        $100           $300        0

 

 

 

7 REPLIES 7
PaigeMiller
Diamond | Level 26

@lmyers2 wrote:

Hello,

 

I'm using SAS 9.4 and need to create a variable (mean payment) using information from two different arrays. The example data below will hopefully highlight what I mean. There are 2 arrays: Defendant and Payment. Payment 1 refers to payment on behalf of Defendant 1; Payment 2 refers to payment on behalf of Defendant 2, etc. I'm interested in the mean payment when one was specifically made on behalf of a Student. The correct answer in this example data would be ($100+$200)/2=$150. How do I write SAS code extracting this information?  Thanks for the help!

 

Example data

Case  Defendant1 Defendant2 Defendant3  Payment1 Payment2 Payment3

1        Student        .                  .                    $100          .                .

2        Physician     Student       .                     0               $200        .

3        Physician     Nurse          Student        $100           $300        0


If the answer you want is ($100+$200+$0)/3=$100, then this code should work. Otherwise you will have to modify the code so that payments of zero are not included in the average.

 

/* UNTESTED CODE */

data want;
    set have;
    array defendant defendant1-defendant3;
    array payment payment1-payment3;
    where_student=whichc('Student',of defendant[*]);
    if where_student>0 then payment_to_student=payment(where_student);
run;
proc summary data=want;
    var payment_to_student;
    output out=mean mean=;
run;
    
--
Paige Miller
lmyers2
Obsidian | Level 7

Thanks, I tried that and got the following error. I pasted my code too. 

 

data want; set have;
array defendant [*] $ DEFENDANT_TYPE1-DEFENDANT_TYPE69;
array payment [*] DEF_INDEMNITY_PAID1-DEF_INDEMNITY_PAID69;
where_student=whichc('student', of DEFENDANT_TYPE[*]);
pay_student=DEF_INDEMNITY_PAID(where_student);
run;

Error message: "ERROR: Array subscript out of range at line 1304 column 8". This refers to the line starting with pay_student. Any ideas?

 

Tom
Super User Tom
Super User

You have 69 pairs of variables?  

The error is probably saying that 'student' was not found.  If the value always in lowercase like you are searching for?  Or is it in uppercase? Mixed case?  Is it consistent or can it vary?

Also could there be more than one pair where the name is STUDENT and the value is non-zero?

 

It might be better to transpose the data instead.

data want;
  set have;
  array A_defendant DEFENDANT_TYPE1-DEFENDANT_TYPE69;
  array A_payment DEF_INDEMNITY_PAID1-DEF_INDEMNITY_PAID69;
  do i=1 to dim(A_defendant) ;
    defendant=upcase(A_defendant[i]);
    payment=A_payment[i];
    if not missing(payment) then output;
  end;
  drop i DEFENDANT_TYPE1-DEFENDANT_TYPE69
    DEF_INDEMNITY_PAID1-DEF_INDEMNITY_PAID69
   ;
run;

proc means N SUM MEAN MIN MAX;
  class defendant;
  var payment;
run;
PaigeMiller
Diamond | Level 26

@lmyers2 wrote:

Thanks, I tried that and got the following error. I pasted my code too. 

 

data want; set have;
array defendant [*] $ DEFENDANT_TYPE1-DEFENDANT_TYPE69;
array payment [*] DEF_INDEMNITY_PAID1-DEF_INDEMNITY_PAID69;
where_student=whichc('student', of DEFENDANT_TYPE[*]);
pay_student=DEF_INDEMNITY_PAID(where_student);
run;

Error message: "ERROR: Array subscript out of range at line 1304 column 8". This refers to the line starting with pay_student. Any ideas?

 


You don't have an array named DEF_INDEMNITY_PAID. You do have an array named PAYMENT. Your PAY_STUDENT= has to reference array PAYMENT.

 

You don't have an array named DEFENDANT_TYPE, you have an array named DEFENDANT. WHICHC has to reference array DEFENDANT.

 

Also, please note I Modified my original code as follows:

 

if where_student>0 then payment_to_student=payment(where_student);

 

Also, details are important, your code says

 

whichc('student', ... )

 

but it should say

 

whichc('Student', ...)

 

--
Paige Miller
lmyers2
Obsidian | Level 7

Thanks, I think this will work but there is one caveat I didn't anticipate. That is, in each row of data, there could be two data points with a "Student" and they'd need to be summed by row before counting in the average. Would this be easier to do by proc sql?

ballardw
Super User

@lmyers2 wrote:

Thanks, I think this will work but there is one caveat I didn't anticipate. That is, in each row of data, there could be two data points with a "Student" and they'd need to be summed by row before counting in the average. Would this be easier to do by proc sql?


That makes me think that the transpose approach to get a single record for each pair of values with the identification values might be a better bet. Then use proc means/summary twice, once to get the case sums by defendant type and the second to get a mean.

data work.have;
informat case $5.  Defendant1 Defendant2 Defendant3 $10.   Payment1 Payment2 Payment3 comma12.;
input Case  Defendant1 Defendant2 Defendant3  Payment1 Payment2 Payment3 ;
datalines;
1  Student     .         .       $100      .            .
2  Physician  Student    .          0      $200        .
3  Physician  Nurse      Student $100      $300        0
4  Student    Nurse      Student $200      $200      $500
;


data work.trans;
   set work.have;
   array d defendant1-defendant3;
   array p payment1-payment3;
   do i= 1 to dim(d);
      defendant= d[i];
      payment  = p[i];
      output;
   end;
   keep case defendant payment;
run;

proc summary data=work.trans nway;
   where defendant='Student';
   class case ;
   var payment;
   output out=work.sum (drop=_:) sum=;
run;

proc summary data=work.sum;
   var payment;
   output out=work.mean mean=paymentmean;
run;

/* or this will summarize all the types of defendants */
proc summary data=work.trans nway;
   class defendant case;
   var payment;
   output out=work.sum2 (drop=_:) sum=;
run;

proc summary data=work.sum2 nway;
   class defendant;
   var payment;
   output out=work.mean2 mean=paymentmean;
run;


The key bit is the work.trans data set. Change the 3 to 69 (if that's how many sets of values you have).

An exercise for the interested reader: don't include the missing payments in the transposition output set.

 

 

PaigeMiller
Diamond | Level 26

I agree with @ballardw , you probably want to re-arrange the data to be one very long data set, and then the analysis is pretty easy.

 

I don't see how SQL will help here, in fact it seems harder than re-arranging the data.

--
Paige Miller

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 709 views
  • 0 likes
  • 4 in conversation