Re: array

HeatherNewton · Posted 12-31-2022 11:54 AM

data acct;
set new;
array pay(12) pay01-pay12;
do i=1 to 12;
if total_ref=0 then pay(i)=.;
else
pay(i)=pay(i)*default_amt/total_ref;
end;
run;

Is this strange? If I create an array, what is the original value of each pay(i)?

Is this code correct?

PaigeMiller · Posted 12-31-2022 11:56 AM

The original value of PAY01 to PAY12 is probably determined by what is in the data set named NEW. These variables probably have values in that data set.
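One quick way to confirm this is to list the variables in the input dataset and look for PAY01 to PAY12. This is only a sketch, and it assumes the dataset is WORK.NEW, as in the question:

```sas
/* List the variables of NEW in creation order.                  */
/* If PAY01-PAY12 appear here, the array elements start out with */
/* whatever values those variables hold in each observation.     */
proc contents data=new varnum;
run;
```

If the variables are not listed, the ARRAY statement creates them and they start out missing.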

--
Paige Miller

Tom · Posted 12-31-2022 01:20 PM

Are you asking to explain the purpose of doing something like that?

It looks like the goal is to rescale the PAYxx variables from absolute values by the ratio of the DEFAULT_AMT and TOTAL_REF variables.

Note that you can use the DIVIDE() function instead of the IF/THEN/ELSE block. Also, you don't need to count the elements of the array yourself; you can let SAS count how many variables you listed in the ARRAY statement.

data acct;
  set new;
  array pay pay01-pay12;
  do index=1 to dim(pay);
    pay[index]=divide(pay[index]*default_amt,total_ref);
  end;
run;
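A small sketch to see how DIVIDE() behaves when the denominator is zero (the variable names a, b, c and d are made up for illustration). As far as I understand the documentation, DIVIDE() returns a missing value in that case without writing a division-by-zero note to the log, but it may be a special missing value such as .I or .M rather than the plain . that the IF/THEN version assigns:

```sas
data _null_;
  a = divide(10, 0);   /* positive numerator, zero denominator */
  b = divide(-10, 0);  /* negative numerator, zero denominator */
  c = divide(0, 0);    /* zero divided by zero */
  d = divide(10, 4);   /* ordinary division */
  /* Write the results to the log so they can be inspected */
  put a= b= c= d=;
run;
```

If later steps test for missing values with MISSING() or with a comparison such as `<= .z`, the special missing values are treated as missing as well.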

Kurt_Bremser · Posted 01-01-2023 10:01 AM

An array in SAS is, in most cases, not a data structure in itself, but a series of references to other data objects (variables). These must be of the same type, but can have different attributes (length, format, label). The elements are scattered more or less randomly in physical memory.

In other languages, the individual items have no name of their own, they can only be addressed by array name and index. They also have the same attributes (size!) throughout the array.

In your given example, array element pay{1} is in fact variable pay01. From the code, it is assumed that the individual variables already exist in the PDV through the SET statement and their presence in dataset new.

The exception in SAS is a temporary array. Here the elements have no individual names (need no PDV entries), are located in direct sequence in RAM, and have the same length. They will also never appear in output datasets. Addressing an element is therefore much faster than with a "normal" array.
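A minimal sketch of the difference, with made-up names (factor, x1-x3) and values chosen only for illustration:

```sas
data _null_;
  /* Temporary array: elements have no variable names and no PDV entries */
  array factor[3] _temporary_ (0.5 1 2);
  /* Ordinary array: its elements are the variables x1, x2 and x3 */
  array x[3] x1-x3 (10 20 30);
  do i = 1 to dim(x);
    x[i] = x[i] * factor[i];
  end;
  /* _ALL_ lists x1-x3 and i, but nothing for the temporary array */
  put _all_;
run;
```

The log line written by PUT _ALL_ shows the named variables only, which is also why temporary array elements never reach an output dataset.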

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

HeatherNewton · Posted 01-03-2023 08:55 AM

data acct;
merge acct_new refinance_acct_pay (Drop=org_code);
by refinance_acct;
array pay(12) pay01-pay12;
do I=1 to 12;
if total_ref_amt=0 then Pay(I)=.;
else
pay(I)=pay(I)*default_amt/total_ref_amt;
end;
run;

This is the actual code, so pay01-pay12 are in the result dataset after acct_new is merged with refinance_acct_pay. There are delinquency_1 to delinquency_12 in acct_new, so probably pay01=delinquency_1, pay02=delinquency_2, etc.? There is no other variable that looks like pay01-pay12...

This is so strange. Why does it not say exactly which variables it refers to?

What if there is another set called default_1 to default_12...

Tom · Posted 01-03-2023 09:15 AM

It is referring to the variables PAY01, PAY02, and so on up to PAY12 by using the variable list PAY01-PAY12.

It is unclear from your comment whether the PAYxx variables are coming into the data step from one of the two input datasets. They will definitely be in the output dataset, because if they do not already exist the ARRAY statement will create them. If they are not in either input dataset, then the DO loop accomplishes nothing: it just assigns missing values to the new variables, which is what they would contain without the DO loop anyway.

We really cannot answer whether the DELINQUENCY variables should be used instead of the PAY variables. We don't know your datasets. But if you decide you want to make that change then the only thing that needs to change is the ARRAY statement.

array pay delinquency_1 - delinquency_12 ;

AMSAS · Posted 01-03-2023 09:36 AM

@HeatherNewton

This simple example code might help you understand what is going on with arrays:

/* 
	Create sample data containing 1 observation and 18 variables:
		pay01-pay05
		default01-default10 
		any 
		name
		variable
*/
data have ;
	array pay{5} pay01-pay05 (11 12 13 14 15) ;
	array default{10} default01-default10 (21 22 23 24 25 26 27 28 29 30) ;
	/* the variables do not have to be related to the array name */
	array crazy{3} any name variable (1 2 3) ;
	output ;
run; 


data want ;
	/* read the sample dataset */
	set have ;
	/* Create an array "pay" with 10 elements: pay01-pay05 come from have, pay06-pay10 are created here and start out missing */
	array pay{10} pay01-pay10 ;
	/* Now look at the contents of all the variables read from have */
	put "*************************" ;
	put "All Variables : " ;
	put ;
	put _all_ ;
	put ;
	put "Content of Array variables : " ;
	put ;
	/* Now look at the contents of the array */
	do i=1 to dim(pay) ;
		put i= pay{i}= ;
	end ;
run ;

I'd also recommend that you review the ARRAY statement documentation and its examples.

Kurt_Bremser · Posted 01-03-2023 12:07 PM

If the PAYnn variables are not in one of the incoming datasets, then this statement is bullshit:

pay(I)=pay(I)*default_amt/total_ref_amt;

as the variables in the array will always stay missing anyway.

