
Processing Two-Dimensional Arrays to Build multiple regression models

Started ‎09-14-2023 by
Modified ‎09-14-2023 by

 

This article gives a brief introduction to processing two-dimensional arrays in SAS, that is, manipulating and analyzing data organized in rows and columns. SAS provides several ways to work with two-dimensional arrays, which makes many data manipulation tasks shorter and easier to write.

 

You create a two-dimensional array with the ARRAY statement, which defines a matrix-like structure of rows and columns in which each element refers to a value from your data. The elements can reference existing variables in the data set or hold values loaded from its observations, so you can read and update them by position. To iterate through the elements, you typically use nested DO loops.
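As a minimal sketch of the idea, the step below defines a 2 x 3 array over six variables and fills it with nested DO loops. The table name work.ArrayDemo and the variables Q1 through Q6 are hypothetical and used only for illustration.

```sas
/* Minimal sketch: a 2 x 3 array over six hypothetical variables Q1-Q6 */
data work.ArrayDemo;
    array Q[2,3] Q1-Q6;          /* row 1 = Q1-Q3, row 2 = Q4-Q6 */
    do Row=1 to 2;               /* outer loop walks the rows */
        do Col=1 to 3;           /* inner loop walks the columns */
            Q[Row,Col]=Row*10+Col;
        end;
    end;
    drop Row Col;
run;

proc print data=work.ArrayDemo noobs;
run;
```

SAS assigns variables to a multidimensional array row by row, so Q[2,1] refers to Q4. After the step runs, Q1 through Q6 hold 11, 12, 13, 21, 22, and 23.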

 

Initializing and Building New Arrays

The purpose of this demo is to introduce and explain these tools and to illustrate their usefulness through examples. The article gives an overview of creating and processing two-dimensional arrays to gain insight into the data. Arrays simplify a program by replacing repetitive code, and they are also useful for rotating data and performing table lookups.
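To make "rotating data" concrete, here is a small sketch that turns a wide table (one column per quarter) into a long one (one row per quarter). The tables work.Wide and work.Long, their variables, and the values are invented for illustration and are not part of the course data.

```sas
/* Hypothetical wide table: one row per year, one column per quarter */
data work.Wide;
    input Yr Qtr1-Qtr4;
    datalines;
2013 5 10 15 8
2014 6 11 16 9
;
run;

/* Rotate to one row per year and quarter */
data work.Long;
    set work.Wide;
    array Q[4] Qtr1-Qtr4;        /* the array replaces four repeated assignments */
    do Quarter=1 to 4;
        Value=Q[Quarter];
        output;                  /* write one row per array element */
    end;
    keep Yr Quarter Value;
run;
```

With two input rows and four quarters each, work.Long contains eight rows.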

 

In this first example, we’ll use the pg3.weather_dublin_daily5yr table. It contains four variables: City, Date, TempDailyAvg, and Precip. A portion of the table is shown. The natural interval of the data is daily.

 

dm_1_blg1-300x232.png


 

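In the first step, we loaded the data to display the daily temperatures. The next step is to initialize the array Avg. Because Avg is declared with _TEMPORARY_, its elements are held in memory rather than as variables in the PDV (Program Data Vector), they are not written to the output table, and their values are retained across iterations of the DATA step.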
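The behavior of a temporary array can be seen in a small sketch. The array Rate, its initial values, and the table work.Lookup are hypothetical; the point is only that the array elements never become columns.

```sas
data work.Lookup;
    /* Hypothetical 2 x 2 lookup table; initial values are listed row by row */
    array Rate[2,2] _temporary_ (0.10 0.20
                                 0.30 0.40);
    do Row=1 to 2;
        do Col=1 to 2;
            Value=Rate[Row,Col]; /* copy the element into a real variable */
            output;
        end;
    end;
run;
```

The output table has four rows with the columns Row, Col, and Value only. No Rate1 through Rate4 columns are created.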

 

In this DATA step, we use nested DO loops to create the temporary table work.DublinTemp, which contains the results of the processing.

 

  • The array Avg holds a table of 5 rows (the years 2013 through 2017) and 12 columns (the months), 60 values in all. Because it is declared with _TEMPORARY_, no columns Avg1 through Avg60 are added to the output table.
  • The DATA step reads the daily values from pg3.weather_dublin_daily5yr for 2013 through 2017 and compares each TempDailyAvg value to the matching monthly average stored in the Avg array.
  • The outer DO loop controls which row is processed. Each iteration reads one row of pg3.weather_dublin_monthly5yr, keeping only Temp1 through Temp12, which are referenced through the array T[12].
  • The inner DO loop controls how the columns are processed. For each year, it steps through Month 1 to Month 12 and copies each value into Avg.
  • The WHERE= data set option selects the 15th day of each month, and the KEEP statement limits the output to Date, TempDailyAvg, TempMonthlyAvg, and Difference.


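data work.DublinTemp;
    /* 5 x 12 temporary array: rows indexed by year, columns by month */
    array Avg[2013:2017,12] _temporary_;
    /* Load the monthly averages once, on the first iteration */
    if _N_=1 then do Yr=2013 to 2017;
        /* Each SET reads one row; rows are assumed to be in year order */
        set pg3.weather_dublin_monthly5yr(keep=Temp1-Temp12);
        array T[12] Temp1-Temp12;
        do Month=1 to 12;
            Avg[Yr,Month]=T[Month];
        end;
    end;
    /* Read only the 15th day of each month from the daily table */
    set pg3.weather_dublin_daily5yr(where=(day(Date)=15) keep=Date TempDailyAvg);
    Y=year(Date);
    M=month(Date);
    /* Look up the matching monthly average and compare */
    TempMonthlyAvg=Avg[Y,M];
    Difference=TempDailyAvg-TempMonthlyAvg;
    keep Date TempDailyAvg TempMonthlyAvg Difference;
run;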
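To try the same pattern without access to the pg3 library, the sketch below rebuilds it on two tiny tables. Everything here is an assumption made for illustration: the table names work.Monthly, work.Daily, and work.Compare, the reduction to two years and three months, and all of the temperature values, which are invented rather than taken from the Dublin data.

```sas
/* Invented monthly averages: one row per year (2013, 2014), months 1-3 only */
data work.Monthly;
    input Temp1-Temp3;
    datalines;
40 42 45
41 43 46
;
run;

/* Invented daily readings, each taken on the 15th of a month */
data work.Daily;
    input Date :date9. TempDailyAvg;
    format Date date9.;
    datalines;
15JAN2013 38
15FEB2013 44
15MAR2014 50
;
run;

data work.Compare;
    array Avg[2013:2014,3] _temporary_;
    /* First iteration only: copy each monthly row into the lookup array */
    if _N_=1 then do Yr=2013 to 2014;
        set work.Monthly;
        array T[3] Temp1-Temp3;
        do Month=1 to 3;
            Avg[Yr,Month]=T[Month];
        end;
    end;
    /* Every iteration: read one daily row and look up its monthly average */
    set work.Daily;
    TempMonthlyAvg=Avg[year(Date),month(Date)];
    Difference=TempDailyAvg-TempMonthlyAvg;
    keep Date TempDailyAvg TempMonthlyAvg Difference;
run;
```

With these made-up values, the three output rows have Difference values of -2, 2, and 4. The lookup works only because the array is loaded once and retained; without the _N_=1 condition, the step would try to reread the monthly table on every iteration and stop as soon as it ran out of rows.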

 

Let’s look at the results in work.DublinTemp and discuss them in detail.

 

Results

dm_2_blg2-300x182.png

 

dm_3_Bimonthly-300x225.png

 

  • The TempDailyAvg variable in WORK.DUBLINTEMP is the daily average temperature in Dublin on the 15th of each month.
  • The TempMonthlyAvg variable in WORK.DUBLINTEMP is the average temperature for the month that contains that date, looked up from the Avg array.
  • The Difference variable is TempDailyAvg minus TempMonthlyAvg.

 

In conclusion, processing two-dimensional arrays means working with structured data arranged in rows and columns. It is a powerful way to organize and manipulate data in domains such as image processing, data analysis, and simulation modeling, because values can be stored and retrieved efficiently by position.

A similar idea applies to time series work, for example with PROC TIMESERIES. Time series data consists of observations collected over regular time intervals, such as stock prices, temperature measurements, or sales data. When a series is laid out as a two-dimensional array, as the monthly averages were in this demonstration, specific time points are easy to index, which supports filtering, aggregation, and transformation. Techniques such as smoothing, interpolation, and outlier detection can likewise be expressed as windowing and sliding operations over the array.
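As a sketch of how PROC TIMESERIES relates to this demonstration, the step below accumulates the daily readings into monthly averages. It assumes that SAS/ETS is licensed and that pg3.weather_dublin_daily5yr holds a single series for Dublin; the output table names work.DailySorted and work.MonthlyAvg are hypothetical.

```sas
/* The ID variable must be in ascending order for PROC TIMESERIES */
proc sort data=pg3.weather_dublin_daily5yr out=work.DailySorted;
    by Date;
run;

/* Accumulate daily values into one averaged observation per month */
proc timeseries data=work.DailySorted out=work.MonthlyAvg;
    id Date interval=month accumulate=average;
    var TempDailyAvg;
run;
```

The result is one row per month rather than the one-row-per-year layout of pg3.weather_dublin_monthly5yr, so it would need to be rotated before it could be loaded into a year-by-month array like Avg.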

 

For more information:

Comments

I guess I don't understand why the title talks about "multiple regression models", when this is not mentioned anywhere in the text.

 

Providing the data as SAS data step code rather than as a screen capture would help all of us.

Version history
Last update:
‎09-14-2023 04:12 PM