In numeric var use first observation to do calculations with other observations in same variable

mgrasmussen · Posted 01-06-2022 12:45 PM

Dear SAS experts

I am trying to write some code to do the following: I want to use the first observation (data point) of a numeric variable in calculations with the remaining data points of the same variable (observation 2 to observation N). Specifically, I want to take the value in observation 1 and, for every other data point in the variable, divide that data point by this value and then multiply by 1000 (it has to do with standardizing some data according to sample sizes). I want to use an array because this procedure should be repeated for many numeric variables (the code works if I do it for one variable at a time, but not when I try to run it for many variables using arrays). I have tried to write some code, but it does not feel ideal. Below is a simplified version of the data and some suggested code.

- Example 1 does not yield the results I am looking for, while example 2 does. Can someone explain why this might be the case?

- Also, in my actual dataset (which contains much more data in many more variables, including character variables), example 2 does NOT work (while example 1 actually does). The calculations yield results that are much smaller than they should be (i.e. much smaller than the numbers in the desired output). Does anyone have a suggestion as to why example 2 might not work on other datasets? It is of course hard to determine without inspecting the actual data and code (which I cannot share), but someone might be able to pinpoint what the issue most likely is.

- I suspect that there is a much easier way of achieving the results I am looking for, and, if so, I would appreciate someone writing out this code.

data have;
input varone vartwo varthree varfour;
datalines;
1 2 1 1
1 2 1 1
1 3 1 2
1 4 3 10
;
run;

data have;
set have;
globalcatvar="1";
run;

*Example 1: With explicit variable specification;
data have_new (drop=i);
set have;
array modify {2} varone vartwo;
do i=1 to dim(modify);
by globalcatvar notsorted;
if first.globalcatvar then _iorc_=modify(i);
if _n_>1 then modify(i)=(modify(i)/_iorc_)*1000;
end;
run;

*Example 2: With variable specification using _numeric_;
data have_new2 (drop=i);
set have;
array modify {*} _numeric_;
do i=1 to dim(modify);
by globalcatvar notsorted;
if first.globalcatvar then _iorc_=modify(i);
if _n_>1 then modify(i)=(modify(i)/_iorc_)*1000;
end;
run;

Thank you

ballardw · Posted 01-06-2022 01:12 PM

The typical approach to "first" starts by identifying the variable(s) that define a group. Your example data is possibly too simple to show that easily. You then use a data step with a BY statement naming the variable(s) that identify a group, coupled with a RETAIN statement.

Example:

data have;
input varone vartwo varthree varfour;
datalines;
1 2 1 1
1 2 1 1
1 3 1 2
1 4 3 10
2 4 4 4
2 5 6 7
2 9 9 9
;
run;

data want;
   set have;
   by varone;
   retain firstvarone;
   if first.varone then firstvarone=varone;
run;

I added a small number of records with a different VARONE, assuming that variable identifies a group. The Want data step uses BY Varone to indicate that it is the grouping variable. By default the data must be sorted by that variable. If your data is not sorted by the variable(s), use the NOTSORTED option on the BY statement to indicate that the records are grouped but not in order.

When you use a BY statement, SAS supplies the automatic variables First.<variablename> and Last.<variablename> (do notice the dot), which are numeric and take the value 1 or 0 for true or false. So First.varone is true (1) on the first record of each VARONE group, for example the first record with the value 1.
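To see these flags in action, a small sketch like the following could be run against the HAVE data set above. It writes nothing out and only prints the flags to the log; it assumes HAVE is sorted by VARONE, as the example data is.

```sas
/* Sketch: print the BY-group flags for each record to the log.   */
/* Assumes HAVE (from the example above) is sorted by VARONE.     */
data _null_;
   set have;
   by varone;
   /* FIRST.varone is 1 on the first record of each group, else 0 */
   /* LAST.varone  is 1 on the last record of each group, else 0  */
   put _n_= varone= first.varone= last.varone=;
run;
```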

RETAIN keeps the value of a variable across iterations of the data step. Don't retain a variable that is already in the data set on the SET statement; it gets overwritten when the next record is read. So conditionally assign the value of interest to a new, retained variable.

I only show holding the value. You can do with it what you want. If you do not need the Retained variable in the output you can Drop it after you are sure it is used correctly.
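If the same thing is needed for several variables at once, one possible approach is to hold one baseline value per variable in a temporary array, since temporary array elements are retained automatically. A single scalar holder (such as _IORC_ in the original attempts) is overwritten on every pass of the loop, so after the first record it only contains the first value of the last variable in the array. The following is a minimal sketch under some assumptions: the names WANT_SCALED, MODIFY and BASE are made up for illustration, the variables to rescale are listed explicitly, and the guard against a zero or missing baseline is an addition that was not part of the question.

```sas
/* Sketch: rescale each listed variable by its own first value    */
/* within each VARONE group, then multiply by 1000.               */
/* Assumes HAVE is sorted by VARONE.                              */
data want_scaled (drop=i);
   set have;
   by varone;
   array modify {*} vartwo varthree varfour;   /* variables to rescale       */
   array base {3} _temporary_;                 /* one baseline per variable; */
                                               /* retained across records    */
   do i = 1 to dim(modify);
      /* First record of a group: store the baseline, leave the value as is */
      if first.varone then base{i} = modify{i};
      /* Later records: divide by that variable's own baseline              */
      else if base{i} ne 0 and not missing(base{i}) then
         modify{i} = modify{i} / base{i} * 1000;
   end;
run;
```

The dimension of a temporary array has to be a constant, so it must be at least as large as the number of variables in MODIFY. If the array is defined with _NUMERIC_ instead of an explicit list, keep in mind that it picks up every numeric variable defined at that point, including a numeric grouping variable.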

mgrasmussen · Posted 01-07-2022 04:25 AM

Hey Ballard

Thanks a bunch! I appreciate the explanation. I understand most of it.

I tried updating the code (below) based on your feedback. However, instead of using a variable that groups the data, I create a variable that has the same value for all observations, i.e. it treats the whole dataset as one big group (although I might as well have used varone, actually).

The code below seems to work, but I cannot get it to work in an array. I would like to specify the array using: _numeric_. The problem is I do not know how to define/write the variable which is retained and used in the calculation. Do you have a suggestion on how to do this?

data have;
input varone vartwo varthree varfour;
datalines;
1 2 1.5 5
1 2 1 1
1 3 1 2
1 4 3 10
;
run;

data have;
set have;
globalcatvar="1";
run;

data want_three;
set have;
by globalcatvar;
retain firstvartwo;
if first.globalcatvar then firstvartwo=vartwo;
if _n_>1 then vartwo=vartwo/firstvartwo;
run;

Thank you!
