Hello,
Let's say I have this dataset :
data have;
input id $ var1 var2 var3 maxvar;
datalines;
id1 . . 999 3
id2 . 10 10 3
id3 10 . . 1
;
run;
The . are what I called missing values.
I want that, whenever I finda missing value in var1-var3, I go and take the first non-missing value to the right and split it by the amount of columns I went through to find this value.
To help myself doing so, I implemented a column that gives the column in which I can find the last non missing value : maxvar
If maxvar=2, it means that var2 is the last non missing value of all vars (so var3, var4,... will be . )
I guess it will be more clear with an example : using the data set I typed above, here is the result that I would like :
id | var1 | var2 | var3 | maxvar |
id1 | 333 | 333 | 333 | 3 |
id2 | 5 | 5 | 10 | 3 |
id3 | 10 | . | . | 1 |
I know that witch this example I could do something like "if var1=. and var2<>. then var1=var2/2 and var2=var2/2" etc, but the problem is that the number of var colums is variable and can go up to 100 colums !
I thought of using array statement and lag function in a data step, but I'm not sure how to manage the variable amount of columns I'll have to go through.
Don't hesitate to ask if you want me to be more precise.
Thanks !
Maybe this code could work.
data want;
set have;
array v _numeric_;
n=1;
nmi=1; *non-missing value index;
do i=1 to dim(v);
if missing(v(i)) then
n=n+1;
else
do;
do j=nmi to i;
v[j]=v[i]/n;
end;
n=1;
nmi=i+1;
end;
end;
drop n nmi i j;
run;
Do all of the missing values start in the first column? If not, it seems that you would need
StartVar and EndVar rather than MaxVar.
As the problem is currently stated, you can try to use arrays, similar to the following. I assume the MaxVar value for ID='id2' was supposed to be 2.
data have;
input id $ var1 var2 var3 maxvar;
datalines;
id1 . . 999 3
id2 . 10 10 2
id3 10 . . 1
;
data want;
set have;
array v[*] var1-var3;
amount = v[maxvar] / maxvar;
do i = 1 to maxvar;
v[i] = amount;
end;
run;
proc print;run;
No, maxvar is not 2 in id2. I understand why you think I mistyped 3 into 2 but it's really a 3, as the last non-missing value is 10 in var3.
Let me give another exemple. If I had a row with var1-var6 such as :
var1 | var2 | var3 | var4 | var5 | var6 |
10 | . | 12 | . | . | 60 |
I would like to have this :
var1 | var2 | var3 | var4 | var5 | var6 |
10 | 6 | 6 | 20 | 20 | 20 |
And maxvar here would be 6. I mean, I'm not even sure if maxvar will be of any use but I've got it in my dataset so I was thinking about using it.
Some observations/rows will have like maxvar=60 where some maxvar=4, so I thought of stopping the process when exceeding maxvar
Maybe this code could work.
data want;
set have;
array v _numeric_;
n=1;
nmi=1; *non-missing value index;
do i=1 to dim(v);
if missing(v(i)) then
n=n+1;
else
do;
do j=nmi to i;
v[j]=v[i]/n;
end;
n=1;
nmi=i+1;
end;
end;
drop n nmi i j;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.