Solved: Re: Linear smoothing depending on missing values

freshap · Posted 10-30-2020 10:35 AM

Hello,

Let's say I have this dataset :

data have;
input id $ var1 var2 var3 maxvar;
datalines;
id1 . . 999 3
id2 . 10 10 3
id3 10 . . 1
;
run;

The . are what I called missing values.

I want that, whenever I finda missing value in var1-var3, I go and take the first non-missing value to the right and split it by the amount of columns I went through to find this value.

To help myself doing so, I implemented a column that gives the column in which I can find the last non missing value : maxvar

If maxvar=2, it means that var2 is the last non missing value of all vars (so var3, var4,... will be . )

I guess it will be more clear with an example : using the data set I typed above, here is the result that I would like :

id	var1	var2	var3	maxvar
id1	333	333	333	3
id2	5	5	10	3
id3	10	.	.	1

I know that witch this example I could do something like "if var1=. and var2<>. then var1=var2/2 and var2=var2/2" etc, but the problem is that the number of var colums is variable and can go up to 100 colums !

I thought of using array statement and lag function in a data step, but I'm not sure how to manage the variable amount of columns I'll have to go through.

Don't hesitate to ask if you want me to be more precise.

Thanks !

jarapoch · Posted 10-31-2020 05:51 AM

Maybe this code could work.

data want;
set have;
array v _numeric_;
n=1;
nmi=1; *non-missing value index;

do i=1 to dim(v);

if missing(v(i)) then
n=n+1;
else
do;

do j=nmi to i;
v[j]=v[i]/n;
end;
n=1;
nmi=i+1;
end;
end;
drop n nmi i j;
run;

View solution in original post

Rick_SAS · Posted 10-30-2020 11:07 AM

Do all of the missing values start in the first column? If not, it seems that you would need

StartVar and EndVar rather than MaxVar.

Rick_SAS · Posted 10-30-2020 11:11 AM

As the problem is currently stated, you can try to use arrays, similar to the following. I assume the MaxVar value for ID='id2' was supposed to be 2.

data have;
input id $ var1 var2 var3 maxvar;
datalines;
id1 .  . 999 3
id2 .  10 10 2
id3 10 . . 1
;

data want;
set have;
array v[*] var1-var3;
amount = v[maxvar] / maxvar;
do i = 1 to maxvar;
   v[i] = amount;
end;
run;

proc print;run;

freshap · Posted 10-30-2020 12:18 PM

No, maxvar is not 2 in id2. I understand why you think I mistyped 3 into 2 but it's really a 3, as the last non-missing value is 10 in var3.

Let me give another exemple. If I had a row with var1-var6 such as :

var1	var2	var3	var4	var5	var6
10	.	12	.	.	60

I would like to have this :

var1	var2	var3	var4	var5	var6
10	6	6	20	20	20

And maxvar here would be 6. I mean, I'm not even sure if maxvar will be of any use but I've got it in my dataset so I was thinking about using it.

Some observations/rows will have like maxvar=60 where some maxvar=4, so I thought of stopping the process when exceeding maxvar

jarapoch · Posted 10-31-2020 05:51 AM

Maybe this code could work.

data want;
set have;
array v _numeric_;
n=1;
nmi=1; *non-missing value index;

do i=1 to dim(v);

if missing(v(i)) then
n=n+1;
else
do;

do j=nmi to i;
v[j]=v[i]/n;
end;
n=1;
nmi=i+1;
end;
end;
drop n nmi i j;
run;

freshap · Posted 11-05-2020 05:03 AM

Thank you ! That's exactly what I wanted (I just have to replace the array by the values I want, being var1-var5 in the example I gave).
Learned quite some tricks about data steps thanks to your answer.

Linear smoothing depending on missing values

Re: Linear smoothing depending on missing values

Re: Linear smoothing depending on missing values

Re: Linear smoothing depending on missing values

Re: Linear smoothing depending on missing values

Re: Linear smoothing depending on missing values

Re: Linear smoothing depending on missing values

Registration is open

SAS Training: Just a Click Away