missing data

Hi All

How could I get dataset WANT from HAVE?  Thank you!

 

data have;
input id$ count var1;
cards;
aa 1 20
aa 2 .
aa 3 30
bb 1 10
bb 2 .
bb 3 .
bb 4 20
cc 1 10
cc 2 .
cc 3 30
cc 4 .
cc 5 50
;

 

data want;
input id$ count var1;
cards;
aa 1 20
aa 2 25
aa 3 30
bb 1 10
bb 2 15
bb 3 15
bb 4 20
cc 1 10
cc 2 20
cc 3 30
cc 4 40
cc 5 50
;


Accepted Solution
11-04-2015 03:35 PM

Re: missing data

A data step solution:

 

data have;
input id$ count var1;
cards;
aa 1 20
aa 2 .
aa 3 30
bb 1 10
bb 2 .
bb 3 .
bb 4 20
cc 1 10
cc 2 .
cc 3 30
cc 4 .
cc 5 50
;

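/* Pass 1 (ascending count): for each missing var1, store the most recent
   non-missing value within the id group in prevVar1. */
data want0;
retain lastVar1;
set have; by id;
if first.id then call missing(lastVar1);
if missing(var1) then prevVar1 = lastVar1;
else lastVar1 = var1;
drop lastVar1;
run;

/* Reverse the order within each id so the next pass reads later rows first. */
proc sort data=want0; by id descending count; run;

/* Pass 2 (descending count): lastVar1 now holds the next non-missing value,
   so a missing var1 becomes the mean of its two nearest non-missing neighbors. */
data want;
retain lastVar1;
set want0; by id;
if first.id then call missing(lastVar1);
if missing(var1) then var1 = mean(prevVar1, lastVar1);
else lastVar1 = var1;
drop lastVar1 prevVar1;
run;

/* Restore the original order. */
proc sort data=want; by id count; run;

proc print data=want noobs; run;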
PG
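
One way to confirm that the result matches the table requested in the question is a quick PROC COMPARE. This is only a minimal sketch: the data set name EXPECTED is made up for the check, and its values are typed in from the WANT listing in the question. It assumes WANT has already been created and sorted by id and count as above.

```sas
/* Expected values, copied from the WANT listing in the question. */
/* EXPECTED is a hypothetical name used only for this check.      */
data expected;
input id$ count var1;
cards;
aa 1 20
aa 2 25
aa 3 30
bb 1 10
bb 2 15
bb 3 15
bb 4 20
cc 1 10
cc 2 20
cc 3 30
cc 4 40
cc 5 50
;

/* Compare observation by observation; both data sets are in id, count order. */
proc compare base=expected compare=want;
run;
```

If the two data sets agree, the PROC COMPARE report should show no unequal values. Note that MEAN ignores missing arguments, so a missing value at the very start or end of an id group would be filled with its single nearest non-missing neighbor rather than an average.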

All Replies

Re: missing data

Long time no see! I suppose a DATA step solution would be a lot more efficient and elegant, but I can't figure one out right now, so here is the ugly one.

 

data have;
input id$ count var1;
cards;
aa 1 20
aa 2 .
aa 3 30
bb 1 10
bb 2 .
bb 3 .
bb 4 20
cc 1 10
cc 2 .
cc 3 30
cc 4 .
cc 5 50
;
 
proc sql;
create table want as
select id, count,
  case when not missing(var1) then var1
  else mean(
    (select var1 from have where id=a.id and count < a.count and not missing(var1)
       having count=max(count)),
    (select var1 from have where id=a.id and count > a.count and not missing(var1)
       having count=min(count)))
  end as var1
from have a
;
quit;

Re: missing data

Also, long time, no see. Missed you!

 

You appear to want to use two different methods for the 2nd and 3rd IDs. Is that really what you want, or something else?

If you want to use the same method, the following seems to come close:

 

PROC STDIZE data=have out=want method=mean missing=midrange reponly;
  var var1;
  by id;
run;

 

Art, CEO, AnalystFinder.com
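
For anyone comparing this with the requested WANT, here is a minimal sketch of what that call should produce. MISSING=MIDRANGE replaces each missing value with its BY-group midrange, (min + max) / 2, so aa and bb would match WANT (25 and 15), while both missing values in cc would become 30 instead of 20 and 40. The output name want_stdize is made up for illustration, and those values are worked out by hand rather than taken from a run.

```sas
/* Same call as in the post above, written to a separate data set. */
/* want_stdize is a hypothetical name chosen for this sketch.      */
proc stdize data=have out=want_stdize method=mean missing=midrange reponly;
  var var1;
  by id;
run;

/* Print the result to compare against WANT by eye. */
proc print data=want_stdize noobs; run;
```

The difference in group cc is why this approach only comes close: it uses one replacement value per group, not the nearest non-missing neighbors of each gap.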

 


Re: missing data

Hi Haikuo, Art, and PG,

Thank you very much for your great help!! I miss you guys too.

Best wishes!

Linlin
