DATA Step, Macro, Functions and more

How can I get what I want?

Accepted Solution Solved
Reply
Super Contributor
Posts: 1,636
Accepted Solution

How can I get what I want?

Thank you for your help!

data have;

input id year bmi;

cards;

1 1995 .

1 1996 6

1 1999 9

2 1997 7

2 1998 8

2 1999 .

3 1995 5

3 1998 .

3 1999 9

;

want:

id year bmi

1 1995 6

1 1996 6

1 1999 9

2 1997 7

2 1998 8

2 1999 8

3 1995 5

3 1998 7

3 1999 9

;


Accepted Solutions
Solution
‎05-23-2013 08:45 PM
Respected Advisor
Posts: 3,156

Re: How can I get what I want?

Hi LinLin,

Long time! Here is my try, It will fill up the gap regardless how big it is, using the average of most adjacent 'bmi' from two directions:

data have;

input id year bmi;

cards;

1 1995 .

1 1996 6

1 1999 9

2 1997 7

2 1998 8

2 1999 .

3 1994 .

3 1995 5

3 1996 .

3 1997 .

3 1998 .

3 1999 .

3 2000 9

3 2001 .

;

proc sql;

select a.id, a.year, coalesce(a.bmi,mean(pre_bmi,post_bmi)) as bmi from

  (select distinct a.*, b.bmi as pre_bmi

                   from have a

                               left join have b

                                    on a.id=b.id

                            and a.year > b.year

                            and not missing(b.bmi)

                             group by a.id,a.year

                             having a.year-b.year=min(a.year-b.year)) a,

  (select distinct a.*, b.bmi as post_bmi

                   from have a

                               left join have b

                                    on a.id=b.id

                            and a.year < b.year

                            and not missing(b.bmi)

                             group by a.id,a.year

                             having b.year-a.year=min(b.year-a.year)) b

                                           where a.id=b.id and a.year=b.year;quit;

                                         

Hope you the best! It is also doable using Hash(), let me know if you are interested.

Haikuo

View solution in original post


All Replies
Valued Guide
Posts: 2,177

Re: How can I get what I want?

Is it just "read ahead"  that you need or are there rules like "never read ahead  on first.id"

Super User
Posts: 19,770

Re: How can I get what I want?

What was the rule that was used for ID 3 and year 1998? It seems to be an interpolation, but carry forward if it was one year and carried back one year for id 1 and 2?

Solution
‎05-23-2013 08:45 PM
Respected Advisor
Posts: 3,156

Re: How can I get what I want?

Hi LinLin,

Long time! Here is my try, It will fill up the gap regardless how big it is, using the average of most adjacent 'bmi' from two directions:

data have;

input id year bmi;

cards;

1 1995 .

1 1996 6

1 1999 9

2 1997 7

2 1998 8

2 1999 .

3 1994 .

3 1995 5

3 1996 .

3 1997 .

3 1998 .

3 1999 .

3 2000 9

3 2001 .

;

proc sql;

select a.id, a.year, coalesce(a.bmi,mean(pre_bmi,post_bmi)) as bmi from

  (select distinct a.*, b.bmi as pre_bmi

                   from have a

                               left join have b

                                    on a.id=b.id

                            and a.year > b.year

                            and not missing(b.bmi)

                             group by a.id,a.year

                             having a.year-b.year=min(a.year-b.year)) a,

  (select distinct a.*, b.bmi as post_bmi

                   from have a

                               left join have b

                                    on a.id=b.id

                            and a.year < b.year

                            and not missing(b.bmi)

                             group by a.id,a.year

                             having b.year-a.year=min(b.year-a.year)) b

                                           where a.id=b.id and a.year=b.year;quit;

                                         

Hope you the best! It is also doable using Hash(), let me know if you are interested.

Haikuo

Respected Advisor
Posts: 4,919

Re: How can I get what I want?

For the sake of variety... Using specialized tools :

data have;
input id year bmi;
cards;
1 1995 .
1 1996 6
1 1999 9
2 1997 7
2 1998 8
2 1999 .
3 1994 .
3 1995 5
3 1996 .
3 1997 .
3 1998 .
3 1999 .
3 2000 9
3 2001 .
4 1995 .
4 1996 5
5 1993 .
;


proc reg data=have plots=none noprint;
by id;
model bmi = year;
output out=got p=predBmi;
run;


/* disregard warning and error messages */


data want;
set got;
bmi = coalesce(bmi, round(predBmi), mod(year, 10));
drop predBmi;
run;

proc print; id id year; run;

PG

Message was edited by: PG added MOD function to provide a bmi for single year ids.

PG
Contributor
Posts: 60

Re: How can I get what I want?

Hi,

This is missing imputation by average of surrounded values. I had this situation like this before for my analysis.

1. sort by year Retain the non - missing values in a new variable

2. Sort descending year create a another new variable with retain values

3. use the Mean function and replace the missing values using the 2 newly created variables

This will work 100% I did it.

Sudhakar

Regular Contributor
Posts: 195

Re: How can I get what I want?

Hi,

Based on your sample data provided, i have prepared one SAS Code to achive desire output by using ARRAY as follow:

data have;

  input year income;

  cards4;

1990 .

1991 2000

1992 3000

1993 .

1994 4000

1995 .

1996 5000

;;;;

proc transpose data = have

               out  = want

               prefix = year_;

    id year;

    var income;

run;

data want(keep = year income);

  set want;

  array _miss (*) _numeric_;

  do i = 1 to dim(_miss);

     if _miss(1) = . then _miss(1) = _miss(1+1);

     if _miss(i) = . then _miss(i) = _miss(i-1);

     year = substr(vname(_miss(i)),6);

     income = _miss(i);

     output;

  end;

run;

This code will fill the missing values as per its previous value of variable...And if the very first value is missing then it will fill missing value as per its  very next observation which is excatly same as your output posted...

But in your output, i still have confusion about how you want to fill missing value for year when its value is 1998...

Request you to please share some rule to handle missing value for year = 1998...

-Urvish

Super Contributor
Posts: 1,636

Re: How can I get what I want?

Thank you all for your help!

For now each person only has three observations and one missing value. If first or third one is missing then use the second one, if the second one is missing then use the average of the first and the third. It will be more complicated as more data are added in the future. I will definitely need to try all the solutions. 

Haikuo, I have forgotten all the hash methods learned a year ago. Thank you for your good wishes! Happy Memorial Day Smiley Happy!

Linlin

Respected Advisor
Posts: 3,156

Re: How can I get what I want?

Hi LinLin,

Here is a Hash() approach hopefully to fresh your memory up:

data have;

input id year bmi;

cards;

1 1995 .

1 1996 6

1 1999 9

2 1997 7

2 1998 8

2 1999 .

3 1994 .

3 1995 5

3 1996 .

3 1997 .

3 1998 .

3 1999 .

3 2000 9

3 2001 .

;

data want_hash;

  if _n_=1 then do;

    if 0 then set have(rename=(id=_id year=_year bmi=_bmi));

      declare hash h(dataset:'have(rename=(id=_id year=_year bmi=_bmi))', multidata:'y', ordered:'a');

      h.definekey('_id', '_year');

      h.definedata(all:'y');

      h.definedone();

      declare hiter hi('h');

  end;

  set have;

       do rc=hi.setcur(key:id,key:year) by 0 while (rc=0);

        rc=hi.prev();

            if _id=id then pre_bmi=_bmi;

        if _id ne id or not missing(_bmi) then leave;

     end;

      do rc=hi.setcur(key:id,key:year) by 0 while (rc=0);

        rc=hi.next();

            if _id=id then post_bmi=_bmi;

        if _id ne id or not missing(_bmi) then leave;

     end;

       bmi=coalesce(bmi,mean(pre_bmi,post_bmi));

       keep id year bmi;

   run;


Haikuo

     

    

Super Contributor
Posts: 1,636

Re: How can I get what I want?

Too bad I have to learn Hash now:smileysilly:. Thank you again! - Linlin

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 360 views
  • 6 likes
  • 7 in conversation