Help using Base SAS procedures

How to fill in missing values

Accepted Solution Solved
Reply
Super Contributor
Posts: 338
Accepted Solution

How to fill in missing values

Hi SAS Forum,

I have this data set which has 3 variables (Orig_date,   Curr_QTR,  Village), of them two variables are having missing values.

Orig_date   Curr_QTR  Village

03/01/1993

03/31/2008

A

.

06/30/2008

.

09/30/2008

.

12/31/2008

06/01/1995

03/31/2008

.

06/30/2008

12/01/1995

03/31/2008

B

.

06/30/2008

.

09/30/2008

.

12/31/2008

03/01/1996

03/31/2008

.

06/30/2008

.

09/30/2008

.

12/31/2008

.

03/31/2009

.

06/30/2009

.

09/30/2009

.

12/31/2009

Please see below SAS code which readily produces the above data set.

data have;

informat orig_date curr_QTR date9.;

input village $ 1-1 orig_date curr_QTR;

format orig_date curr_QTR MMDDYY10.;

cards;

A 01MAR1993 31MAR2008

  . 30JUN2008

  . 30SEP2008

  . 31DEC2008

  01JUN1995 31MAR2008

  . 30JUN2008

B 01DEC1995 31MAR2008

  . 30JUN2008

  . 30SEP2008

  . 31DEC2008

  01MAR1996 31MAR2008

  . 30JUN2008

  . 30SEP2008

  . 31DEC2008

  . 31MAR2009

  . 30JUN2009

  . 30SEP2009

  . 31DEC2009

  ; run;

Question:

I need to fill in the missing values of variables “orig_date” and “village” with the upper value that would be found at the top of a given missing block.

Below is the table that I want.

Orig_date   Curr_QTR  Village

03/01/1993

03/31/2008

A

03/01/1993

06/30/2008

A

03/01/1993

09/30/2008

A

03/01/1993

12/31/2008

A

06/01/1995

03/31/2008

A

06/01/1995

06/30/2008

A

12/01/1995

03/31/2008

B

12/01/1995

06/30/2008

B

12/01/1995

09/30/2008

B

12/01/1995

12/31/2008

B

03/01/1996

03/31/2008

B

03/01/1996

06/30/2008

B

03/01/1996

09/30/2008

B

03/01/1996

12/31/2008

B

03/01/1996

03/31/2009

B

03/01/1996

06/30/2009

B

03/01/1996

09/30/2009

B

03/01/1996

12/31/2009

B

I have tried this but no hope.

proc sort data=have;

by curr_qtr;

run;

data want;

set have;

retain oldorig_date;

by curr_qtr;

if first.curr_qtr then oldorig_date = orig_date;

else orig_date = oldorig_date;

drop oldorig_date;

run;

Could anyone help me.

Thanks

Mirisage


Accepted Solutions
Solution
‎06-06-2013 02:01 PM
Regular Contributor
Posts: 180

Re: How to fill in missing values

Try this:

data want (drop= ori vill);

  retain ori 0 Vill ' ';

  set have;

  if orig_date = .

     then orig_date=ori;

     else ori=orig_date;

  if village=''

     then village=vill;

    else vill=village;

run;

Regards,

View solution in original post


All Replies
Solution
‎06-06-2013 02:01 PM
Regular Contributor
Posts: 180

Re: How to fill in missing values

Try this:

data want (drop= ori vill);

  retain ori 0 Vill ' ';

  set have;

  if orig_date = .

     then orig_date=ori;

     else ori=orig_date;

  if village=''

     then village=vill;

    else vill=village;

run;

Regards,

Respected Advisor
Posts: 4,919

Re: How to fill in missing values

Do not sort; the values are carried over according to the original data order. One way to do this is :

data want;

length lastVillage $1;

retain lastVillage lastOrig;

set have;

village = coalescec(village, lastVillage);

orig_date = coalesce(orig_date, lastOrig);

lastVillage = village;

lastOrig = orig_date;

drop lastVillage lastOrig;

run;

PG

PG
Respected Advisor
Posts: 3,156

Re: How to fill in missing values

Or you can try a trick I learned from DN:

data have;

informat orig_date curr_QTR date9.;

input village $ 1-1 orig_date curr_QTR;

format orig_date curr_QTR MMDDYY10.;

retain dummy '1';

cards;

A 01MAR1993 31MAR2008

  . 30JUN2008

  . 30SEP2008

  . 31DEC2008

  01JUN1995 31MAR2008

  . 30JUN2008

B 01DEC1995 31MAR2008

  . 30JUN2008

  . 30SEP2008

  . 31DEC2008

  01MAR1996 31MAR2008

  . 30JUN2008

  . 30SEP2008

  . 31DEC2008

  . 31MAR2009

  . 30JUN2009

  . 30SEP2009

  . 31DEC2009

;

data want;

  update have (obs=0) have;

  by dummy;

output;

drop dummy;

run;

Haikuo

Contributor
Posts: 22

Re: How to fill in missing values

Respected Advisor
Posts: 3,156

Re: How to fill in missing values

Posted in reply to yeshwanth

@yeshwanth,

If you can afford SAS/ETS, Proc expand/Proc timeseries are very powerful tools. However, in this case, I doubt either will work. They generally required data to be sorted by a time/date variable in general or within  groups; and if there are groups, group variable can NOT be missing.

I could be very wrong since I haven't used these tools very often. I am therefore hoping you or some expert like    / @data null  can provide some code to prove me wrong.

Thanks,

Haikuo

Contributor
Posts: 32

Re: How to fill in missing values

I posted this questions few minutes ago

Perform ANOVA and multiple comparisons with PROC surveymeans

I am not sure if I posted it correctly. could you please let me know I it is posted correctly.

with many thanks

Stoline

Super Contributor
Posts: 338

Re: How to fill in missing values

Hi sstoline,

I think you have not posted it correctly. You have posted your query as a response to my query.

Just post as a new separate posting.

Thanks

Mirisage

Contributor
Posts: 32

Re: How to fill in missing values

Dear Mirisage:

Did you see this question posted before?

I posted it few times, but no answers available yet. So I am not sure if others able to see it.

many thanks

abou

Perform ANOVA and multiple comparisons with PROC surveymeans

This question is Not Answered.(Mark as assumed answered)

  

Dear All:


I do need helps with how to use both PROC surveyfreq and PROC surveymeans to do the following analysis.

I really thank you all for your helps.


I have the following variables:


Stratum1 (1,2)


Months (0,1,2)


X1 (1=yes,2=No) ---- nominal categorical


X2  ------ interval continuous


Weight1  ---- weight variable


I need


(1) to test the equality of proportions of ones in months (0,1,2), and conduct a multiple comparison test for these proportions.


(2) to test the equality of the averages in months(0,1,2) and conduct a multiple comparison test for these means.


(3) test for trend: for example if there is increasing or decreasing in averages (X2), if there is increasing or decreasing in proportions of ones (X1).

Any helps will be appreciated.


Please email me copy of your answer.


With many thanks

Steve

Email: sstoline@gmail.com


data1--- as an example

=============================

X2    X1    stratum1    months    weight1

29.4822    2    2    0    16.61921705

33.2849    2    2    0    16.61921705

16.9370    1    1    0    2.09612124

19.0055    1    1    0    2.09612124

21.1151    1    2    0    16.61921705

25.4055    2    2    0    16.61921705

24.4164    1    1    0    1.56167161

19.0767    1    1    0    1.56167161

23.1562    2    1    0    2.09612124

23.3479    1    1    0    2.09612124

21.7370    1    1    0    2.09612124

24.9726    2    2    0    16.61921705

20.3836    1    2    0    16.61921705

20.0575    1    2    0    20.88637405

17.6603    1    2    0    16.61921705

25.2274    2    2    0    16.61921705

20.9644    1    1    0    2.09612124

32.8055    2    2    0    14.51497509

39.5233    2    1    0    1.56167161

17.7288    1    1    1    2.09612124

18.3096    1    2    1    16.61921705

25.8055    2    2    1    16.61921705

30.0904    1    2    1    16.61921705

27.8082    2    1    1    2.09612124

37.3863    2    1    1    2.09612124

26.2548    2    1    1    2.09612124

17.8795    1    2    1    20.88637405

19.8000    1    2    1    16.61921705

28.6932    2    1    1    1.56167161

27.1041    2    2    1    16.61921705

36.1836    2    2    1    16.61921705

19.8192    1    1    1    2.09612124

31.9644    2    2    1    20.88637405

27.0740    1    1    1    2.09612124

23.5288    2    1    1    2.09612124

21.9068    1    2    1    14.51497509

20.3534    1    2    1    16.61921705

29.4767    2    2    1    14.51497509

22.8274    1    2    1    16.61921705

21.2685    1    2    1    20.88637405

31.8932    1    1    1    1.56167161

31.8795    2    1    1    2.09612124

21.2630    2    1    1    2.09612124

18.7562    2    1    1    2.09612124

16.8822    1    2    1    20.88637405

22.0164    2    2    2    20.88637405

27.4959    2    1    2    2.09612124

27.4904    2    1    2    1.56167161

2.7096    2    2    2    14.51497509

23.0027    1    2    2    16.61921705

19.2767    1    2    2    16.61921705

30.4466    1    2    2    16.61921705

36.5425    2    2    2    16.61921705

32.2521    2    1    2    1.56167161

36.9534    1    2    2    14.51497509

30.4164    2    2    2    14.51497509

20.2767    2    1    2    2.09612124

17.2356    1    2    2    20.88637405

20.8247    2    2    2    16.61921705

20.8795    1    2    2    16.61921705

27.5260        2    2    16.61921705

19.2493    1    2    2    16.61921705

17.8521    1    2    2    16.61921705

19.6822    1    2    2    16.61921705

36.8356    2    1    2    1.5616716

Super User
Super User
Posts: 7,035

Re: How to fill in missing values

This is again a good case for using the UPDATE trick that I have seen Data_null_ post many times.

In this case you need to create a variable with a constant value for all observations as UPDATE requires a BY variable and you do not have one.

data want;

  set have;

  dummy=1;

run;

data want ;

  update want(obs=0) want;

  by dummy;

  output;

  drop dummy;

run;

Respected Advisor
Posts: 4,919

Re: How to fill in missing values

Hi Tom, can you explain why OUTPUT is required in that last datastep? Thanks. PG

PG
Respected Advisor
Posts: 3,156

Re: How to fill in missing values

By default, update statement will only output at the end of each by variable. In this case, it will only output one record if not adding "output".

Haikuo

Respected Advisor
Posts: 4,919

Re: How to fill in missing values

So UPDATE alters the default behaviour of the datastep to have an implicit OUTPUT at the end if there isn't an explicit one. Are there other cases where that happens?

PG
Respected Advisor
Posts: 3,799

Re: How to fill in missing values

There is "always" an implicit OUTPUT unless you get MODIFY involved.  Other exceptions I reckon but I don't know any off hand.

UPDATE "applies" all transaction records to the uniquely keyed master records before the implicit output is executed.

like IF LAST.key THEN OUTPUT;

In this atypical usage we are only interested in the transaction records and the special UPDATE feature "missing check".

Super Contributor
Posts: 338

Re: How to fill in missing values

Posted in reply to data_null__

Hi Torres, PGStats, Hai.kuo and Tom,

Many thanks to each one of you.

Each one of your codes generates idenitical resutls for my larger dataset as well.

Thanks again!

Hi yeshwanth and data_null_;

Thanks both of you for the contribution.

Regards

Mirisage

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 376 views
  • 6 likes
  • 8 in conversation