Perform ANOVA and multiple comparisons with PROC surveymeans

Mirisage · Posted 06-06-2013 01:39 PM

Hi SAS Forum,

I have this data set which has 3 variables (Orig_date, Curr_QTR, Village), of them two variables are having missing values.

Orig_date Curr_QTR Village

03/01/1993	03/31/2008	A
.	06/30/2008
.	09/30/2008
.	12/31/2008
06/01/1995	03/31/2008
.	06/30/2008
12/01/1995	03/31/2008	B
.	06/30/2008
.	09/30/2008
.	12/31/2008
03/01/1996	03/31/2008
.	06/30/2008
.	09/30/2008
.	12/31/2008
.	03/31/2009
.	06/30/2009
.	09/30/2009
.	12/31/2009

Please see below SAS code which readily produces the above data set.

data have;

informat orig_date curr_QTR date9.;

input village $ 1-1 orig_date curr_QTR;

format orig_date curr_QTR MMDDYY10.;

cards;

A 01MAR1993 31MAR2008

. 30JUN2008

. 30SEP2008

. 31DEC2008

01JUN1995 31MAR2008

. 30JUN2008

B 01DEC1995 31MAR2008

. 30JUN2008

. 30SEP2008

. 31DEC2008

01MAR1996 31MAR2008

. 30JUN2008

. 30SEP2008

. 31DEC2008

. 31MAR2009

. 30JUN2009

. 30SEP2009

. 31DEC2009

; run;

Question:

I need to fill in the missing values of variables “orig_date” and “village” with the upper value that would be found at the top of a given missing block.

Below is the table that I want.

Orig_date Curr_QTR Village

03/01/1993	03/31/2008	A
03/01/1993	06/30/2008	A
03/01/1993	09/30/2008	A
03/01/1993	12/31/2008	A
06/01/1995	03/31/2008	A
06/01/1995	06/30/2008	A
12/01/1995	03/31/2008	B
12/01/1995	06/30/2008	B
12/01/1995	09/30/2008	B
12/01/1995	12/31/2008	B
03/01/1996	03/31/2008	B
03/01/1996	06/30/2008	B
03/01/1996	09/30/2008	B
03/01/1996	12/31/2008	B
03/01/1996	03/31/2009	B
03/01/1996	06/30/2009	B
03/01/1996	09/30/2009	B
03/01/1996	12/31/2009	B

I have tried this but no hope.

proc sort data=have;

by curr_qtr;

run;

data want;

set have;

retain oldorig_date;

by curr_qtr;

if first.curr_qtr then oldorig_date = orig_date;

else orig_date = oldorig_date;

drop oldorig_date;

run;

Could anyone help me.

Thanks

Mirisage

CTorres · Posted 06-06-2013 02:01 PM

Try this:

data want (drop= ori vill);

retain ori 0 Vill ' ';

set have;

if orig_date = .

then orig_date=ori;

else ori=orig_date;

if village=''

then village=vill;

else vill=village;

run;

Regards,

View solution in original post

CTorres · Posted 06-06-2013 02:01 PM

Try this:

data want (drop= ori vill);

retain ori 0 Vill ' ';

set have;

if orig_date = .

then orig_date=ori;

else ori=orig_date;

if village=''

then village=vill;

else vill=village;

run;

Regards,

PGStats · Posted 06-06-2013 02:30 PM

Do not sort; the values are carried over according to the original data order. One way to do this is :

data want;

length lastVillage $1;

retain lastVillage lastOrig;

set have;

village = coalescec(village, lastVillage);

orig_date = coalesce(orig_date, lastOrig);

lastVillage = village;

lastOrig = orig_date;

drop lastVillage lastOrig;

run;

PG

Haikuo · Posted 06-06-2013 02:41 PM

Or you can try a trick I learned from DN:

data have;

informat orig_date curr_QTR date9.;

input village $ 1-1 orig_date curr_QTR;

format orig_date curr_QTR MMDDYY10.;

retain dummy '1';

cards;

A 01MAR1993 31MAR2008

. 30JUN2008

. 30SEP2008

. 31DEC2008

01JUN1995 31MAR2008

. 30JUN2008

B 01DEC1995 31MAR2008

. 30JUN2008

. 30SEP2008

. 31DEC2008

01MAR1996 31MAR2008

. 30JUN2008

. 30SEP2008

. 31DEC2008

. 31MAR2009

. 30JUN2009

. 30SEP2009

. 31DEC2009

;

data want;

update have (obs=0) have;

by dummy;

output;

drop dummy;

run;

Haikuo

yeshwanth · Posted 06-06-2013 03:54 PM

Hello,

For simpler approach you can try PROC EXPAND

http://support.sas.com/documentation/cdl/en/etsug/60372/HTML/default/viewer.htm#etsug_expand_sect006...

http://www.ats.ucla.edu/stat/sas/faq/creating_tiimeseries_variables_proc_expand.htm

Try if this helps

Haikuo · Posted 06-06-2013 04:34 PM

@yeshwanth,

If you can afford SAS/ETS, Proc expand/Proc timeseries are very powerful tools. However, in this case, I doubt either will work. They generally required data to be sorted by a time/date variable in general or within groups; and if there are groups, group variable can NOT be missing.

I could be very wrong since I haven't used these tools very often. I am therefore hoping you or some expert like / @data null can provide some code to prove me wrong.

Thanks,

Haikuo

sstoline · Posted 06-07-2013 05:15 PM

I posted this questions few minutes ago

Perform ANOVA and multiple comparisons with PROC surveymeans

I am not sure if I posted it correctly. could you please let me know I it is posted correctly.

with many thanks

Stoline

Mirisage · Posted 06-08-2013 12:39 AM

Hi sstoline,

I think you have not posted it correctly. You have posted your query as a response to my query.

Just post as a new separate posting.

Thanks

Mirisage

sstoline · Posted 06-08-2013 07:57 AM

Dear Mirisage:

Did you see this question posted before?

I posted it few times, but no answers available yet. So I am not sure if others able to see it.

many thanks

abou

Perform ANOVA and multiple comparisons with PROC surveymeans

This question is Not Answered.(Mark as assumed answered)

Dear All:

I do need helps with how to use both PROC surveyfreq and PROC surveymeans to do the following analysis.

I really thank you all for your helps.

I have the following variables:

Stratum1 (1,2)

Months (0,1,2)

X1 (1=yes,2=No) ---- nominal categorical

X2 ------ interval continuous

Weight1 ---- weight variable

I need

(1) to test the equality of proportions of ones in months (0,1,2), and conduct a multiple comparison test for these proportions.

(2) to test the equality of the averages in months(0,1,2) and conduct a multiple comparison test for these means.

(3) test for trend: for example if there is increasing or decreasing in averages (X2), if there is increasing or decreasing in proportions of ones (X1).

Any helps will be appreciated.

Please email me copy of your answer.

With many thanks

Steve

Email: sstoline@gmail.com

data1--- as an example

=============================

X2 X1 stratum1 months weight1

29.4822 2 2 0 16.61921705

33.2849 2 2 0 16.61921705

16.9370 1 1 0 2.09612124

19.0055 1 1 0 2.09612124

21.1151 1 2 0 16.61921705

25.4055 2 2 0 16.61921705

24.4164 1 1 0 1.56167161

19.0767 1 1 0 1.56167161

23.1562 2 1 0 2.09612124

23.3479 1 1 0 2.09612124

21.7370 1 1 0 2.09612124

24.9726 2 2 0 16.61921705

20.3836 1 2 0 16.61921705

20.0575 1 2 0 20.88637405

17.6603 1 2 0 16.61921705

25.2274 2 2 0 16.61921705

20.9644 1 1 0 2.09612124

32.8055 2 2 0 14.51497509

39.5233 2 1 0 1.56167161

17.7288 1 1 1 2.09612124

18.3096 1 2 1 16.61921705

25.8055 2 2 1 16.61921705

30.0904 1 2 1 16.61921705

27.8082 2 1 1 2.09612124

37.3863 2 1 1 2.09612124

26.2548 2 1 1 2.09612124

17.8795 1 2 1 20.88637405

19.8000 1 2 1 16.61921705

28.6932 2 1 1 1.56167161

27.1041 2 2 1 16.61921705

36.1836 2 2 1 16.61921705

19.8192 1 1 1 2.09612124

31.9644 2 2 1 20.88637405

27.0740 1 1 1 2.09612124

23.5288 2 1 1 2.09612124

21.9068 1 2 1 14.51497509

20.3534 1 2 1 16.61921705

29.4767 2 2 1 14.51497509

22.8274 1 2 1 16.61921705

21.2685 1 2 1 20.88637405

31.8932 1 1 1 1.56167161

31.8795 2 1 1 2.09612124

21.2630 2 1 1 2.09612124

18.7562 2 1 1 2.09612124

16.8822 1 2 1 20.88637405

22.0164 2 2 2 20.88637405

27.4959 2 1 2 2.09612124

27.4904 2 1 2 1.56167161

2.7096 2 2 2 14.51497509

23.0027 1 2 2 16.61921705

19.2767 1 2 2 16.61921705

30.4466 1 2 2 16.61921705

36.5425 2 2 2 16.61921705

32.2521 2 1 2 1.56167161

36.9534 1 2 2 14.51497509

30.4164 2 2 2 14.51497509

20.2767 2 1 2 2.09612124

17.2356 1 2 2 20.88637405

20.8247 2 2 2 16.61921705

20.8795 1 2 2 16.61921705

27.5260 2 2 16.61921705

19.2493 1 2 2 16.61921705

17.8521 1 2 2 16.61921705

19.6822 1 2 2 16.61921705

36.8356 2 1 2 1.5616716

Tom · Posted 06-06-2013 05:25 PM

This is again a good case for using the UPDATE trick that I have seen Data_null_ post many times.

In this case you need to create a variable with a constant value for all observations as UPDATE requires a BY variable and you do not have one.

data want;

set have;

dummy=1;

run;

data want ;

update want(obs=0) want;

by dummy;

output;

drop dummy;

run;

PGStats · Posted 06-06-2013 05:46 PM

Hi Tom, can you explain why OUTPUT is required in that last datastep? Thanks. PG

PG

Haikuo · Posted 06-06-2013 06:13 PM

By default, update statement will only output at the end of each by variable. In this case, it will only output one record if not adding "output".

Haikuo

PGStats · Posted 06-06-2013 06:25 PM

So UPDATE alters the default behaviour of the datastep to have an implicit OUTPUT at the end if there isn't an explicit one. Are there other cases where that happens?

PG

data_null__ · Posted 06-06-2013 06:46 PM

There is "always" an implicit OUTPUT unless you get MODIFY involved. Other exceptions I reckon but I don't know any off hand.

UPDATE "applies" all transaction records to the uniquely keyed master records before the implicit output is executed.

like IF LAST.key THEN OUTPUT;

In this atypical usage we are only interested in the transaction records and the special UPDATE feature "missing check".

Mirisage · Posted 06-07-2013 02:48 PM

Hi Torres, PGStats, Hai.kuo and Tom,

Many thanks to each one of you.

Each one of your codes generates idenitical resutls for my larger dataset as well.

Thanks again!

Hi yeshwanth and data_null_;

Thanks both of you for the contribution.

Regards

Mirisage

How to fill in missing values

Re: How to fill in missing values

Re: How to fill in missing values

Re: How to fill in missing values

Re: How to fill in missing values

Re: How to fill in missing values

Re: How to fill in missing values

Re: How to fill in missing values

Perform ANOVA and multiple comparisons with PROC surveymeans

Re: How to fill in missing values

Re: How to fill in missing values

Perform ANOVA and multiple comparisons with PROC surveymeans

Re: How to fill in missing values

Re: How to fill in missing values

Re: How to fill in missing values

Re: How to fill in missing values

Re: How to fill in missing values

Re: How to fill in missing values

Registration is open

SAS Training: Just a Click Away