Solved: Re: show row with missing data in a summary table

michtka · Posted 08-27-2012 08:07 AM

Hi everyone, I got the next code that generate adv1 (a summary table):

data new;
length subjid leg_sort aeterm $20. trtrel 8.;
input subjid leg_sort aeterm trtrel;
datalines;
100 drug drymouth 1
101 drug drymouth 1
102 placebo drymount 0
;
run;

proc sql noprint;
create table adv1 as
select count(distinct subjid) as n, leg_sort, 'Subjects evaluable for AE' as text
from new
where trtrel=1
group by leg_sort;
quit;

my problem is that I would like to include the row placebo with N= .

i.e, I want to generate a dataset including all the possible values of leg_sort considering the values with missing data (placebo in my example).

Thnaks in advance.

V

mkeintz · Posted 08-28-2012 11:57 AM

OK, I understand now.

Then how about a union of the "where trtel" where you count the subject id's with the "leg_sort eq 'placebo'" case with N specified as missing. I only tested this SQL code with your 3 observations, but it worked in that case (I prefer proc summary). I moved the respective WHERE clauses into data set name parameters so that I didn't have to figure out their influence as sql expressions. SQL mavens might have a much more elegant way to do this.

proc sql;

select count(distinct subjid) as n, leg_sort, 'Subjects Available for AE' as text

from new (where=(trtrel eq 1))

union

select . as n, leg_sort, 'Subject Available for AE' as text

from new (where=(leg_sort eq 'placebo'))

group by leg_sort;

quit;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 08-27-2012 09:06 AM

Dear Mitchka:

You have not demonstrated a problem of dealing with missing data. You've shown the effect of using "where trtel=1".

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

michtka · Posted 08-27-2012 10:33 AM

The option where is neccessary, because i am interested in the records trtrel=1,

I know that there is not record of Placebo, but instead of not showing it, I am interested to consider not missing data (it is wrong, sorry, my mistake in the tittle), I am interested to consider the placebo raw with 0 value.

Something like that:

leg_sort N

drug 2

Placebo 0

art297 · Posted 08-27-2012 11:08 AM

There are more combinations than shown in your desired output. Will you get what you want by using proc freq? e.g.:

proc freq data=new;

tables trtrel*leg_sort*aeterm;

run;

MikeZdeb · Posted 08-27-2012 12:35 PM

hi ... three PROCs support preloading a format ... MEANS (SUMMARY), TABULATE, REPORT ... that make groups in your output dependent on a format rather than on your data

so, you can use SUMMARY with one or more CLASS variables to make a data set of counts, for example ...

proc format;

value $leg_sort 'drug' = 'drug' 'placebo' = 'placebo';

run;

data new;

input subjid (leg_sort aeterm) (:$20.) trtrel;

datalines;

100 drug drymouth 1

101 drug drymouth 1

102 placebo drymount 0

;

proc summary data=new nway completetypes;

class leg_sort / preloadfmt;

output out=counts (drop=_type_ rename=(_freq_ = N));

where trtrel eq 1;

format leg_sort $leg_sort.;

run;

data set COUNTS ...

Obs leg_sort N

1 drug 2

2 placebo 0

mkeintz · Posted 08-27-2012 04:29 PM

Could you not then just add "or leg_sort='placebo'" to your where clause?

As in:

where trtel=1 or leg_sort='placebo'

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

michtka · Posted 08-28-2012 09:49 AM

Thnak you mkeintz for making this dataset with proc summary,...but there one way to do the same using proc sql?

the reason I insisted with proc sql is because it is part of a macro, and the macro consider always 2 rows, this the reason i need to create that extra row of placebo with 0 values.

thnakls.

mkeintz · Posted 08-28-2012 10:13 AM

I did not demonstrate the proc summary - that was Mike Zdeb. My suggestion was explicitly for your PROC SQL which currently has where trtel=1 and could easily be changed to where trtel=1 or leg_sort='placebo'

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

MikeZdeb · Posted 08-28-2012 11:04 AM

Hi ... yes, I suggested SUMMARY. I don't think that the suggestion

where trtel=1 changed to where trtel=1 or leg_sort='placebo'

proc sql;

select count(distinct subjid) as n, leg_sort, 'Subjects evaluable for AE' as text

from new

where trtrel eq 1 or leg_sort eq 'placebo'

group by leg_sort;

quit;

n leg_sort text

2 drug Subjects evaluable for AE

1 placebo Subjects evaluable for AE

does what you want in that you get a count of 1 for placebo since rather than the 0 that you want.

Given my meager SQL knowledge, I cannot think of an easy SQL approach since you are asking PROC SQL to produce a count for something that is not in your data set.

That is why I suggested SUMMARY so you could have a format rather than your data dictate the groups that appear in your table (including zero counts).

michtka · Posted 08-28-2012 11:25 AM

Yers, you right, this "or upcase(leg_sort)=PLACEBO" i am not interested in....

In the way of proc sql you are right too, I was expected someone to help me with some condition to consider all

the possible values of leg_sort.

I dont think there any option can cover it using proc sql....but i think we can create a final dataset considering this extra record placebo with 0 value.

michtka · Posted 08-28-2012 11:09 AM

I'm sorry for my confussion...but still not right...because

abviously It generate a record with placebo n=1, and I am interesting in a dataset just as Mike Zdeb did (counts), but using proc sql.

data set COUNTS ...

Obs leg_sort N

1 drug 2

2 placebo 0

Astounding · Posted 08-28-2012 11:45 AM

Assuming you change the WHERE clause to add "or upcase(leg_sort)='PLACEBO'" you have two ways to fix the result.

You could always use a DATA step before printing to change N for Placebo to 0. Alternatively, you could select a more complex statistic that doesn't require a subsequent DATA step. You'll have to check the syntax on this because SQL is a secondary language for me:

select min( count(distinct subjid), sum(trtrel) ) as N,

Good luck.

mkeintz · Posted 08-28-2012 11:57 AM

OK, I understand now.

Then how about a union of the "where trtel" where you count the subject id's with the "leg_sort eq 'placebo'" case with N specified as missing. I only tested this SQL code with your 3 observations, but it worked in that case (I prefer proc summary). I moved the respective WHERE clauses into data set name parameters so that I didn't have to figure out their influence as sql expressions. SQL mavens might have a much more elegant way to do this.

proc sql;

select count(distinct subjid) as n, leg_sort, 'Subjects Available for AE' as text

from new (where=(trtrel eq 1))

union

select . as n, leg_sort, 'Subject Available for AE' as text

from new (where=(leg_sort eq 'placebo'))

group by leg_sort;

quit;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

michtka · Posted 08-29-2012 04:51 AM

Thanks mkeintz, this exactly like I wanted.

I need to add an if consition to your code, because not always n(placebo)=.

SAS Training: Just a Click Away