Solved: Re: Creating a summary table from long list of variables

cb23_york · Posted 04-14-2015 08:28 AM

I'm sure this is very basic, but I'm having a mind block and would appreciate some help.

Suppose I had data like this, which is a list of names, dates and fruit and veg consumption recorded by dummy variables:

Name	Day	Banana	Carrot	Apple
Bob	1	0	0	0
Bob	2	1	0	0
Bob	5	1	1	0
Claire	1	0	0	1
Claire	2	0	0	1
Claire	3	1	0	1
Claire	4	1	0	1

And I wished to produce a summary table like this:

Name	Days	Banana	Carrot	Apple
Bob	3	0.67	0.33	0
Claire	4	0.5	0	1

What would be the best way to go about it? (Imagine I had 50 more types of fruit and veg consumption and don't want to type in all their names.)

Many thanks, Chris

RW9 · Posted 04-14-2015 08:41 AM

What does the summary table mean? Why does Bob have 3 days and 0.67 for instance? If you don't want to type each one, then use arrays and numeric suffix variables:

data tmp;
  array fruit{3} 8.;
  do i=1 to 3;
    ...
  end;
run;

cb23_york · Posted 04-14-2015 08:46 AM

To clarify, the table should contain each individual's name, a column counting the number of distinct days for which we have an observation for that individual, and then one column per fruit or veg giving the proportion of those days on which the individual consumed it. Bob has 3 data entries (days 1, 2 and 5) and consumed a banana on 66.7% of those days, a carrot on 33.3% of those days and an apple on 0% of those days. Hope that is clearer, thanks.

Ksharp · Posted 04-14-2015 09:28 AM

Are there any missing days or duplicated days for a name?


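data have;
  input name $ day banana carrot apple;
  cards;
Bob     1 0 0 0
Bob     2 1 0 0
Bob     5 1 1 0
Claire  1 0 0 1
Claire  2 0 0 1
Claire  3 1 0 1
Claire  4 1 0 1
;
run;

proc sql;
  /* Build one "sum(var)/count(*) as var" term for every column except NAME and DAY,
     and store the comma-separated list in the macro variable &list. */
  select cat('sum(',strip(name),')/count(*) as ',strip(name)) into :list separated by ','
    from dictionary.columns
    where libname='WORK' and memname='HAVE' and upcase(name) not in ('NAME' 'DAY');

  /* One row per name: number of observations plus the proportion for each column. */
  create table want as
    select name, count(*) as days, &list
      from have
      group by name;
quit;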

Xia Keshan
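
To see what the first query actually generates, the macro variable can be written to the log. This is only a minimal sketch: it assumes WORK.HAVE from the post above already exists in the current session, and the NOPRINT option and the %put line are additions for illustration, not part of the original answer.

```sas
/* Assumes WORK.HAVE from the step above already exists. */
proc sql noprint;
  /* Same list-building query; NOPRINT suppresses the listing output. */
  select cat('sum(',strip(name),')/count(*) as ',strip(name))
    into :list separated by ','
    from dictionary.columns
    where libname='WORK' and memname='HAVE'
      and upcase(name) not in ('NAME' 'DAY');
quit;

/* Write the generated select-list to the log for inspection. */
%put NOTE: list=&list;
```

With the three example columns, the log line should read something like list=sum(banana)/count(*) as banana,sum(carrot)/count(*) as carrot,sum(apple)/count(*) as apple, which is the text the second select pastes in via &list.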

cb23_york · Posted 04-14-2015 09:43 AM

That's fantastic Xia, much appreciated. In answer to your question, there are indeed some missing days, but never any duplication of days for a name.

Ksharp · Posted 04-14-2015 10:23 AM

So do you want to count the missing days or not?

If you don't want to count missing days, then change it to:

proc sql;
  select cat('sum(',strip(name),')/count(day) as ',strip(name)) into :list separated by ','
    from dictionary.columns
    where libname='WORK' and memname='HAVE' and upcase(name) not in ('NAME' 'DAY');

  create table want as
    select name, count(day) as days, &list
      from have
      group by name;
quit;

Xia Keshan

cb23_york · Posted 04-14-2015 11:09 AM

Thanks Xia,

I do indeed not want to count the missing days, but the first code works fine as well. In the example, Bob is missing days 3 and 4, but the original code does fine at counting that he has 3 days of observed data.

Ksharp · Posted 04-15-2015 08:08 AM

Nope. I mean observations like these, where the day value itself is missing:

Bob . 0 0 0
Bob . 1 0 0
Bob 5 1 1 0

cb23_york · Posted 04-15-2015 08:12 AM

Ok that's clear. No, there are no observations with missing data on the day. But thank you for the amended code which would work under such circumstances.
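
For readers who do have rows with a missing day, here is a small sketch of how count(*) and count(day) differ. The data set name have_missing and the output column names are hypothetical, made up for this illustration; the rows are the three from the example above.

```sas
/* Hypothetical data: two of Bob's rows have a missing day. */
data have_missing;
  input name $ day banana carrot apple;
  cards;
Bob . 0 0 0
Bob . 1 0 0
Bob 5 1 1 0
;
run;

proc sql;
  select name,
         count(*)               as rows_all,     /* every row: 3 */
         count(day)             as days_nonmiss, /* rows with a non-missing day: 1 */
         sum(banana)/count(day) as banana_ratio  /* 2/1 = 2, not a proportion */
    from have_missing
    group by name;
quit;
```

Note that sum(banana) still adds up all three rows, so dividing by count(day) can give a value above 1 when days are missing. If the rows with a missing day should be ignored altogether, adding "where day is not missing" to the query is one way to keep the numerator and denominator on the same rows.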
