Re: Running frequency

dr2014 · Posted 11-21-2018 09:09 AM

Hi all,

I have this dataset. I need help urgently on this.

id	characteristic
id1	RDD
id1	TSS
id2	RDD
id2	TSS

I need the final result table as:

Characteristic	N	%
RDD	2	100
TSS	2	100

how can I accomplish this

I tried by sorting the data-set by characteristic and it doesn't work.

proc freq data=have
by characteristic;
tables id/out=want
run;

Thanks.

Kurt_Bremser · Posted 11-21-2018 09:20 AM

Your percent value looks redundant and therefore useless to me.

Calculate the count like that:

data have;
input id $ characteristic $;
cards;
id1 RDD
id1 TSS
id2 RDD
id2 TSS
;
run;

proc sql;
create table want as
select
  characteristic,
  count(*) as N
from have
group by characteristic;
quit;

Note how I put your data in a data step with datalines for easy recreation by others. Do so yourself in the future, don't offload your work on somebody else.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

dr2014 · Posted 11-21-2018 09:36 AM

Hi @Kurt_Bremser I was looking to summarize the count but since there was a condition that could be repeated for more than an id I was concerned. Also in your code how can I ensure the counts of the condition(N) is among all ids. I mean I don't see id mentioned in the code. or that is not needed.

proc sql;
create table want as select
characteristic,
count(*) as N
from have
group by characteristic;
quit;

Kurt_Bremser · Posted 11-21-2018 09:44 AM

If you add

proc print data=want noobs;
run;

to my previous code, the result

characteristic    N

     RDD          2
     TSS          2

clearly matches what you posted as the expected result, minus the irrelevant percent values.

If you have factors that influences these percent values so that they are not 100 all the time, please show so with an expanded dataset.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

dr2014 · Posted 11-21-2018 10:44 AM

Hi @KurtBremser I am back on this string. I do need help with the complete solution. Right now, this code works for me that is edited from your first code.

proc sql;

create table want as

select characteristic, count(distinct id) as n

from have

group by characteristic;

quit;

However, I need the denominator to be the 'total number of distinct ids' to get the percent. How can I accomplish that as a nested select program in sql?

so basically I need :

proc sql;

create table want2 as

select count (distinct id) as n2

from have /* original dataset*/

quit;

so this count of distinct ids are not grouped by characteristic and my percent needs to be

n/n2*100

Hope this makes sense.

I have my code as this but I know there is something missing in it

proc sql;

create table want as

select a.characteristic, a.n, b.n2

from (select characteristic, count(distinct id) as n from have group by ccs2)

(select count(distinct id) as n2 from have) b;

quit;

So I want the three columns in the final dataset characteristic, n, n2, percent

and just to expand the data:

id	characteristic
id1	RDD
id1	TSS
id2	TSS
id2	RDD
id2	GGG
id2	HHH

So the output dataset will be

Characteristic	N	N2	%
RDD	2	2	100
TSS	2	2	100
GGG	1	2	50
HHH	1	2	50

Hope this makes it all clear. Look forward to your reply.

Kurt_Bremser · Posted 11-22-2018 02:25 AM

Please.supply.example.data.in.a.usable.form.

Take the data step from my post and expand it. I had to edit your copy/paste from the table viewer once to get that step running, I won't do it again.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RW9 · Posted 11-21-2018 09:21 AM

Post test data in the form of a datastep:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

Also explain what is "not working". I am going to assume you mean it is taking the percentage as sum from total obs. Maybe something like:

proc sql;
  create table want as
  select a.characteristic,
         a.n,
         (a.n/b.tot)*100 as perc
  from   (select characteristic,count(distinct id) as n from have group by characteristic) a
  full join (select characteristic,count(*) as tot from have group by characteristic) b
  on     a.characteristic=b.characteristic;
quit;

dr2014 · Posted 11-21-2018 09:46 AM

Hi @RW9 Its obvious by solution of proc freq was not helpful:

It gave an error the characteristic variable is not sorted properly.

And, the denominator for the percent has to be the total number of distinct ids.

Kurt_Bremser · Posted 11-21-2018 09:55 AM

@dr2014 wrote:

Hi @RW9 Its obvious by solution of proc freq was not helpful:

It gave an error the characteristic variable is not sorted properly.

Then sort the dataset.

And, the denominator for the percent has to be the total number of distinct ids.

Do you mean something like this:

data have;
input id $ characteristic $;
cards;
id1 RDD
id1 TSS
id2 RDD
id2 TSS
id2 RDD
;
run;

proc sql;
create table want as
select
  characteristic,
  count(distinct id) as N,
  calculated N / count(*) as percent format=percent6.
from have
group by characteristic;
quit;

proc print data=want noobs;
run;

characteristic    N    percent

     RDD          2      67%  
     TSS          2     100%

Once again, meaningful and usable example data with expected result will help us greatly in helping you.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RW9 · Posted 11-21-2018 10:05 AM

Then sort the data. Also please refer to the guidance on posting new questions - post test data in the form of a datastep, show what "is not working" etc. It is not for me to type test data, then run code to get a log telling me "what is wrong with the code".

Simply use and alter the code I have provided.

kiranv_ · Posted 11-21-2018 09:47 AM

one more to find counts

data have;
input id $ characteristic $;
datalines;
id1 RDD
id1 TSS
id2 RDD
id2 TSS
;

proc sort data = have out=have1;
by characteristic;
run;

data want;
do until(last.characteristic);
set have1(keep=characteristic);
by characteristic;
count=sum(count,1);;
end;
run;

Registration is open