BookmarkSubscribeRSS Feed
dr2014
Quartz | Level 8

Hi all,

I have this dataset. I need help urgently on this.

id

characteristic

id1

RDD

id1

TSS

id2

RDD

id2

TSS

 

I need the final result table as:

Characteristic

N

%

RDD

2

100

TSS

2

100

 

how can I accomplish this

I tried by sorting the data-set by characteristic and it doesn't work.

proc freq data=have 
by characteristic;
tables id/out=want 
run;

 

Thanks.

10 REPLIES 10
Kurt_Bremser
Super User

Your percent value looks redundant and therefore useless to me.

Calculate the count like that:

data have;
input id $ characteristic $;
cards;
id1 RDD
id1 TSS
id2 RDD
id2 TSS
;
run;

proc sql;
create table want as
select
  characteristic,
  count(*) as N
from have
group by characteristic;
quit;

Note how I put your data in a data step with datalines for easy recreation by others. Do so yourself in the future, don't offload your work on somebody else.

dr2014
Quartz | Level 8

 

Hi @Kurt_Bremser  I was looking to summarize the count but since there was a condition that could be repeated for more than an id I was concerned. Also in your code how can I ensure the counts of the condition(N) is among  all ids. I mean I don't see id mentioned in the code. or that is not needed.

proc sql;
create table want as select
  characteristic,
  count(*) as N
from have
group by characteristic;
quit;

Kurt_Bremser
Super User

If you add

proc print data=want noobs;
run;

to my previous code, the result

characteristic    N

     RDD          2
     TSS          2

clearly matches what you posted as the expected result, minus the irrelevant percent values.

If you have factors that influences these percent values so that they are not 100 all the time, please show so with an expanded dataset.

dr2014
Quartz | Level 8

Hi @KurtBremser I am back on this string. I do need help with the complete solution.  Right now, this code works for me that is edited from your first code.

 

proc sql;

create table want as

select characteristic, count(distinct id) as n

from have

group by characteristic;

quit;

 

However, I need the denominator to be the 'total number of distinct ids' to get the percent. How can I accomplish that as a nested select program in sql?

 

so basically I need :

proc sql;

 create table want2 as

  select count (distinct id) as n2

    from have  /* original dataset*/

quit;

so this count of distinct ids are not grouped by characteristic and my percent needs to be

n/n2*100

 

Hope this makes sense.

I have my code as this but I know there is something missing in it

proc sql;

create table want as

select a.characteristic, a.n, b.n2

from (select characteristic, count(distinct id) as n from have group by ccs2)

(select count(distinct id) as n2 from have) b;

quit;

 So I want the three columns in the final dataset characteristic, n, n2, percent

 

and just to expand the data:

id

characteristic

id1

RDD

id1

TSS

id2

TSS

id2

RDD

id2

GGG

id2

HHH

 So the output dataset will be

Characteristic

N

N2

%

RDD

2

2

100

TSS

2

2

100

GGG

1

2

50

HHH

1

2

50

 

Hope this makes it all clear. Look forward to your reply.

Kurt_Bremser
Super User

Please.supply.example.data.in.a.usable.form.

 

Take the data step from my post and expand it. I had to edit your copy/paste from the table viewer once to get that step running, I won't do it again.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Also explain what is "not working".  I am going to assume you mean it is taking the percentage as sum from total obs.  Maybe something like:

proc sql;
  create table want as
  select a.characteristic,
         a.n,
         (a.n/b.tot)*100 as perc
  from   (select characteristic,count(distinct id) as n from have group by characteristic) a
  full join (select characteristic,count(*) as tot from have group by characteristic) b
  on     a.characteristic=b.characteristic;
quit;
dr2014
Quartz | Level 8

Hi @RW9 Its obvious by solution of proc freq was not helpful:

It gave an error the characteristic variable  is not sorted properly.

 

And, the denominator for the percent has to be the total number of distinct ids.

 

 

Kurt_Bremser
Super User

@dr2014 wrote:

Hi @RW9 Its obvious by solution of proc freq was not helpful:

It gave an error the characteristic variable  is not sorted properly.

Then sort the dataset.


And, the denominator for the percent has to be the total number of distinct ids.

 


Do you mean something like this:

data have;
input id $ characteristic $;
cards;
id1 RDD
id1 TSS
id2 RDD
id2 TSS
id2 RDD
;
run;

proc sql;
create table want as
select
  characteristic,
  count(distinct id) as N,
  calculated N / count(*) as percent format=percent6.
from have
group by characteristic;
quit;

proc print data=want noobs;
run;
characteristic    N    percent

     RDD          2      67%  
     TSS          2     100%  

Once again, meaningful and usable example data with expected result will help us greatly in helping you.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Then sort the data.  Also please refer to the guidance on posting new questions - post test data in the form of a datastep, show what "is not working" etc. It is not for me to type test data, then run code to get a log telling me "what is wrong with the code". 

Simply use and alter the code I have provided. 

kiranv_
Rhodochrosite | Level 12

one more to find counts

data have;
input id $ characteristic $;
datalines;
id1 RDD
id1 TSS
id2 RDD
id2 TSS
;

proc sort data = have out=have1;
by characteristic;
run;

data want;
do until(last.characteristic);
set have1(keep=characteristic);
by characteristic;
count=sum(count,1);;
end;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1295 views
  • 0 likes
  • 4 in conversation