Hi all,
I have this dataset. I need help urgently on this.
id |
characteristic |
id1 |
RDD |
id1 |
TSS |
id2 |
RDD |
id2 |
TSS |
I need the final result table as:
Characteristic |
N |
% |
RDD |
2 |
100 |
TSS |
2 |
100 |
how can I accomplish this
I tried by sorting the data-set by characteristic and it doesn't work.
proc freq data=have
by characteristic;
tables id/out=want
run;
Thanks.
Your percent value looks redundant and therefore useless to me.
Calculate the count like that:
data have;
input id $ characteristic $;
cards;
id1 RDD
id1 TSS
id2 RDD
id2 TSS
;
run;
proc sql;
create table want as
select
characteristic,
count(*) as N
from have
group by characteristic;
quit;
Note how I put your data in a data step with datalines for easy recreation by others. Do so yourself in the future, don't offload your work on somebody else.
Hi @Kurt_Bremser I was looking to summarize the count but since there was a condition that could be repeated for more than an id I was concerned. Also in your code how can I ensure the counts of the condition(N) is among all ids. I mean I don't see id mentioned in the code. or that is not needed.
proc sql;
create table want as select
characteristic,
count(*) as N
from have
group by characteristic;
quit;
If you add
proc print data=want noobs;
run;
to my previous code, the result
characteristic N RDD 2 TSS 2
clearly matches what you posted as the expected result, minus the irrelevant percent values.
If you have factors that influences these percent values so that they are not 100 all the time, please show so with an expanded dataset.
Hi @KurtBremser I am back on this string. I do need help with the complete solution. Right now, this code works for me that is edited from your first code.
proc sql;
create table want as
select characteristic, count(distinct id) as n
from have
group by characteristic;
quit;
However, I need the denominator to be the 'total number of distinct ids' to get the percent. How can I accomplish that as a nested select program in sql?
so basically I need :
proc sql;
create table want2 as
select count (distinct id) as n2
from have /* original dataset*/
quit;
so this count of distinct ids are not grouped by characteristic and my percent needs to be
n/n2*100
Hope this makes sense.
I have my code as this but I know there is something missing in it
proc sql;
create table want as
select a.characteristic, a.n, b.n2
from (select characteristic, count(distinct id) as n from have group by ccs2)
(select count(distinct id) as n2 from have) b;
quit;
So I want the three columns in the final dataset characteristic, n, n2, percent
and just to expand the data:
id |
characteristic |
id1 |
RDD |
id1 |
TSS |
id2 |
TSS |
id2 |
RDD |
id2 |
GGG |
id2 |
HHH |
So the output dataset will be
Characteristic |
N |
N2 |
% |
RDD |
2 |
2 |
100 |
TSS |
2 |
2 |
100 |
GGG |
1 |
2 |
50 |
HHH |
1 |
2 |
50 |
Hope this makes it all clear. Look forward to your reply.
Please.supply.example.data.in.a.usable.form.
Take the data step from my post and expand it. I had to edit your copy/paste from the table viewer once to get that step running, I won't do it again.
Post test data in the form of a datastep:
Also explain what is "not working". I am going to assume you mean it is taking the percentage as sum from total obs. Maybe something like:
proc sql; create table want as select a.characteristic, a.n, (a.n/b.tot)*100 as perc from (select characteristic,count(distinct id) as n from have group by characteristic) a full join (select characteristic,count(*) as tot from have group by characteristic) b on a.characteristic=b.characteristic; quit;
Hi @RW9 Its obvious by solution of proc freq was not helpful:
It gave an error the characteristic variable is not sorted properly.
And, the denominator for the percent has to be the total number of distinct ids.
@dr2014 wrote:
Hi @RW9 Its obvious by solution of proc freq was not helpful:
It gave an error the characteristic variable is not sorted properly.
Then sort the dataset.
And, the denominator for the percent has to be the total number of distinct ids.
Do you mean something like this:
data have;
input id $ characteristic $;
cards;
id1 RDD
id1 TSS
id2 RDD
id2 TSS
id2 RDD
;
run;
proc sql;
create table want as
select
characteristic,
count(distinct id) as N,
calculated N / count(*) as percent format=percent6.
from have
group by characteristic;
quit;
proc print data=want noobs;
run;
characteristic N percent RDD 2 67% TSS 2 100%
Once again, meaningful and usable example data with expected result will help us greatly in helping you.
Then sort the data. Also please refer to the guidance on posting new questions - post test data in the form of a datastep, show what "is not working" etc. It is not for me to type test data, then run code to get a log telling me "what is wrong with the code".
Simply use and alter the code I have provided.
one more to find counts
data have;
input id $ characteristic $;
datalines;
id1 RDD
id1 TSS
id2 RDD
id2 TSS
;
proc sort data = have out=have1;
by characteristic;
run;
data want;
do until(last.characteristic);
set have1(keep=characteristic);
by characteristic;
count=sum(count,1);;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.