I have the following table:
Name | DATE | actual |
1111 | 1/2/2019 | 1 |
1111 | 1/2/2019 | 1 |
1111 | 1/4/2019 | 0 |
1111 | 1/5/2019 | 0 |
1111 | 1/6/2019 | 0 |
2222 | 1/4/2019 | 1 |
2222 | 1/5/2019 | 1 |
2222 | 1/6/2019 | 1 |
3333 | 1/6/2019 | 1 |
3333 | 1/10/2019 | 0 |
3333 | 1/10/2019 | 0 |
3333 | 1/13/2019 | 1 |
3333 | 1/14/2019 | 1 |
3333 | 1/15/2019 | 1 |
I want total counts per Name (if two dates are the same twice, count as twice) and then only 1 for actual (sum)
NAME | TOTAL_COUNTS | ACTUAL_COUNTS |
1111 | 5 | 2 |
2222 | 3 | 3 |
3333 | 6 | 4 |
I tried the nested loop but no luck
proc sql;
select DISTINCT
a.NAME,
count(a.DATE) as Total_counts,
a.actual_counts
from (select distinct a.NAME,
count(a.actual) as actual_counts
from sql.p_IE_ELIGIBILITY a
where a.actual="1"
group by a.NAME)
group by a.NAME as a full join
(select distinct a.NAME, count(*) as total from TEST) as b
on a.NAME = b.NAME;
quit;
No looping is needed here. You don't have to write your own code to perform simple operations like counting. SAS has many many built-in procedures that will do what you want, which is counting. It would certainly be a good idea for you to familarize yourself with many of the basic procedures, such as PROC MEANS/PROC SUMMARY and PROC FREQ, as this will enable you to work a lot faster than writing your own looping code.
PROC SUMMARY solution
proc summary data=have nway;
class name;
var actual;
output out=want n=total_counts sum=actual_counts;
run;
data have;
input Name DATE :mmddyy10. actual;
cards;
1111 1/2/2019 1
1111 1/2/2019 1
1111 1/4/2019 0
1111 1/5/2019 0
1111 1/6/2019 0
2222 1/4/2019 1
2222 1/5/2019 1
2222 1/6/2019 1
3333 1/6/2019 1
3333 1/10/2019 0
3333 1/10/2019 0
3333 1/13/2019 1
3333 1/14/2019 1
3333 1/15/2019 1
;
proc sql;
create table want as
select name,count(name) as TOTAL_COUNTS,sum(actual) as ACTUAL_COUNTS
from have
group by name;
quit;
Hello,
Did you try this:
proc sql;
select DISTINCT
NAME,
count(DATE) as Total_counts,
sum(actual_counts) as actual_counts
from sql.p_IE_ELIGIBILITY
group by name;
quit;
I think it will be easier!
Regards,
JD
Hi,
do you mean something like this:
data have;
input
Name : $ DATE : mmddyy. actual
;
format date yymmdd10;
cards;
1111 1/2/2019 1
1111 1/2/2019 1
1111 1/4/2019 0
1111 1/5/2019 0
1111 1/6/2019 0
2222 1/4/2019 1
2222 1/5/2019 1
2222 1/6/2019 1
3333 1/6/2019 1
3333 1/10/2019 0
3333 1/10/2019 0
3333 1/13/2019 1
3333 1/14/2019 1
3333 1/15/2019 1
;
run;
proc sql;
select
name
, count(date) as TOTAL_COUNTS
, count(distinct case when actual = 1 then date else . end) as ACT_COUNTS
from
have
group by name
;
quit;
output:
Name TOTAL_COUNTS ACT_COUNTS 1111 5 1 2222 3 3 3333 6 4
Hi @monday89
You can do it either through a proc means or a proc sql:
proc means data=have noprint;
var actual;
class Name;
ways 1;
output out=want (drop =_type_ _freq_) n=TOTAL_COUNTS sum=ACTUAL_COUNTS;
run;
proc sql;
create table want as
select name,
count(actual) as TOTAL_COUNTS,
sum(actual) as ACTUAL_COUNTS
from have
group by name;
quit;
Allow me to offer some unsolicited advice, whose bottom line is learn to use the proc summary (identical to "proc means") solution offered by @PaigeMiller and @ed_sas_member .
Sure SQL can be used to do this task, but analysis and tabulation of this sort are the raison d'etre of SAS. In particular proc summary/means can do lots of aggregation and many statistics that would require twisting yourself into knots when using SQL.
Using SQL to solve many common (and not-so-common) tabulation tasks is like treating all problems as nails suitable for the SQL hammer. Yes, it's a nice hammer, but the other tools are there for good reason.
@mkeintz wrote:
Allow me to offer some unsolicited advice, whose bottom line is learn to use the proc summary (identical to "proc means") solution offered by @PaigeMiller and @ed_sas_member .
Sure SQL can be used to do this task, but analysis and tabulation of this sort are the raison d'etre of SAS. In particular proc summary/means can do lots of aggregation and many statistics that would require twisting yourself into knots when using SQL.
Using SQL to solve many common (and not-so-common) tabulation tasks is like treating all problems as nails suitable for the SQL hammer. Yes, it's a nice hammer, but the other tools are there for good reason.
In addition, if you have a case where you need to slice the data two (or more) different ways — for example, slice the data by state to compute state-level statistics; and also slice the data by county to produce county level statistics — SQL requires two PROC SQL blocks and two passes through the data, while PROC SUMMARY requires one PROC SUMMARY block and one pass through the data. Not only that, but SUMMARY handles missing values and computes weighted statistics properly while SQL doesn't always do that; and SUMMARY has many built in statistics that I don't think SQL has built-in (although since I don't use SQL for this purpose, maybe I'm wrong, can SQL compute the 99th percentile statistic?).
Hi @PaigeMiller ,
According to the SAS 9.4M6 doc this are SQL's aggregation functions:
AVG, MEAN | mean or average of values |
COUNT, FREQ, N | number of nonmissing values |
CSS | corrected sum of squares |
CV | coefficient of variation (percent) |
MAX | largest value |
MIN | smallest value |
NMISS | number of missing values |
PRT | probability of a greater absolute value of Student's t |
RANGE | range of values |
STD | standard deviation |
STDERR | standard error of the mean |
SUM | sum of values |
SUMWGT | sum of the WEIGHT variable values |
T | Student's t value for testing the hypothesis that the population mean is zero |
USS | uncorrected sum of squares |
VAR | variance |
so no quantiles 🙂
The function MEDIAN() works in SQL, but not P99 and not MODE. Also, PROC SUMMARY contains ability to compute confidence intervals and t-statistics and their probability.
Hi @monday89 ,
you wrote:
I want total counts per Name (if two dates are the same twice, count as twice) and then only 1 for actual (sum)
in such case should the value of ACTUAL_COUNTS for 1111 be 2? since the first date (1/2/2019) is doubled shouldn't it be 1?
Hi @yabwon I like your diligence and wasn't sure of that part too. The WANT shown by OP seems to suggest a straight forward GROUP BY sum/count respectively.
That line of "specification" was the reason why I went into:
count(distinct case when actual = 1 then date else . end) as ACT_COUNTS
instead classic aggregation procedure like summary/means or tabulate. Lack of handling "twisted" aggregation logic is the reason why sometimes SQL "outperforms" good classic procedures. I wish I could use something like `variable * n(distinct)` in tabulate.
Fully agree. I might place my bet on your understanding to be the right comprehension of the requirement. After all, looking at OP's use of SQL syntax, it appears OP does seems know SQL and may not be the straight forward summary stat assumption of mine and others. Hmm time for a coffee break? Oh rather a corona beer to beat the corona virus. 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.