Hello SAS community,
I am attempting to create a table with PROC SQL and need to find the distict count of several things. I am using SAS ver9.3.
I have a dataset I wish to manipulate as shown below:
Drink Mocha Americano Cappuccino Solo Doppio Earl Grey Latte London Fog Iced Green Tea Iced Sweet Tea Chocolate Milkshake Vanilla Milkshake Strawberry Milkshake Mango Smoothie Strawberry Smoothie Banana Smoothie | Type Coffee Coffee Coffee Coffee Coffee Tea Tea Tea Tea Blended Blended Blended Blended Blended Blended | Dairy Yes No Yes No No Yes Yes No No Yes Yes Yes No No No | Price 4.00 3.00 4.00 1.00 2.00 4.00 4.00 2.00 3.00 5.00 5.00 5.00 4.00 3.00 3.00 |
I would like to create a table that looks something like this:
Label Total drinks Price >=3
Drinks without dairy Price >=3
Drinks with dairy Price >=3 | Coffee 5(100.0) 3(60.0)
3(60.0) 3(60.0)
2(40.0) 0(0.0) | Tea 4(100.0) 2(50.0)
2(50.0) 2(50.0)
2(50.0) 0(0.0) | Blended 6(100.0) 2(33.3)
3(50.0) 2(33.3)
3(50.0) 0(0.0) | Combined 15(100.0) 7(46.7)
8(53.3) 7(46.7)
7(46.7) 0(0.0) |
Basically, I am looking to build a table with distinct counts based on a specification. The Types are split into the variables, along with a combination of all 3. The numbers in the parenthesis are the percentages from the total, as seen in the first row.
I have created a table of the Labels using proc format:
proc format ;
value row
1 ='Total drinks'
1.1 =' Price >=3'
2 ='Drinks without dairy'
2.1 =' Price >=3'
3 ='Drinks with dairy'
3.1 =' Price >=3'
;
quit;
proc format library=work cntlout=temp (keep=fmtname start end label) ;
quit;
I understand that I can use PROC SQL to find distinct counts, but I am unsure how to begin. Each PROC SQL only produces a table with 1 row and 1 column:
proc sql;
create table test as
select count(distinct Drink) as Coffee
from work.dataset where Type='Coffee' and Price>=3;
quit;
Is there a way to use PROC SQL to select more than one observation? I would really appreciate a push in the right direction.
Thank you,
Yawen
Do you want a data set or report table (html, pdf, rtf or text of some form)?
If you want a data set values like: 5(100.0) would have to be text and likely not to sort or such properly. And is there actually a requirement to place them in a single cell? I find data like that a tad difficult to read/comprehend at first glance, especially when there is no column indicator that one is count and the other a percentage (?).
Depending on your answer there are a number of approaches.
Also to post data it is best to use a data step. From existing data set Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
What ever format you pasted that data as I more work than I want to do to reformat and write a data step.
Hello Ballardw! Thank you for the tip. Yes, a dataset output is all that is needed. The formatting on this website is just to clearly show the differences in data. I also had issues with inserting tables so it was copied from a word document.
For sake of time, the output should be a sas dataset that looks like this (input dataset is the same format):
Is there a way to get distinct count without using PROC SQL? All sources online show PROC SQL only.
Thank you,
Yawen
Actually your input data is part of the problem. Word tables filtered through the forum do not paste well into the SAS editor for creating data step code. Better is something like:
data have; infile datalines dlm=','; informat drink $25. type $10. Dairy $3. Price best8.; input drink type Dairy price; datalines; Mocha,Coffee,Yes,4.00 Strawberry Milkshake,Blended,Yes,5.00 ; run;
as one way to handle data with embedded spaces.
And a data set in the form you are showing is sub-optimal for almost any purpose other than printing. Which is why I asked if you actually want a report.
1. Please provide usable sample data as @ballardw suggested
2- This code computes the same numbers as what you want. Would this suit your needs?
proc format ;
value $fname (multilabel) 'A'-'B'='AB'
'B'-'C'='BC'
'A'-'Z'=' All';
run;
proc tabulate data=SASHElP.CLASS out=T;
class SEX ;
class NAME / mlf;
format NAME $fname.;
table NAME=' ', SEX=' '*(n colpctn);
run;
F | M | |||
---|---|---|---|---|
N | ColPctN | N | ColPctN | |
All | 9 | 100.00 | 10 | 100.00 |
AB | 1 | 11.11 | 1 | 10.00 |
BC | 1 | 11.11 | . | . |
Hi,
Depending on my undestanding on your need, here is the code. This code can be made even simpler when making it as a macro, due to time constraint I didn't recode it into a macro. You can try by understanding my code.
DATA one;
infile datalines dlm=",";
input Drink :$20. Type $ Dairy $ Price;
datalines;
Mocha,Coffee,Yes,4
Americano,Coffee,No,3
Cappuccino,Coffee,Yes,4
Solo,Coffee,No,1
Doppio,Coffee,No,2
Earl Grey Latte,Tea,Yes,4
London Fog,Tea,Yes,4
Iced Green Tea,Tea,No,2
Iced Sweet Tea,Tea,No,3
Chocolate Milkshake,Blended,Yes,5
Vanilla Milkshake,Blended,Yes,5
Strawberry Milkshake,Blended,Yes,5
Mango Smoothie,Blended,No,4
Strawberry Smoothie,Blended,No,3
Banana Smoothie,Blended,No,3
;
run;
Data one;
Set one;
run;
proc sort data=one;
by type price ;
run;
DATA One_(Keep=Total price_3 type);
retain Total price_3;
set one;
by type;
If first.type then do;
Total=1;
price_3=0;
end;
else Total+1;
if price>=3 then price_3+1;
If last.type then output;
Label Total="Total Drinks" price_3="Price >=3";
run;
proc transpose data=One_ Out=need_1(drop=_Name_) Label=Label;
ID type;
run;
DATA need1(drop=coffee_ Tea_ Blended_);
FORMAT Label $40. coffee $8. Tea $8. Blended $8. Combined $8.;
set need_1(rename=(coffee=coffee_ Tea=Tea_ Blended=Blended_));
IF _N_=1 then DO;
COFFEE=CAT(coffee_,"(100)");
Tea=CAT(Tea_,"(100)");
Blended=CAT(Blended_,"(100)");
combined=CAT(SUM(Blended_,tea_,coffee_),"(100)");
CALL SYMPUT("Coffee_Total",Coffee_);
CALL SYMPUT("Tea_Total",Tea_);
CALL SYMPUT("Blended_Total",Blended_);
CALL SYMPUT("Combined_Total",SUM(Blended_,tea_,coffee_));
END;
ELSE DO;
Coffee=CAT(Coffee_,'(',(Coffee_*100/SYMGET("Coffee_Total")),')');
Tea= CAT(Tea_,'(',(Tea_*100/SYMGET("Tea_Total")),')');
Blended=CAT(Blended_,'(',(Blended_*100/SYMGET("Blended_Total")),')');
Combined=CAT(SUM(Blended_,tea_,coffee_),'(',SUM(Blended_,tea_,coffee_)*100/SYMGET("Combined_Total"),')');
END;
RUN;
data two;
set one;
IF Dairy="No";
RUN;
proc sort data=two;
by type price ;
run;
DATA Two_(Keep=Total price_3 type);
retain Total price_3;
set two;
by type;
If first.type then do;
Total=1;
price_3=0;
end;
else Total+1;
if price>=3 then price_3+1;
If last.type then output;
Label Total="Drinks Without Dairy" price_3="Price >=3";
run;
proc transpose data=Two_ Out=need_2 (drop=_Name_) Label=Label;
ID type;
run;
DATA need2(drop=coffee_ Tea_ Blended_);
FORMAT Label $40. coffee $8. Tea $8. Blended $8. Combined $8.;
set need_2(rename=(coffee=coffee_ Tea=Tea_ Blended=Blended_));
Coffee=CAT(Coffee_,'(',(Coffee_*100/SYMGET("Coffee_Total")),')');
Tea= CAT(Tea_,'(',(Tea_*100/SYMGET("Tea_Total")),')');
Blended=CAT(Blended_,'(',(Blended_*100/SYMGET("Blended_Total")),')');
Combined=CAT(SUM(Blended_,tea_,coffee_),'(',ROUND(SUM(Blended_,tea_,coffee_)*100/SYMGET("Combined_Total"),0.11),')');
RUN;
data Three;
set one;
IF Dairy="Yes";
RUN;
proc sort data=Three;
by type price ;
run;
DATA Three_(Keep=Total price_3 type);
retain Total price_3;
set Three;
by type;
If first.type then do;
Total=1;
price_3=0;
end;
else Total+1;
if price>=3 then price_3+1;
If last.type then output;
Label Total="Drinks With Dairy" price_3="Price >=3";
run;
proc transpose data=Three_ Out=need_3 (drop=_Name_) Label=Label;
ID type;
run;
DATA need3(drop=coffee_ Tea_ Blended_);
FORMAT Label $40. coffee $8. Tea $8. Blended $8. Combined $8.;
set need_3(rename=(coffee=coffee_ Tea=Tea_ Blended=Blended_));
Coffee=CAT(Coffee_,'(',(Coffee_*100/SYMGET("Coffee_Total")),')');
Tea= CAT(Tea_,'(',(Tea_*100/SYMGET("Tea_Total")),')');
Blended=CAT(Blended_,'(',(Blended_*100/SYMGET("Blended_Total")),')');
Combined=CAT(SUM(Blended_,tea_,coffee_),'(',ROUND(SUM(Blended_,tea_,coffee_)*100/SYMGET("Combined_Total"),0.11),')');
RUN;
Data Final;
SET need1 need2 need3;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.