turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- How to create a table with distinct count

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-03-2017 06:20 PM

Hello SAS community,

I am attempting to create a table with PROC SQL and need to find the distict count of several things. I am using SAS ver9.3.

I have a dataset I wish to manipulate as shown below:

Mocha Americano Cappuccino Solo Doppio Earl Grey Latte London Fog Iced Green Tea Iced Sweet Tea Chocolate Milkshake Vanilla Milkshake Strawberry Milkshake Mango Smoothie Strawberry Smoothie Banana Smoothie |
Coffee Coffee Coffee Coffee Coffee Tea Tea Tea Tea Blended Blended Blended Blended Blended Blended |
Yes No Yes No No Yes Yes No No Yes Yes Yes No No No |
4.00 3.00 4.00 1.00 2.00 4.00 4.00 2.00 3.00 5.00 5.00 5.00 4.00 3.00 3.00 |

I would like to create a table that looks something like this:

Total drinks Price >=3
Drinks without dairy Price >=3
Drinks with dairy Price >=3 |
5(100.0) 3(60.0)
3(60.0) 3(60.0)
2(40.0) 0(0.0) |
4(100.0) 2(50.0)
2(50.0) 2(50.0)
2(50.0) 0(0.0) |
6(100.0) 2(33.3)
3(50.0) 2(33.3)
3(50.0) 0(0.0) |
15(100.0) 7(46.7)
8(53.3) 7(46.7)
7(46.7) 0(0.0) |

Basically, I am looking to build a table with distinct counts based on a specification. The Types are split into the variables, along with a combination of all 3. The numbers in the parenthesis are the percentages from the total, as seen in the first row.

I have created a table of the Labels using proc format:

**proc** **format** ;

value row

** 1** ='Total drinks'

** 1.1** =' Price >=3'

** 2** ='Drinks without dairy'

** 2.1** =' Price >=3'

** 3** ='Drinks with dairy'

** 3.1** =' Price >=3'

;

**quit**;

**proc** **format** library=work cntlout=temp (keep=fmtname start end label) ;

**quit**;

I understand that I can use PROC SQL to find distinct counts, but I am unsure how to begin. Each PROC SQL only produces a table with 1 row and 1 column:

**proc** **sql**;

create table test as

select count(distinct Drink) as Coffee

from work.dataset where Type='Coffee' and Price>=**3**;

**quit**;

Is there a way to use PROC SQL to select more than one observation? I would really appreciate a push in the right direction.

Thank you,

Yawen

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to yawenyu

08-03-2017 06:53 PM

Do you want a data set or report table (html, pdf, rtf or text of some form)?

If you want a data set values like: 5(100.0) would have to be text and likely not to sort or such properly. And is there actually a requirement to place them in a single cell? I find data like that a tad difficult to read/comprehend at first glance, especially when there is no column indicator that one is count and the other a percentage (?).

Depending on your answer there are a number of approaches.

Also to post data it is best to use a data step. From existing data set Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

What ever format you pasted that data as I more work than I want to do to reformat and write a data step.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ballardw

08-03-2017 07:03 PM

Hello Ballardw! Thank you for the tip. Yes, a dataset output is all that is needed. The formatting on this website is just to clearly show the differences in data. I also had issues with inserting tables so it was copied from a word document.

For sake of time, the output should be a sas dataset that looks like this (input dataset is the same format):

Is there a way to get distinct count without using PROC SQL? All sources online show PROC SQL only.

Thank you,

Yawen

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to yawenyu

08-03-2017 07:24 PM

Actually your input data is part of the problem. Word tables filtered through the forum do not paste well into the SAS editor for creating data step code. Better is something like:

data have; infile datalines dlm=','; informat drink $25. type $10. Dairy $3. Price best8.; input drink type Dairy price; datalines; Mocha,Coffee,Yes,4.00 Strawberry Milkshake,Blended,Yes,5.00 ; run;

as one way to handle data with embedded spaces.

And a data set in the form you are showing is sub-optimal for almost any purpose other than printing. Which is why I asked if you actually want a report.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to yawenyu

08-03-2017 11:24 PM

1. Please provide usable sample data as @ballardw suggested

2- This code computes the same numbers as what you want. Would this suit your needs?

```
proc format ;
value $fname (multilabel) 'A'-'B'='AB'
'B'-'C'='BC'
'A'-'Z'=' All';
run;
proc tabulate data=SASHElP.CLASS out=T;
class SEX ;
class NAME / mlf;
format NAME $fname.;
table NAME=' ', SEX=' '*(n colpctn);
run;
```

F | M | |||
---|---|---|---|---|

N | ColPctN | N | ColPctN | |

All | 9 | 100.00 | 10 | 100.00 |

AB | 1 | 11.11 | 1 | 10.00 |

BC | 1 | 11.11 | . | . |

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to yawenyu

08-04-2017 11:19 PM

Hi,

Depending on my undestanding on your need, here is the code. This code can be made even simpler when making it as a macro, due to time constraint I didn't recode it into a macro. You can try by understanding my code.

```
DATA one;
infile datalines dlm=",";
input Drink :$20. Type $ Dairy $ Price;
datalines;
Mocha,Coffee,Yes,4
Americano,Coffee,No,3
Cappuccino,Coffee,Yes,4
Solo,Coffee,No,1
Doppio,Coffee,No,2
Earl Grey Latte,Tea,Yes,4
London Fog,Tea,Yes,4
Iced Green Tea,Tea,No,2
Iced Sweet Tea,Tea,No,3
Chocolate Milkshake,Blended,Yes,5
Vanilla Milkshake,Blended,Yes,5
Strawberry Milkshake,Blended,Yes,5
Mango Smoothie,Blended,No,4
Strawberry Smoothie,Blended,No,3
Banana Smoothie,Blended,No,3
;
run;
Data one;
Set one;
run;
proc sort data=one;
by type price ;
run;
DATA One_(Keep=Total price_3 type);
retain Total price_3;
set one;
by type;
If first.type then do;
Total=1;
price_3=0;
end;
else Total+1;
if price>=3 then price_3+1;
If last.type then output;
Label Total="Total Drinks" price_3="Price >=3";
run;
proc transpose data=One_ Out=need_1(drop=_Name_) Label=Label;
ID type;
run;
DATA need1(drop=coffee_ Tea_ Blended_);
FORMAT Label $40. coffee $8. Tea $8. Blended $8. Combined $8.;
set need_1(rename=(coffee=coffee_ Tea=Tea_ Blended=Blended_));
IF _N_=1 then DO;
COFFEE=CAT(coffee_,"(100)");
Tea=CAT(Tea_,"(100)");
Blended=CAT(Blended_,"(100)");
combined=CAT(SUM(Blended_,tea_,coffee_),"(100)");
CALL SYMPUT("Coffee_Total",Coffee_);
CALL SYMPUT("Tea_Total",Tea_);
CALL SYMPUT("Blended_Total",Blended_);
CALL SYMPUT("Combined_Total",SUM(Blended_,tea_,coffee_));
END;
ELSE DO;
Coffee=CAT(Coffee_,'(',(Coffee_*100/SYMGET("Coffee_Total")),')');
Tea= CAT(Tea_,'(',(Tea_*100/SYMGET("Tea_Total")),')');
Blended=CAT(Blended_,'(',(Blended_*100/SYMGET("Blended_Total")),')');
Combined=CAT(SUM(Blended_,tea_,coffee_),'(',SUM(Blended_,tea_,coffee_)*100/SYMGET("Combined_Total"),')');
END;
RUN;
data two;
set one;
IF Dairy="No";
RUN;
proc sort data=two;
by type price ;
run;
DATA Two_(Keep=Total price_3 type);
retain Total price_3;
set two;
by type;
If first.type then do;
Total=1;
price_3=0;
end;
else Total+1;
if price>=3 then price_3+1;
If last.type then output;
Label Total="Drinks Without Dairy" price_3="Price >=3";
run;
proc transpose data=Two_ Out=need_2 (drop=_Name_) Label=Label;
ID type;
run;
DATA need2(drop=coffee_ Tea_ Blended_);
FORMAT Label $40. coffee $8. Tea $8. Blended $8. Combined $8.;
set need_2(rename=(coffee=coffee_ Tea=Tea_ Blended=Blended_));
Coffee=CAT(Coffee_,'(',(Coffee_*100/SYMGET("Coffee_Total")),')');
Tea= CAT(Tea_,'(',(Tea_*100/SYMGET("Tea_Total")),')');
Blended=CAT(Blended_,'(',(Blended_*100/SYMGET("Blended_Total")),')');
Combined=CAT(SUM(Blended_,tea_,coffee_),'(',ROUND(SUM(Blended_,tea_,coffee_)*100/SYMGET("Combined_Total"),0.11),')');
RUN;
data Three;
set one;
IF Dairy="Yes";
RUN;
proc sort data=Three;
by type price ;
run;
DATA Three_(Keep=Total price_3 type);
retain Total price_3;
set Three;
by type;
If first.type then do;
Total=1;
price_3=0;
end;
else Total+1;
if price>=3 then price_3+1;
If last.type then output;
Label Total="Drinks With Dairy" price_3="Price >=3";
run;
proc transpose data=Three_ Out=need_3 (drop=_Name_) Label=Label;
ID type;
run;
DATA need3(drop=coffee_ Tea_ Blended_);
FORMAT Label $40. coffee $8. Tea $8. Blended $8. Combined $8.;
set need_3(rename=(coffee=coffee_ Tea=Tea_ Blended=Blended_));
Coffee=CAT(Coffee_,'(',(Coffee_*100/SYMGET("Coffee_Total")),')');
Tea= CAT(Tea_,'(',(Tea_*100/SYMGET("Tea_Total")),')');
Blended=CAT(Blended_,'(',(Blended_*100/SYMGET("Blended_Total")),')');
Combined=CAT(SUM(Blended_,tea_,coffee_),'(',ROUND(SUM(Blended_,tea_,coffee_)*100/SYMGET("Combined_Total"),0.11),')');
RUN;
Data Final;
SET need1 need2 need3;
run;
```

Thanks,

Suryakiran

Suryakiran