Hello,
I have a sample data set as below. I want to perform more than one condition in one column. I think I need to use PROC SQL but I just can't handle it. My desired output is at the end of this message.
Can someone help me, please?
Data Have;
Length Date 8 Turnover 8 Customer_ID 8;
Infile Datalines Missover;
Input Date Turnover Customer_ID;
Format Date Date9.;
Datalines;
20393 0 001
20423 100 002
20454 1000 003
20485 100 004
20514 100 005
20545 10 006
20575 0 007
20636 1000 008
20667 100 009
20698 100 010
20728 10 011
20759 100 012
20789 1000 013
20820 1000 014
20851 0 015
20879 10 016
20910 100 017
20940 1000 018
20971 100 019
21001 10 020
;
Run;
Thank you
Well, this works. Basically createa category, then count by that:
proc sql; create table WANT as select CAT, count(distinct CUSTOMER_ID) as RESULT from (select case when TURNOVER = 0 then "Equal 0 ->" when 0 < TURNOVER <=10 then "0> And <=10" when 10 < TURNOVER <=100 then "10> And <=100" when 100 < TURNOVER <=1000 then "100> And <=1000" else "" end as CAT, CUSTOMER_ID from HAVE) group by CAT; quit;
Use a PROC FORMAT + FREQ.
The first example in this paper is similar to your example.
Well, this works. Basically createa category, then count by that:
proc sql; create table WANT as select CAT, count(distinct CUSTOMER_ID) as RESULT from (select case when TURNOVER = 0 then "Equal 0 ->" when 0 < TURNOVER <=10 then "0> And <=10" when 10 < TURNOVER <=100 then "10> And <=100" when 100 < TURNOVER <=1000 then "100> And <=1000" else "" end as CAT, CUSTOMER_ID from HAVE) group by CAT; quit;
Format and proc freq. You only need to dapt the proc format to make changes:
proc format;
value turnover
0 = 'Equal 0'
0 <- 10 = '0> and <=10'
10 <- 100 = '10> and <=100'
100 <- 1000 = '100> and <=1000'
;
run;
data have;
length date 8 turnover 8 customer_ID 8;
infile datalines missover;
input date turnover customer_ID;
format
date date9.
turnover turnover.
;
datalines;
;
run;
proc freq data=have;
tables turnover /out=want (drop=percent) nopercent nocum;
run;
Thank you all of them,
I know I got my response but what if I want to add cumulative and percentage columns by using PROC SQL, how can I do this?
Here is my desired output,
Thank you
Yes, thank you very much but I just wonder how can I do it by using PROC SQL 🙂
Well, percentage is pretty easy as that is a calculation. Cumulative however is not easy, in fact it is very difficult. The reason is that SQL does not relate code to observations in a particular order. So for percentage:
proc sql; create table WANT as select CAT, count(distinct CUSTOMER_ID) as RESULT,
(CALCULATED RESULT / (select count(disinct CUSTOMER_ID from HAVE)) * 100 as PCENT from (select case when TURNOVER = 0 then "Equal 0 ->" when 0 < TURNOVER <=10 then "0> And <=10" when 10 < TURNOVER <=100 then "10> And <=100" when 100 < TURNOVER <=1000 then "100> And <=1000" else "" end as CAT, CUSTOMER_ID from HAVE) group by CAT; quit;
And then for cumulative, I would just add a datastep after that, sorted the way you want, and retain the cumulative as you go. Doing it in teh SQL, the simplest way would just be to select each of the four categories, then union all together - i.e. the above code would be multipled by four rather than having the cat and grouping by that.
Thank you very much.
On the other hand, for Proc tabulate procedure which statement should I add for cumulative calculation column?
For example, following procedure->
PROC TABULATE DATA=HAVE;
CLASS Turnover /ORDER=UNFORMATTED MISSING;
TABLE /* Row Dimension */Turnover,
/* Column Dimension */N ColPctN ;
RUN;
Thank you
Proc tabulate does not do a running cumulative. It will do a TOTAL.
Okay, so here is my final question, what if I want to create following table which procedure should I use it?
Thank you
PROC FREQ is 3 lines of code...
Your table is the definition of the output.
proc freq data=have noprint;
tables turnover /out=want outpct outcum ;
run;
proc print data=want label noobs;
run;
@turcay wrote:
Okay, so here is my final question, what if I want to create following table which procedure should I use it?
Thank you
Just omit the nopercent and nocum option in the tables statement, and remove the drop=percent dataset option. You then get the default output of proc freq.
PS you really need to develop a knack for reading the SAS documentation. Google "sas proc freq tables", and you get the correct page; all the options are thoroughly described there.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.