BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
turcay
Lapis Lazuli | Level 10

Hello everyone,

 

I would like to ask my question to you. Again, it includes to get the data set output of some procedures. I know, SAS doesn’t let us to get the data set which has alredy had same structure with Report output but I need to create my desired output with some other methods.

 

I have a sample code as below(My real data is too big). Proc Tabulate almost can provide my desired output but I need some additional columns on my desired output.

 

Sample Data Set;

 

Data Have;
Length Variable1 8 RankVariable 8;
Infile Datalines Missover;
Input Variable1 RankVariable;
Datalines;
0 0
1 2
0 0
1 0
0 2
1 0
1 2
1 2
0 2
;
Run;

 

Here is my desired output;

DesiredCAP.png

 My first problem is to show values as zero even they don’t exist in the data set. For Example, in my sample data set there is no “1” values for "RankVariable" so “PROC TABULATE” doesn’t bring them in its report output, I also need this values as “0”.

Secondly, I need to add cumulative column but this column needs to show the values being percentage values.

 

Current Proc Tabulate Statement;

Proc Tabulate Data=Have Out=Have2;
Class  RankVariable / Order=Unformatted Missing;
Class Variable1 / Order=Unformatted Missing;
Table RankVariable , Variable1;
Run;

  I also need this report output as data set. It shouldn’t be same structure, I need the following data set.

 

Data Set Desired;

DesiredOutput.png 

Actually, this following steps could help me to create the data set but it is too manual, I think there should be more simple method to do the foreigoing demands. At this point, I would like to get your helps.

 

Proc Sql;
Create table ZeroCount AS
Select 
Count(Case When Variable1=0 And RankVariable=0 Then 0 Else . End) AS N0
,Count(Case When Variable1=0 And RankVariable=1 Then 1 Else . End) AS N1
,Count(Case When Variable1=0 And RankVariable=2 Then 2 Else . End) AS N2
From Have;
Create Table OneCount AS
Select
Count(Case When Variable1=1 And RankVariable=0 Then 0 Else . End) AS N0
,Count(Case When Variable1=1 And RankVariable=1 Then 1 Else . End) AS N1
,Count(Case When Variable1=1 And RankVariable=2 Then 2 Else . End) AS N2
From Have;
Quit;

PROC SQL;
CREATE TABLE CountTable AS 
SELECT * FROM ZEROCOUNT
OUTER UNION CORR 
SELECT * FROM ONECOUNT;
Quit;
 
 
PROC TRANSPOSE DATA=CountTable
     OUT=CountTable2
     PREFIX=Col
     NAME=Count;
     VAR N0 N1 N2;
RUN;
QUIT;
 
Data Cumulative;
Set CountTable2;
CumulativeColumn + Col2;
Run;

 Thank you

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
Here is :





Data Have;
Length Variable1 8 RankVariable 8;
Infile Datalines Missover;
Input Variable1 RankVariable;
Datalines;
0 0
1 2
0 0
1 0
0 2
1 0
1 2
1 2
0 2
;
Run;
Data Level;
RankVariable =0;Output;
RankVariable =1;Output;
RankVariable =2;Output;
Run;
Proc Sql ;
Create Table Want As
Select Data1.*, Sum(Data1.Variable1_1) As Total From
(
select b.RankVariable,coalesce(a.Variable1_0,0) as Variable1_0
                     ,coalesce(a.Variable1_1,0) as Variable1_1 from
(Select RankVariable
,Sum(Variable1=0) As Variable1_0 Label='0'
,Sum(Variable1=1) As Variable1_1 Label='1'
From Have
Group By RankVariable) as a right join level as b 
 on a.RankVariable =b.RankVariable 
)
 AS Data1;
Quit;

Data Want2;
Set Want;
Lag_Variable1_0=Lag(Variable1_0);
Lag_Variable1_1=Lag(Variable1_1);
If Lag_Variable1_0=. Then Lag_Variable1_0=0;
If Lag_Variable1_1=. Then Lag_Variable1_1=0;
Cumulative+Variable1_1;
CumulativePercent=(Cumulative/Total);
Run;

View solution in original post

20 REPLIES 20
Cynthia_sas
Diamond | Level 26
Hi:
Have you looked into the use of PRELOADFMT with PROC TABULATE? You will need a user-defined format, but then you can show all possible categories even when the category is not present in the data. There are examples in the doc for how to use PRELOADFMT.
cynthia
Cynthia_sas
Diamond | Level 26
Hi:
It seems like this question is related to this previous post: https://communities.sas.com/t5/Base-SAS-Programming/Get-Number-of-Zero-Values-and-Get-Percentage-of-...

was there something about that post that has changed since you marked it as solved?

cynthia
turcay
Lapis Lazuli | Level 10

Hello Cynthia,

 

After you mentioned the PRELOADFMT in your previous post , I remembered and checked my previous question which you shared with me as above. Now, I'm checking that post to reach my aim.

 

Actually, because I used Reeza's code, I didn't investigate PRELOADFMT sample deeply. I'm looking your sample PRELOADFMT code then I will try to create my desired output report and data set by using PRELOADFMT.

 

Thank you,

Ksharp
Super User

It is more suitable for PROC REPORT.  

PROC TABULATE don't have statistical like CUMPCT.

 

 

Data Have;
Length Variable1 8 RankVariable 8;
Infile Datalines Missover;
Input Variable1 RankVariable;
Datalines;
0 0
1 2
0 0
1 0
0 2
1 0
1 2
1 2
0 2
;
Run;

proc format;
value fmt
 0='0'
 1='1'
 2='2';
run;


proc report data=have nowd completerows ;
columns RankVariable Variable1 CumulativePercent ;
define RankVariable/group format=fmt. preloadfmt;
define Variable1 /across;
define CumulativePercent /computed format=percent8.2;
compute before;
 cumpct=0;
 sum=sum(_c2_,_c3_);
endcomp;
compute CumulativePercent ;
 cumpct+sum(_c2_,_c3_)/sum;
 CumulativePercent=cumpct;
endcomp;
run;

 

x.png

turcay
Lapis Lazuli | Level 10

Xia,

 

Thank you very much for providing foregoing code 🙂

 

But is %44.44 right? shouldn't be it %40.00 instead of %44.44? I'm not sure.

 

Thank you,

turcay
Lapis Lazuli | Level 10

Xia,

 

For this sample data set, yes. But in my real data set, I have 80.000 obs and it seems to me it takes to much time, I'm still waiting.

 

Thank you,

 

Ksharp
Super User

This could get you a little faster .

 

Data Have;
Length Variable1 8 RankVariable 8;
Infile Datalines Missover;
Input Variable1 RankVariable;
Datalines;
0 0
1 2
0 0
1 0
0 2
1 0
1 2
1 2
0 2
;
Run;

proc format;
value fmt
 0='0'
 1='1'
 2='2';
run;

%let dsid=%sysfunc(open(have));
%let nobs=%sysfunc(attrn(&dsid,nlobs));
%let dsid=%sysfunc(close(&dsid));

proc report data=have nowd completerows ;
columns RankVariable Variable1 CumulativePercent ;
define RankVariable/group format=fmt. preloadfmt;
define Variable1 /across;
define CumulativePercent /computed format=percent8.2;
compute CumulativePercent ;
 cumpct+sum(_c2_,_c3_)/&nobs;
 CumulativePercent=cumpct;
endcomp;
run;
turcay
Lapis Lazuli | Level 10

Xia,

 

Thank you very much for your attention and providing this code.

 

But it seems that it takes too much time for 80.000 obs. I think for 80.000 obs, it can be better to do the calculation outside of PROC REPORT. What do you think?

 

Thank you

 

turcay
Lapis Lazuli | Level 10

Cynthia,

 

Am I writing syntax incorrect?

 

proc format;
value fmt
 0="0"
 1="1"
 2="2";
run;
Proc Tabulate Data=Have Missing Out=Have2;
Class  RankVariable Variable1 / Preloadfmt ;
Table RankVariable , Variable1 / Printmiss;
Format RankVariable Variable1 $fmt.;
Run;

Thank you

Ksharp
Super User

Sure.

 

Data Have;
Length Variable1 8 RankVariable 8;
Infile Datalines Missover;
Input Variable1 RankVariable;
Datalines;
0 0
1 2
0 0
1 0
0 2
1 0
1 2
1 2
0 2
;
Run;

data level;
RankVariable=0;output;
RankVariable=1;output;
RankVariable=2;output;
run;

proc sql noprint;
select count(*) into : nobs from have;

create table temp as
select b.RankVariable,a.Variable1 
 from have as a full join level as b on
  a.RankVariable=b.RankVariable;
  
create table want as
select RankVariable,sum(Variable1=0) as Variable1_0 label='0',
                    sum(Variable1=1) as Variable1_1 label='1'
 from temp
  group by RankVariable;
quit;
data want;
 set want ;
 CumulativePercent+(Variable1_0+Variable1_1)/&nobs;
 format CumulativePercent percent8.2;
run;
proc report data=want nowd;
columns RankVariable ('Variable1' Variable1_0 Variable1_1) CumulativePercent ;
define RankVariable /display;
run;
turcay
Lapis Lazuli | Level 10

Thank you very much Xia,

 

Your sample code helped me so much, however I still didn't reach my desired output.

 

Let's say that my RankVariable has 10 values, to create my desired output, I modified your sample data as below . It is almost done. In the percentage column, I need to add one additional thing to reach my goal.

 

Data Level;
RankVariable =0;Output;
RankVariable =1;Output;
RankVariable =2;Output;
RankVariable =3;Output;
RankVariable =4;Output;
RankVariable =5;Output;
RankVariable =6;Output;
RankVariable =7;Output;
RankVariable =8;Output;
RankVariable =9;Output;
RankVariable =10;Output;
Run;
 
Proc Sql Noprint;
Select Count(*) Into : Nobs From Have;
 
Create Table Want As
Select B.RankVariable 
,Sum(Variable1=0) As Variable1_0 Label='0'
,Sum(Variable1=1) As Variable1_1 Label='1'
From Have As A
Full Join Level As B
On A.RankVariable =B.RankVariable 
Group By B.RankVariable ;
Quit;
Data Want2;
Set Want;
Variable1_0=Lag(Variable1_0);
Variable1_1=Lag(Variable1_1);
If Variable1_0=. Then Variable1_0=0;
If Variable1_1=. Then Variable1_1=0;
Run;

Data Want3;
Set Want2 ;
CumulativePercent+Variable1_1/&Nobs;
Format CumulativePercent Percent8.2;
Run;
 
 
Proc Report Data=Want2 Nowd;
Columns RankVariable  ('Variable1_1' Variable1_0 Variable1_1) CumulativePercent ;
Define RankVariable  / Display;
Run;

 

My current report output,

Current.png

 

Desired Report;

Desired.png

 

PS: I need to get cumulative percentage values just for NPL_F_1, it is "Variable1_1" in my sample data set.

 

Can somebody still help me?

 

Thank you

Cynthia_sas
Diamond | Level 26
Hi:
Since you are creating the Cumulative Percent yourself, you have 2 choices: multiply by 100 in the DATA step where you calculate Cumulative Percent OR do it in PROC REPORT. What is your DEFINE statement for the CumulativePercentage variable. The SAS PERCENT format does an automatic multiply by 100, why not try that? Or, just multiply by 100 in the DATA step if you don't want a Percent sign.

cynthia
turcay
Lapis Lazuli | Level 10

Thank you Cynthia,

 

So I have two choices,

 

I can do it in Data step or I can do it in Proc Report.

 

Actually, Xia prepared codes for both Data Step and Proc Report statements.

 

Because I have 80.000 obs in my real Data Set, it takes to much time when we use Proc Report statement. So I decided to create in Data Set.

 

Xia's code seems correct, but there are some diferences between Excel and SAS output. I think the following step needs to change. I multiplied with 100, but it didn't seem correct. I need to make some changes on Cumulative percentage step.

 

Data Want3;
Set Want2 ;
CumulativePercent+Variable1_1/&Nobs;
Format CumulativePercent Percent8.2;
Run;

I also tried to use PRELOADFMT. in my one of the previos post, but I got an error.

 

I'm sorry, I'm little bit confused 😞 I'm going to make some search more detailed. Let me check again.

 

Thank you

 

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 20 replies
  • 3128 views
  • 5 likes
  • 3 in conversation