DATA Step, Macro, Functions and more

How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

Accepted Solution Solved
Reply
Super Contributor
Posts: 395
Accepted Solution

How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

[ Edited ]

Hello everyone,

 

I would like to ask my question to you. Again, it includes to get the data set output of some procedures. I know, SAS doesn’t let us to get the data set which has alredy had same structure with Report output but I need to create my desired output with some other methods.

 

I have a sample code as below(My real data is too big). Proc Tabulate almost can provide my desired output but I need some additional columns on my desired output.

 

Sample Data Set;

 

Data Have;
Length Variable1 8 RankVariable 8;
Infile Datalines Missover;
Input Variable1 RankVariable;
Datalines;
0 0
1 2
0 0
1 0
0 2
1 0
1 2
1 2
0 2
;
Run;

 

Here is my desired output;

DesiredCAP.png

 My first problem is to show values as zero even they don’t exist in the data set. For Example, in my sample data set there is no “1” values for "RankVariable" so “PROC TABULATE” doesn’t bring them in its report output, I also need this values as “0”.

Secondly, I need to add cumulative column but this column needs to show the values being percentage values.

 

Current Proc Tabulate Statement;

Proc Tabulate Data=Have Out=Have2;
Class  RankVariable / Order=Unformatted Missing;
Class Variable1 / Order=Unformatted Missing;
Table RankVariable , Variable1;
Run;

  I also need this report output as data set. It shouldn’t be same structure, I need the following data set.

 

Data Set Desired;

DesiredOutput.png 

Actually, this following steps could help me to create the data set but it is too manual, I think there should be more simple method to do the foreigoing demands. At this point, I would like to get your helps.

 

Proc Sql;
Create table ZeroCount AS
Select 
Count(Case When Variable1=0 And RankVariable=0 Then 0 Else . End) AS N0
,Count(Case When Variable1=0 And RankVariable=1 Then 1 Else . End) AS N1
,Count(Case When Variable1=0 And RankVariable=2 Then 2 Else . End) AS N2
From Have;
Create Table OneCount AS
Select
Count(Case When Variable1=1 And RankVariable=0 Then 0 Else . End) AS N0
,Count(Case When Variable1=1 And RankVariable=1 Then 1 Else . End) AS N1
,Count(Case When Variable1=1 And RankVariable=2 Then 2 Else . End) AS N2
From Have;
Quit;

PROC SQL;
CREATE TABLE CountTable AS 
SELECT * FROM ZEROCOUNT
OUTER UNION CORR 
SELECT * FROM ONECOUNT;
Quit;
 
 
PROC TRANSPOSE DATA=CountTable
     OUT=CountTable2
     PREFIX=Col
     NAME=Count;
     VAR N0 N1 N2;
RUN;
QUIT;
 
Data Cumulative;
Set CountTable2;
CumulativeColumn + Col2;
Run;

 Thank you

 


Accepted Solutions
Solution
‎06-09-2016 04:08 AM
Super User
Posts: 10,030

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

Here is :





Data Have;
Length Variable1 8 RankVariable 8;
Infile Datalines Missover;
Input Variable1 RankVariable;
Datalines;
0 0
1 2
0 0
1 0
0 2
1 0
1 2
1 2
0 2
;
Run;
Data Level;
RankVariable =0;Output;
RankVariable =1;Output;
RankVariable =2;Output;
Run;
Proc Sql ;
Create Table Want As
Select Data1.*, Sum(Data1.Variable1_1) As Total From
(
select b.RankVariable,coalesce(a.Variable1_0,0) as Variable1_0
                     ,coalesce(a.Variable1_1,0) as Variable1_1 from
(Select RankVariable
,Sum(Variable1=0) As Variable1_0 Label='0'
,Sum(Variable1=1) As Variable1_1 Label='1'
From Have
Group By RankVariable) as a right join level as b 
 on a.RankVariable =b.RankVariable 
)
 AS Data1;
Quit;

Data Want2;
Set Want;
Lag_Variable1_0=Lag(Variable1_0);
Lag_Variable1_1=Lag(Variable1_1);
If Lag_Variable1_0=. Then Lag_Variable1_0=0;
If Lag_Variable1_1=. Then Lag_Variable1_1=0;
Cumulative+Variable1_1;
CumulativePercent=(Cumulative/Total);
Run;

View solution in original post


All Replies
SAS Super FREQ
Posts: 8,866

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

Hi:
Have you looked into the use of PRELOADFMT with PROC TABULATE? You will need a user-defined format, but then you can show all possible categories even when the category is not present in the data. There are examples in the doc for how to use PRELOADFMT.
cynthia
SAS Super FREQ
Posts: 8,866

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

Posted in reply to Cynthia_sas
Hi:
It seems like this question is related to this previous post: https://communities.sas.com/t5/Base-SAS-Programming/Get-Number-of-Zero-Values-and-Get-Percentage-of-...

was there something about that post that has changed since you marked it as solved?

cynthia
Super Contributor
Posts: 395

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

Posted in reply to Cynthia_sas

Hello Cynthia,

 

After you mentioned the PRELOADFMT in your previous post , I remembered and checked my previous question which you shared with me as above. Now, I'm checking that post to reach my aim.

 

Actually, because I used Reeza's code, I didn't investigate PRELOADFMT sample deeply. I'm looking your sample PRELOADFMT code then I will try to create my desired output report and data set by using PRELOADFMT.

 

Thank you,

Super User
Posts: 10,030

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

It is more suitable for PROC REPORT.  

PROC TABULATE don't have statistical like CUMPCT.

 

 

Data Have;
Length Variable1 8 RankVariable 8;
Infile Datalines Missover;
Input Variable1 RankVariable;
Datalines;
0 0
1 2
0 0
1 0
0 2
1 0
1 2
1 2
0 2
;
Run;

proc format;
value fmt
 0='0'
 1='1'
 2='2';
run;


proc report data=have nowd completerows ;
columns RankVariable Variable1 CumulativePercent ;
define RankVariable/group format=fmt. preloadfmt;
define Variable1 /across;
define CumulativePercent /computed format=percent8.2;
compute before;
 cumpct=0;
 sum=sum(_c2_,_c3_);
endcomp;
compute CumulativePercent ;
 cumpct+sum(_c2_,_c3_)/sum;
 CumulativePercent=cumpct;
endcomp;
run;

 

x.png

Super Contributor
Posts: 395

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

[ Edited ]

Xia,

 

Thank you very much for providing foregoing code Smiley Happy

 

But is %44.44 right? shouldn't be it %40.00 instead of %44.44? I'm not sure.

 

Thank you,

Super User
Posts: 10,030

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

But you only have 9 obs , right ?
Super Contributor
Posts: 395

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

Xia,

 

For this sample data set, yes. But in my real data set, I have 80.000 obs and it seems to me it takes to much time, I'm still waiting.

 

Thank you,

 

Super User
Posts: 10,030

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

This could get you a little faster .

 

Data Have;
Length Variable1 8 RankVariable 8;
Infile Datalines Missover;
Input Variable1 RankVariable;
Datalines;
0 0
1 2
0 0
1 0
0 2
1 0
1 2
1 2
0 2
;
Run;

proc format;
value fmt
 0='0'
 1='1'
 2='2';
run;

%let dsid=%sysfunc(open(have));
%let nobs=%sysfunc(attrn(&dsid,nlobs));
%let dsid=%sysfunc(close(&dsid));

proc report data=have nowd completerows ;
columns RankVariable Variable1 CumulativePercent ;
define RankVariable/group format=fmt. preloadfmt;
define Variable1 /across;
define CumulativePercent /computed format=percent8.2;
compute CumulativePercent ;
 cumpct+sum(_c2_,_c3_)/&nobs;
 CumulativePercent=cumpct;
endcomp;
run;
Super Contributor
Posts: 395

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

Xia,

 

Thank you very much for your attention and providing this code.

 

But it seems that it takes too much time for 80.000 obs. I think for 80.000 obs, it can be better to do the calculation outside of PROC REPORT. What do you think?

 

Thank you

 

Super Contributor
Posts: 395

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

Cynthia,

 

Am I writing syntax incorrect?

 

proc format;
value fmt
 0="0"
 1="1"
 2="2";
run;
Proc Tabulate Data=Have Missing Out=Have2;
Class  RankVariable Variable1 / Preloadfmt ;
Table RankVariable , Variable1 / Printmiss;
Format RankVariable Variable1 $fmt.;
Run;

Thank you

Super User
Posts: 10,030

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

Sure.

 

Data Have;
Length Variable1 8 RankVariable 8;
Infile Datalines Missover;
Input Variable1 RankVariable;
Datalines;
0 0
1 2
0 0
1 0
0 2
1 0
1 2
1 2
0 2
;
Run;

data level;
RankVariable=0;output;
RankVariable=1;output;
RankVariable=2;output;
run;

proc sql noprint;
select count(*) into : nobs from have;

create table temp as
select b.RankVariable,a.Variable1 
 from have as a full join level as b on
  a.RankVariable=b.RankVariable;
  
create table want as
select RankVariable,sum(Variable1=0) as Variable1_0 label='0',
                    sum(Variable1=1) as Variable1_1 label='1'
 from temp
  group by RankVariable;
quit;
data want;
 set want ;
 CumulativePercent+(Variable1_0+Variable1_1)/&nobs;
 format CumulativePercent percent8.2;
run;
proc report data=want nowd;
columns RankVariable ('Variable1' Variable1_0 Variable1_1) CumulativePercent ;
define RankVariable /display;
run;
Super Contributor
Posts: 395

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

[ Edited ]

Thank you very much Xia,

 

Your sample code helped me so much, however I still didn't reach my desired output.

 

Let's say that my RankVariable has 10 values, to create my desired output, I modified your sample data as below . It is almost done. In the percentage column, I need to add one additional thing to reach my goal.

 

Data Level;
RankVariable =0;Output;
RankVariable =1;Output;
RankVariable =2;Output;
RankVariable =3;Output;
RankVariable =4;Output;
RankVariable =5;Output;
RankVariable =6;Output;
RankVariable =7;Output;
RankVariable =8;Output;
RankVariable =9;Output;
RankVariable =10;Output;
Run;
 
Proc Sql Noprint;
Select Count(*) Into : Nobs From Have;
 
Create Table Want As
Select B.RankVariable 
,Sum(Variable1=0) As Variable1_0 Label='0'
,Sum(Variable1=1) As Variable1_1 Label='1'
From Have As A
Full Join Level As B
On A.RankVariable =B.RankVariable 
Group By B.RankVariable ;
Quit;
Data Want2;
Set Want;
Variable1_0=Lag(Variable1_0);
Variable1_1=Lag(Variable1_1);
If Variable1_0=. Then Variable1_0=0;
If Variable1_1=. Then Variable1_1=0;
Run;

Data Want3;
Set Want2 ;
CumulativePercent+Variable1_1/&Nobs;
Format CumulativePercent Percent8.2;
Run;
 
 
Proc Report Data=Want2 Nowd;
Columns RankVariable  ('Variable1_1' Variable1_0 Variable1_1) CumulativePercent ;
Define RankVariable  / Display;
Run;

 

My current report output,

Current.png

 

Desired Report;

Desired.png

 

PS: I need to get cumulative percentage values just for NPL_F_1, it is "Variable1_1" in my sample data set.

 

Can somebody still help me?

 

Thank you

SAS Super FREQ
Posts: 8,866

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

Hi:
Since you are creating the Cumulative Percent yourself, you have 2 choices: multiply by 100 in the DATA step where you calculate Cumulative Percent OR do it in PROC REPORT. What is your DEFINE statement for the CumulativePercentage variable. The SAS PERCENT format does an automatic multiply by 100, why not try that? Or, just multiply by 100 in the DATA step if you don't want a Percent sign.

cynthia
Super Contributor
Posts: 395

Re: How to - Get Desired Data Set Output & Report Output By Using Data Step or Proc Sql

Posted in reply to Cynthia_sas

Thank you Cynthia,

 

So I have two choices,

 

I can do it in Data step or I can do it in Proc Report.

 

Actually, Xia prepared codes for both Data Step and Proc Report statements.

 

Because I have 80.000 obs in my real Data Set, it takes to much time when we use Proc Report statement. So I decided to create in Data Set.

 

Xia's code seems correct, but there are some diferences between Excel and SAS output. I think the following step needs to change. I multiplied with 100, but it didn't seem correct. I need to make some changes on Cumulative percentage step.

 

Data Want3;
Set Want2 ;
CumulativePercent+Variable1_1/&Nobs;
Format CumulativePercent Percent8.2;
Run;

I also tried to use PRELOADFMT. in my one of the previos post, but I got an error.

 

I'm sorry, I'm little bit confused Smiley Sad I'm going to make some search more detailed. Let me check again.

 

Thank you

 

 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 20 replies
  • 457 views
  • 5 likes
  • 3 in conversation