BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Saurabh291989
Calcite | Level 5

Hi!

 

I'm the new user of Base SAS Programming. 

I have got a stuck at a problem.

 

Question: Which Year Maximum standard deviation from the mean compared to the standard deviation from the mean whole dataset.

 

Note: This has to be done only using proc tabulate instead of using proc means, proc freq, proc sql or marco.

 

 

data xyz;

infile datalines missover turncate;

input year casualties @@;

datalines;

1980 1    1989 23

1981 3    1990 103

1985 45  1991 56

1980 85  1983 78

1989 23  1983 85

1990 21  1980 56

1984 22  1984 35

1989 56  1981 14

....

1985 26   1989 12

;

/* individual calculation */

proc tabulate data=xyz out=abc;

class year;

var casualties;

tables year, casualties*std;

run;

 

/* whole dataset calculation*/

proc tabulate data=xyz out=pqr;

var casualties;

tables  casualties*std;

run;

 

I don't know what to do next to compare these two datasets (abc or pqr) to get my final required output. I need your assistance.

1 ACCEPTED SOLUTION

Accepted Solutions
Cynthia_sas
SAS Super FREQ

Hi:

  Well, here's what TABULATE would give on a limited number of years:

tab_using_all_limited_years.png

 

The line with "All" above is the overall for these years. 2006 and 2009 and a few other years are higher than the overall number.  What would you envision as the FINAL report? I don't understand what you mean when you say you: "want a report that shows the comparison of each year std to the overall Std and also display the greater values in the output."

 

  What would your "comparison of each year to the overall" look like? Is there a formula, how would this look in a small table such as the one I show above, where you can see a few years and then see the overall. And, "display the greater values in the output" which greater values? Only show the highest std values?

 

Cynthia

View solution in original post

13 REPLIES 13
Cynthia_sas
SAS Super FREQ

Hi:

 I'm not sure you need 2 PROC TABULATE steps. You could just do this:

tables year all='Overall', casualties*std;

 

which would put the overall STD at the bottom of the rows for each year.

 

  But I don't understand what your desired output is. Do you want a report that shows the comparison of each year to the overall STD?

 

  Also, I would not expect that you get any output at all, because I would expect you are seeing this error message in the log:

ERROR 23-2: Invalid option name TURNCATE.

 

because "turncate" is not a valid option for the INFILE statement.

 

Cynthia

Saurabh291989
Calcite | Level 5
Hi Cynthia!



Yes,I want a report that shows the comparison of each year std to the overall Std and also display the greater values in the output. And Yes, you're right turncate option throw an error due to spell mistake.
Cynthia_sas
SAS Super FREQ

Hi:

  Well, here's what TABULATE would give on a limited number of years:

tab_using_all_limited_years.png

 

The line with "All" above is the overall for these years. 2006 and 2009 and a few other years are higher than the overall number.  What would you envision as the FINAL report? I don't understand what you mean when you say you: "want a report that shows the comparison of each year std to the overall Std and also display the greater values in the output."

 

  What would your "comparison of each year to the overall" look like? Is there a formula, how would this look in a small table such as the one I show above, where you can see a few years and then see the overall. And, "display the greater values in the output" which greater values? Only show the highest std values?

 

Cynthia

Saurabh291989
Calcite | Level 5

Yes, Only shows the highest std values with their respective year.

I posting my actual sas code may you get more idea what I'm looking for.

Saurabh291989
Calcite | Level 5
options dlcreatedir;
proc options option=dlcreatedir ;
run;

filename File "/folders/myfolders/raw_files/Airplane_Crashes_and_Fatalities_Since_1908.csv" ;
libname Lib "/folders/myfolders/DataSet";

/*Fetching Data from external file*/
data Lib.Airplanes(label="DETAILS:Airplanes Crashes since 1908");
infile File  missover dsd lrecl=32767 firstobs=2 dlm=',' ;

input Date mmddyy10. +1 Time:$10.  Location:$100. Operator: $50.
Flight: $50. Route: $100. Plane_Type: $100. Registration: $20. cn_In:$20. Aboard 
Fatalities  Ground  Summary: $1000. ;
format Date mmddyy10.;

/* Extracting Year, Month and Day from Date */
Year=year(Date);
/* Trimming route */
Route=trim(Route);
/* Calculating Survivors */
Survivor=Aboard-Fatalities;
/* Calculating Decade  */
Decade=floor(Year/10)*10;
drop Date;
run; 
/***** SORTING AIRPLANES DATASET IN ASCENDING ORDER ******/
proc sort data=Lib.Airplanes;
by Year;
run;
/*//-----------------------CASUALTIES--------------// */
/*
 * QUESTION:1) YEAR WHICH HAD THE MEAN NUMBER OF CRASHES CLOSEST TO MEDIAN NUMBER OF 
 * CRASHES ?
*/
/**************		YEARS		********************/ 
ods select none; *NO PRINT FOR LINE 41 STATEMENT;
proc tabulate data=Lib.Airplanes  out=WORK.YearlyCrashes( rename=(N=Crash))  ;
class Year;
table  Year,N;
run;
ods select all; * PRINT FOR LINE 46 STATEMENT ;
proc tabulate data=WORK.YearlyCrashes;
var Crash Year ;
table (Year*Mean=''*F=4.)(Crash='No. of plane Crashes'*Mean*F=4.)(Year*Median=''*F=4.)
(Crash='No. of plane Crashes'*Median*F=4.)/rts=20.;
title3 color=RED bold 'Year which had the Mean number of crashes closest to Median number of 
crashes';
run;
/* QUESTION :2)  WHICH YEAR HAD THE MAXIMUM STANDARD DEVIATION FROM THE MEAN COMPARED TO THE STANDARD
*          	  DEVIATION FROM THE MEAN WHOLE DATASET?
*/
proc tabulate data=Lib.Airplanes out=Work.Std(rename= Fatalities_std=SYstd);
var Fatalities;
class Year;
tables (Year='' All='Overall'),Fatalities='Causalties'* Std='Standard Deviation'*F=5./box='Year';
run;
/***** calculating std(individual)>std(overall) *********/
data Work.stdfinal;
set Work.std;
retain std_001 Year_00 0;

if(SYstd > last.SYstd) then 
do;
std_001=SYstd;
Year_001=Year;
end;
keep std_001 Year_001;
run;
/******** printing yearly standard deviation(s) > overall standard deviations*******/
proc tabulate data=work.stdfinal;
class Year_001;
var std_001;
tables Year_001='', std_001='highest Standard Deviation'*sum=''*F=3./box='Year';
title color=RED bold ' YEAR WHICH HAD THE MAXIMUM STANDARD DEVIATION FROM THE MEAN COMPARED TO THE STANDARD          	  DEVIATION FROM THE MEAN WHOLE DATASET';
run;
ballardw
Super User

@Saurabh291989 wrote:

Yes, Only shows the highest std values with their respective year.

I posting my actual sas code may you get more idea what I'm looking for.


Code that does not actually work used to tell us "more idea" of what you are looking for still requires mind reading because we would still be guessing at what you might want.

 

Pick a small example that you can do by hand, show the input data and the actual desired result and whether the result should be data set (to be read by machines for further processing) or a report to be read by people.

Saurabh291989
Calcite | Level 5
I just want a report that is read by the people which consist of only those std which are higher than the overall std deviation.
Cynthia_sas
SAS Super FREQ

Hi:

  In this output, there are report rows for 6 years, where the values are higher than the overall amount? Now what?

showing_years_higher_overall.png

What would you envision seeing next? Would you want to have PROC TABULATE highlight these cells automatically (with a user-defined format) instead of doing it manually? Would you ONLY want to see the highlighted rows on the report where the values are higher than the overall value? I still don't understand 1) what you want the final report to look like and 2) how you want the overall number to be used in determining what rows should appear on the report.

 

cynthia

Saurabh291989
Calcite | Level 5

yes, Cynthia, I want only those rows for the years that are highlighted in yellow for the report.

Cynthia_sas
SAS Super FREQ

Hi:

  Based on the code you posted, I don't understand your use of "last.SYstd" in the DATA step or what that logic is doing. I would expect you to get a NOTE in the log that NOTE: Variable last.SYstd is uninitialized. and that the program wouldn't work as you desire.

  Since you have to use a DATA step to generate your STDFinal data, with only the rows you want, I'd probably have taken a different approach and just used a DATA step to make a Macro variable to hold the value of the the overall std and then use that to create WORK.STDFinal.

 

  Do you want to see the final TABULATE with a difference between the overall the the year? Or see the years appearing in descending order from greatest STD to lowest, for the rows selected? Just some other ideas.

 

Cynthia

Saurabh291989
Calcite | Level 5
I used last.SYstd for comparing the last value of std to the whole column value. But it didn't work that why I'm looking for your help.

Yes, I want to see the years appearing in descending order from greatest STD to lowest, for the rows selected in the final report.
Cynthia_sas
SAS Super FREQ

Hi: For last. to work, you have to be using BY group processing, but I don't see that you need BYGroup processing here.

 

  Given the data that you posted, a simple MACRO variable should allow you to create a report like this:

partial_std_gt.png

This is the approach I would take, as shown here:

example_code_partial_std.png

Notice that the output is report #6. I did a few extra steps along the way, to illustrate various approaches and outputs.

 

Here's the final code that creates all the outputs.

** read data;
data xyz;
  length date 8 time $10 location $100 operator $50 flt_num $50
         route $100 type $100 registration $20 cn_In $20
         aboard casualties ground 8 summary $1000;
  infile "<path to file>\Airplane_Crashes_and_Fatalities_Since_1908.csv" dlm=',' dsd firstobs=2;
  input Date : anydtdte. time $ location $ operator $ flt_num $ 
        Route $ Type $ Registration $ cn_In aboard casualties ground summary $;
  if date gt . then year = year(date);
  Survivor=Aboard-casualties;
  Decade=floor(Year/10)*10;
  format Date mmddyy10.;
run;
  
title;
/* individual calculation */
proc tabulate data=xyz out=work.std;
title '1) Original TABULATE makes WORK.STD';
class year;
var casualties;
tables year all, casualties*std / box='Individual Calculation';
run;

title;
proc print data=work.std;
title '2) Data WORK.STD _TYPE_ =1 are year std and _TYPE_ = 0 is overall';
run;

** make a macro variable that represents the overall std;
data _null_;
  set work.std;
  if _type_ = '0' then do;
     call symputx('ovstd',put(casualties_std,8.4),'G');
  end;
run;

%put The overall STD is &ovstd;
    
  
** now make a format to highlight cells gt overall;
proc format;
  value c_std &ovstd-high = 'yellow'
            other = 'white';
run;
    
** repeat original tabulate, and just highlight the years gt overall;
proc tabulate data=xyz f=8.4;
title '3) Highlight desired values to show on final report';
title2 'Show all rows, first highlighting starts in year 1972';
class year;
var casualties;
tables year all='Overall', 
     casualties*std=' '*{s={background=c_std.}}
   / box='Individual Calculation';
run;

** create WORK.STDFINAL from WORK.STD from only _TYPE_=1 rows;
** where casualties_std > overall std;
** using the macro variable created from WORK.STD;
data Work.stdfinal;
set Work.std;
where _type_ = '1';
retain overall &ovstd;

if casualties_std > overall then do;
   difference = casualties_std - overall;
   output;
end;
run;

proc print data=work.stdfinal;
  title '4) After using Macro Variable for Overall STD';
  var year casualties_std overall difference;
run;

** FINAL tabulate, show only the years and the difference;
proc tabulate data=work.stdfinal f=8.4;
title1 color=RED bold 
    "5) STANDARD DEVIATION FOR DATASET WAS: &ovstd";
title2 color=RED bold 
    'YEARS WHICH HAD THE MAXIMUM STANDARD DEVIATION FROM THE MEAN COMPARED TO THE STANDARD DEVIATION FROM THE MEAN WHOLE DATASET';
class Year ;
var casualties_std difference;
tables Year=' ', 
       casualties_std='highest Standard Deviation'*sum=' ' difference*sum=' '
       /box='Year';
run;
 
** show report in descending order of casualties_std value;
proc sort data=stdfinal;
  by descending casualties_std;
run;
   
proc tabulate data=work.stdfinal f=8.4;
title1 color=RED bold 
    "6) STANDARD DEVIATION FOR DATASET WAS: &ovstd";
title2 color=RED bold 
    'YEARS WHICH HAD THE MAXIMUM STANDARD DEVIATION FROM THE MEAN COMPARED TO THE STANDARD DEVIATION FROM THE MEAN WHOLE DATASET';
class Year /order=data;
var casualties_std difference;
tables Year=' ', 
       casualties_std='highest Standard Deviation'*sum=' ' difference*sum=' '
       /box='Year';
run;
title;
   

 

Hope this helps,

Cynthia

Saurabh291989
Calcite | Level 5
Thank you Cynthia for the solution !

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 4177 views
  • 1 like
  • 3 in conversation