Kick-start Your Data Exploration Journey with SAS Information Catalog
Recent Library Articles
Recently in the SAS Community Library: @smanoj introduces you to the SAS Information Catalog, which provides a powerful search engine to find needed data assets.
This is the code ods output Quartiles= median_ci(where=(percent=50));
proc lifetest data=test;
time duration*censor(1);
strata group;
run; It runs, but in results for quartile estimates it only shows complete data (point estimate, 95 CI) for the percent 25. There is no point estimate for 50 or 75. For this percents, it only shows the lower CI. Is this something to do with the data? Should I calculate the median and CI with another proc? The median and CI are to describe survival data. Any help is appreciated.
... View more
proc sql;
create table t1
as
select *
from
t2
where
name like 'PNB%'
and rn in
(select distinct rn from
t4)
and
datepart(DATE) <= '7Dec2002:00:00:00'd
;quit; I have a very big dataset and its taking forever to run and many times connections disrupts and I need to wait for a long again. can you please suggest a better way to do it. Here table T2 is like very big taking almost an hour to complete the query if everything goes well or need to restart many times . Thank you in advance kajal
... View more
Hi Need help to Highlight mismatched rows(compare 2-3 , 4-5,6-7 rows so on dynamically) data Using SAS while generating a multi sheet excel file . SAS is on Unix server. I am using SAS EG 7.1 Below is the data I have to compare every two rows and highlight the mismatched data while generating the Excel and send it via email. Note - there are multiple sheets and columns vary (sometimes there maybe 10 columns, sometimes it may be 18) I have SAS code(used ODS) to generate excel with multiple sheets and send it via email and separate a VBA macro to highlight the mismatched data which I run manually. But I am unable to integrate both in to single code so that no manual work is needed, I really appreciate if anyone can suggest any way to do it in one single code. Below is sample dataset , rows and column are dynamically created based on the project. Need to compare 2-3,4-5, 6-7..so-on rows and highlight(I have given blue to the font but i need whole box as yellow) only when they don’t match. ssn firstname lastname empid address city state zip 000-11-1234 abc d 1001 plotno 123 abcd rd 12345 000-11-1234 abc e 1001 plotno 123 abcd rd 12345 111-12-3456 efgh f 2001 house 897 tyui fg 23456-789 111-12-3456 efgh f 2001 hs 897 tyui fg 23456 444-56-7890 wrt y 3001 xc 12345 444-56-7890 wert y 3002 xc 12345 my VBA code: Sub compare_Auto() Dim LastColumn As Long LastColumn = ActiveSheet.Range("A1").CurrentRegion.Columns.Count Dim rng As Range, rngDiff As Range Set rng = Range(Cells(2, 7), Cells(3, LastColumn)) Do While Application.CountA(rng) > 0 'loop while have content Set rngDiff = Nothing 'reset range On Error Resume Next 'ignore "No cells were found" error if no differences Set rngDiff = rng.ColumnDifferences(Comparison:=rng.Cells(1)) On Error GoTo 0 'stop ignoring errors If Not rngDiff Is Nothing Then rngDiff.Interior.ColorIndex = 6 'color all differences rngDiff.Offset(-1).Interior.ColorIndex = 6 End If Set rng = rng.Offset(2) 'two rows down Loop End Sub
... View more
Hello
Lets say that have data set has wide structure than I know how to calculate -First_Smallest,Second_Smallest,Third_Smallest..
Let's say that have data set has ling structure.
What is the way to calculate-First_Smallest,Second_Smallest,Third_Smallest from this data set?
(Without change the structure from long to wide)?
Note-
First_Smallest is the value in smallest observation
Second_Smallest is the value in second smallest observation
Third_Smallest is the value in third smallest observation
data have1;
input CustID VAR $ mon1 mon2 mon3 mon4 mon5 mon6;
cards;
1 x 10 20 30 40 50 60
1 W 60 50 40 30 20 10
1 Z 30 30 30 0 0 0
2 x 30 20 10 10 40 0
;
Run;
data want1;
set have1;
First_Smallest=smallest(1,mon1,mon2,mon3,mon4,mon5,mon6) ;
Second_Smallest=smallest(2,mon1,mon2,mon3,mon4,mon5,mon6) ;
Third_Smallest=smallest(3,mon1,mon2,mon3,mon4,mon5,mon6) ;
Run;
data have2;
input CustID VAR $ month $ value;
cards;
1 x mon1 10
1 x mon2 20
1 x mon3 30
1 x mon4 40
1 x mon5 50
1 x mon6 60
1 w mon1 60
1 w mon2 50
1 w mon3 40
1 w mon4 30
1 w mon5 20
1 w mon6 10
1 z mon1 30
1 z mon2 30
1 z mon3 30
1 z mon4 0
1 z mon5 0
1 z mon6 0
2 x mon1 30
2 x mon2 20
2 x mon3 10
2 x mon4 10
2 x mon5 40
2 x mon6 0
;
run;
... View more
Hi, I have been working on the Essentials 1 Case Study looking at TSA Claims data. I have reviewed all the documents and am still having troubles with my code. For the total number of date issues, I got 3938 but the document says it should be 4241. For the part with the macro for StateName, every time I run it there is no results/output and the log says: NOTE: No observations were selected from data set TSA.CLAIMS_CLEANED. NOTE: There were 0 observations read from the data set TSA.CLAIMS_CLEANED. WHERE 0 /* an obviously FALSE WHERE clause */ ; I even copied and pasted all of the code from the document and still receive the same number (3938) and same log message from my original code. When the macro is for "C" then there are results that show up, but none for the example with "California" (as per the document code). I noticed in the proc contents that the length of State and StateName was only 1, perhaps that's the problem? I tried changing the length of those in the data steps, but that didn't change anything. Can someone please help me make this work? Below is my code after I set up my library: options validvarname=v7;
proc import datafile="&path/TSAClaims2002_2017.csv"
dbms=csv out=ClaimsImport replace;
guessingrows=max;
run;
/* EXPLORE DATA */
/* Look at first 20 observations */
proc print data=tsa.claimsimport (obs=20);
run;
/* Look at the descriptor portion of the data, sorted bycolumn order. */
proc contents data=tsa.claimsimport varnum;
run;
proc freq data=tsa.claimsimport;
tables Claim_Site Disposition Claim_Type / nocum nopercent;
tables Incident_Date Date_Received / nocum nopercent;
format Incident_Date Date_Received year4.;
run;
/* PREPARE DATA */
/* Remove entirely duplicate rows and create new table */
proc sort data=tsa.claimsimport
out=tsa.Claims_NoDups
nodupkey;
by _all_;
run;
proc sort data=tsa.claims_nodups;
by Incident_Date;
run;
data tsa.Claims_Cleaned;
set tsa.claims_nodups;
/* Clean Claim_Site by replacing missing and - values with Unknown */
if Claim_Site in ('-',"") then Claim_Site = "Unknown";
/*Clean Disposition */
if Disposition in ("-", "") then
Disposition = 'Unknown';
else if Disposition = 'Closed: Canceled' then
Disposition = 'Closed:Canceled';
else if Disposition = 'losed: Contractor Claim' then
Disposition ='Closed:Contractor Claim';
/*Clean Claim_Type*/
if Claim_Type in ("-","") then Claim_Type = "Unknown";
else if Claim_Type = 'Passenger Property Loss/Injury' then Claim_Type = 'Passenger Property Loss';
else if Claim_Type = 'Passenger Property Loss/Injur' then Claim_Type = 'Passenger Property Loss';
else if Claim_Type = 'Property Damage/Personal Injury' then Claim_Type = 'Property Damage';
/* Convert State values to uppercase and all StateName Columns to proper case */
State=upcase(state);
StateName=propcase(StateName);
/* Create columns that identifies rows with issues */
if (Incident_Date > Date_Received or
Incident_Date = . or
Date_Received = . or
year(Incident_Date) < 2002 or
year(Incident_Date) > 2017 or
year(Date_Received) < 2002 or
year(Date_Received) > 2017)
then Date_Issues = "Needs Review";
/* Create permanent labels for columns */
format Incident_Date Date_Received date9. Close_Amount dollar20.2;
label Airport_Code = "Airport Code"
Airport_Name = "Airport Name"
Claim_Number = "Claim Number"
Claim_Site = "Claim Site"
Claim_Type = "Claim Type"
Close_Amount = "Close Amount"
Date_Issues = "Date Issues"
Date_Received = "Date Received"
Incident_Date = "Incident Date"
Item_Category = "Item Category";
/* Drop County and City Columns */
drop County City;
run;
/* ANALYZE */
/* How many date issues are in the overall data? 3938 */
title "Overall Date Issues in the Data";
proc freq data=tsa.Claims_Cleaned;
table Date_Issues / nocum nopercent;
run;
title;
/* How many claims per year of Incident_date are there in the overall data?
Values vary between years. There were 2122 in 2002 and 8340 in 2017 */
/* Include a plot */
ods graphics on;
title "Overall Claims by Year";
proc freq data=tsa.Claims_Cleaned;
table Incident_Date / nocum nopercent plots=freqplot;
format Incident_Date year4.;
where Date_Issues is missing; /* filter where rows DO NOT have any date issues */
run;
/* What are the frequency values for Claim_Type, Claim_Site, and Disposition for */
/* a selected state? NOTE: User should be able to dynamically input a specific state */
%let StateName=California;
proc freq data=tsa.Claims_Cleaned order=freq;
tables Claim_Type Claim_Site Disposition / nocum nopercent;
where StateName="&StateName" and Date_Issues is null; /* Filter for only rows where colunm has missing value */
run;
proc means data=tsa.Claims_Cleaned mean min max sum maxdec=0;
var Close_Amount;
where StateName="&StateName" and Date_Issues is null;
run;
... View more