BookmarkSubscribeRSS Feed
cranberry
Calcite | Level 5

Hi, 

 

I have been working on the Essentials 1 Case Study looking at TSA Claims data. I have reviewed all the documents and am still having troubles with my code. 

 

For the total number of date issues, I got 3938 but the document says it should be 4241.

 

For the part with the macro for StateName, every time I run it there is no results/output and the log says:

 
NOTE: No observations were selected from data set TSA.CLAIMS_CLEANED.
NOTE: There were 0 observations read from the data set TSA.CLAIMS_CLEANED.
WHERE 0 /* an obviously FALSE WHERE clause */ ;
 
I even copied and pasted all of the code from the document and still receive the same number (3938) and same log message from my original code. When the macro is for "C" then there are results that show up, but none for the example with "California" (as per the document code). I noticed in the proc contents that the length of State and StateName was only 1, perhaps that's the problem? I tried changing the length of those in the data steps, but that didn't change anything. 
 
Can someone please help me make this work? Below is my code after I set up my library:
options validvarname=v7;
proc import datafile="&path/TSAClaims2002_2017.csv"
	dbms=csv out=ClaimsImport replace;
	guessingrows=max;
run;

/* EXPLORE DATA */

/* Look at first 20 observations */
proc print data=tsa.claimsimport (obs=20);
run;
/* Look at the descriptor portion of the data, sorted bycolumn order. */
proc contents data=tsa.claimsimport varnum;
run;

proc freq data=tsa.claimsimport;
	tables Claim_Site Disposition Claim_Type / nocum nopercent;
	tables Incident_Date Date_Received / nocum nopercent;
	format Incident_Date Date_Received year4.;
run;

/* PREPARE DATA */ 

/* Remove entirely duplicate rows and create new table */
proc sort data=tsa.claimsimport 
	out=tsa.Claims_NoDups 
		nodupkey;
	by _all_;
run;

proc sort data=tsa.claims_nodups;
	by Incident_Date;
run;


data tsa.Claims_Cleaned;
	set tsa.claims_nodups;
/* Clean Claim_Site by replacing missing and - values with Unknown */
	if Claim_Site in ('-',"") then Claim_Site = "Unknown"; 
 /*Clean Disposition */
	if Disposition in ("-", "") then 
		Disposition = 'Unknown';
	else if Disposition = 'Closed: Canceled' then 
		Disposition = 'Closed:Canceled';
	else if Disposition = 'losed: Contractor Claim' then 
		Disposition ='Closed:Contractor Claim';
/*Clean Claim_Type*/
	if Claim_Type in ("-","") then Claim_Type = "Unknown"; 
	else if Claim_Type = 'Passenger Property Loss/Injury' then Claim_Type = 'Passenger Property Loss';
	else if Claim_Type = 'Passenger Property Loss/Injur' then Claim_Type = 'Passenger Property Loss';
	else if Claim_Type = 'Property Damage/Personal Injury' then Claim_Type = 'Property Damage';
/* Convert State values to uppercase and all StateName Columns to proper case */
	State=upcase(state);
	StateName=propcase(StateName);
/* Create columns that identifies rows with issues	 */
	if (Incident_Date > Date_Received or
		Incident_Date = . or 
		Date_Received = . or
		year(Incident_Date) < 2002 or
		year(Incident_Date) > 2017 or
		year(Date_Received) < 2002 or
		year(Date_Received) > 2017)
		then Date_Issues = "Needs Review";
/* Create permanent labels for columns	 */
format Incident_Date Date_Received date9. Close_Amount dollar20.2;
label Airport_Code = "Airport Code"
	Airport_Name = "Airport Name"
	Claim_Number = "Claim Number"
	Claim_Site = "Claim Site"
	Claim_Type = "Claim Type"
	Close_Amount = "Close Amount"
	Date_Issues = "Date Issues"
	Date_Received = "Date Received"
	Incident_Date = "Incident Date"
	Item_Category = "Item Category";
/* Drop County and City Columns */
drop County City;
run;


/* ANALYZE */ 


/* How many date issues are in the overall data? 3938  */
title "Overall Date Issues in the Data";
proc freq data=tsa.Claims_Cleaned;
	table Date_Issues / nocum nopercent;
run;
title; 

/* How many claims per year of Incident_date are there in the overall data? 
	Values vary between years. There were 2122 in 2002 and 8340 in 2017 */
/* Include a plot */
ods graphics on; 
title "Overall Claims by Year";
proc freq data=tsa.Claims_Cleaned;
	table Incident_Date / nocum nopercent plots=freqplot;
	format Incident_Date year4.;
 	where Date_Issues is missing; /* filter where rows DO NOT have any date issues */
run; 

/* What are the frequency values for Claim_Type, Claim_Site, and Disposition for  */
/* 	a selected state? NOTE: User should be able to dynamically input a specific state */

%let StateName=California;

proc freq data=tsa.Claims_Cleaned order=freq;
	tables Claim_Type Claim_Site Disposition / nocum nopercent;
 	where StateName="&StateName" and Date_Issues is null; /* Filter for only rows where colunm has missing value */
run; 

proc means data=tsa.Claims_Cleaned mean min max sum maxdec=0;
	var Close_Amount;
	where StateName="&StateName" and Date_Issues is null;
run;

5 REPLIES 5
coder1234
Obsidian | Level 7
Did you ensure the length of the states prior to proc content? If proc contents is reading off a sas table with only 1 character length, it will only produce a value of 1 character as output.
PaigeMiller
Diamond | Level 26
NOTE: No observations were selected from data set TSA.CLAIMS_CLEANED.
NOTE: There were 0 observations read from the data set TSA.CLAIMS_CLEANED.
WHERE 0 /* an obviously FALSE WHERE clause */ ;
 
Showing us the ERRORs or NOTEs without showing us the code, and without showing us a portion of the data, is never sufficient. There are a gazillion reasons why you might get these NOTEs in the log. So please show us the entire log for this PROC or for this DATA step. Please show us a portion of the relevant data.
--
Paige Miller
Tom
Super User Tom
Super User

If the lengths are 1 then you probably need to go back to the start where you make the datasets.

 

I notice in the code you posted it is using PROC IMPORT to convert a CSV file into a dataset.  That is not a reliable way to make CONSISTENT dataset structures.  Did the instructions actually tell you to do that?  What do you get when you run that step?  Do the instructions tell you how many observations and variables the PROC IMPORT step should GUESS to create from the CSV file?

 

Also the import step is making a dataset named WORK.CLAIMSIMPORT and then later steps are referencing a different dataset named TSA.CLAIMSIMPORT.  Are those two datasets supposed to be related in some way?

cranberry
Calcite | Level 5

I used PROC IMPORT because the lessons/courses teach you that is how to import the data...log reports everything is fine when imported and the output results show up as expected. Instructions say how many variables but not how many observations.

 

Attached is the entire case study, my code is like the code presented in the manual at the end. I literally copied and pasted their code (changing the file locations) and it still didn't work. 

 

tsa.claimsimport is referencing the tsa library that the dataset was supposed to be in (again, shows this in the manual). 

 

 

Tom
Super User Tom
Super User

If you forget to include the GUESSINGROWS=MAX option then PROC IMPORT will only check a few lines of the file before finalizing its guesses about how to define the variables.

When it sees an empty column it will define it as character variable with length of 1 (since that will use the least amount of storage).

 

The beginning of the CSV file I found on-line at this site

https://github.com/MaxineXiong/TSA_Claims_Data_Analysis/blob/master/TSAClaims2002_2017.csv

(which I assume is the same one) does not have any values for STATE until observation number 9,112.

 

That would explain why STATE was defined as only one character long.

 

Try starting over again and make sure to use GUESSINGROWS=MAX; statement in the PROC IMPORT step.

Or just use a simple data step to read the CSV file, such as the one that the %CSV2DS() macro would have generated:

data tsa;
  infile 'C:\Downloads\TSAClaims2002_2017.csv' dlm=','
    dsd truncover firstobs=2
  ;
  length Claim_Number $13 Date_Received 8 Incident_Date 8 Airport_Code $3
    Airport_Name $48 Claim_Type $39 Claim_Site $15 Item_Category $834
    Close_Amount 8 Disposition $23 StateName $17 State $2 County $20
    City $33
  ;
  input Claim_Number -- City ;
run;

sas-innovate-white.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.

 

Early bird rate extended! Save $200 when you sign up by March 31.

Register now!

LIBNAME 101

Follow along as SAS technical trainer Dominique Weatherspoon expertly answers all your questions about SAS Libraries.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 4622 views
  • 2 likes
  • 4 in conversation