Hi,
I have been working on the Essentials 1 Case Study looking at TSA Claims data. I have reviewed all the documents and am still having troubles with my code.
For the total number of date issues, I got 3938 but the document says it should be 4241.
For the part with the macro for StateName, every time I run it there is no results/output and the log says:
options validvarname=v7; proc import datafile="&path/TSAClaims2002_2017.csv" dbms=csv out=ClaimsImport replace; guessingrows=max; run; /* EXPLORE DATA */ /* Look at first 20 observations */ proc print data=tsa.claimsimport (obs=20); run; /* Look at the descriptor portion of the data, sorted bycolumn order. */ proc contents data=tsa.claimsimport varnum; run; proc freq data=tsa.claimsimport; tables Claim_Site Disposition Claim_Type / nocum nopercent; tables Incident_Date Date_Received / nocum nopercent; format Incident_Date Date_Received year4.; run; /* PREPARE DATA */ /* Remove entirely duplicate rows and create new table */ proc sort data=tsa.claimsimport out=tsa.Claims_NoDups nodupkey; by _all_; run; proc sort data=tsa.claims_nodups; by Incident_Date; run; data tsa.Claims_Cleaned; set tsa.claims_nodups; /* Clean Claim_Site by replacing missing and - values with Unknown */ if Claim_Site in ('-',"") then Claim_Site = "Unknown"; /*Clean Disposition */ if Disposition in ("-", "") then Disposition = 'Unknown'; else if Disposition = 'Closed: Canceled' then Disposition = 'Closed:Canceled'; else if Disposition = 'losed: Contractor Claim' then Disposition ='Closed:Contractor Claim'; /*Clean Claim_Type*/ if Claim_Type in ("-","") then Claim_Type = "Unknown"; else if Claim_Type = 'Passenger Property Loss/Injury' then Claim_Type = 'Passenger Property Loss'; else if Claim_Type = 'Passenger Property Loss/Injur' then Claim_Type = 'Passenger Property Loss'; else if Claim_Type = 'Property Damage/Personal Injury' then Claim_Type = 'Property Damage'; /* Convert State values to uppercase and all StateName Columns to proper case */ State=upcase(state); StateName=propcase(StateName); /* Create columns that identifies rows with issues */ if (Incident_Date > Date_Received or Incident_Date = . or Date_Received = . or year(Incident_Date) < 2002 or year(Incident_Date) > 2017 or year(Date_Received) < 2002 or year(Date_Received) > 2017) then Date_Issues = "Needs Review"; /* Create permanent labels for columns */ format Incident_Date Date_Received date9. Close_Amount dollar20.2; label Airport_Code = "Airport Code" Airport_Name = "Airport Name" Claim_Number = "Claim Number" Claim_Site = "Claim Site" Claim_Type = "Claim Type" Close_Amount = "Close Amount" Date_Issues = "Date Issues" Date_Received = "Date Received" Incident_Date = "Incident Date" Item_Category = "Item Category"; /* Drop County and City Columns */ drop County City; run; /* ANALYZE */ /* How many date issues are in the overall data? 3938 */ title "Overall Date Issues in the Data"; proc freq data=tsa.Claims_Cleaned; table Date_Issues / nocum nopercent; run; title; /* How many claims per year of Incident_date are there in the overall data? Values vary between years. There were 2122 in 2002 and 8340 in 2017 */ /* Include a plot */ ods graphics on; title "Overall Claims by Year"; proc freq data=tsa.Claims_Cleaned; table Incident_Date / nocum nopercent plots=freqplot; format Incident_Date year4.; where Date_Issues is missing; /* filter where rows DO NOT have any date issues */ run; /* What are the frequency values for Claim_Type, Claim_Site, and Disposition for */ /* a selected state? NOTE: User should be able to dynamically input a specific state */ %let StateName=California; proc freq data=tsa.Claims_Cleaned order=freq; tables Claim_Type Claim_Site Disposition / nocum nopercent; where StateName="&StateName" and Date_Issues is null; /* Filter for only rows where colunm has missing value */ run; proc means data=tsa.Claims_Cleaned mean min max sum maxdec=0; var Close_Amount; where StateName="&StateName" and Date_Issues is null; run;
NOTE: No observations were selected from data set TSA.CLAIMS_CLEANED.NOTE: There were 0 observations read from the data set TSA.CLAIMS_CLEANED.WHERE 0 /* an obviously FALSE WHERE clause */ ;
If the lengths are 1 then you probably need to go back to the start where you make the datasets.
I notice in the code you posted it is using PROC IMPORT to convert a CSV file into a dataset. That is not a reliable way to make CONSISTENT dataset structures. Did the instructions actually tell you to do that? What do you get when you run that step? Do the instructions tell you how many observations and variables the PROC IMPORT step should GUESS to create from the CSV file?
Also the import step is making a dataset named WORK.CLAIMSIMPORT and then later steps are referencing a different dataset named TSA.CLAIMSIMPORT. Are those two datasets supposed to be related in some way?
I used PROC IMPORT because the lessons/courses teach you that is how to import the data...log reports everything is fine when imported and the output results show up as expected. Instructions say how many variables but not how many observations.
Attached is the entire case study, my code is like the code presented in the manual at the end. I literally copied and pasted their code (changing the file locations) and it still didn't work.
tsa.claimsimport is referencing the tsa library that the dataset was supposed to be in (again, shows this in the manual).
If you forget to include the GUESSINGROWS=MAX option then PROC IMPORT will only check a few lines of the file before finalizing its guesses about how to define the variables.
When it sees an empty column it will define it as character variable with length of 1 (since that will use the least amount of storage).
The beginning of the CSV file I found on-line at this site
https://github.com/MaxineXiong/TSA_Claims_Data_Analysis/blob/master/TSAClaims2002_2017.csv
(which I assume is the same one) does not have any values for STATE until observation number 9,112.
That would explain why STATE was defined as only one character long.
Try starting over again and make sure to use GUESSINGROWS=MAX; statement in the PROC IMPORT step.
Or just use a simple data step to read the CSV file, such as the one that the %CSV2DS() macro would have generated:
data tsa;
infile 'C:\Downloads\TSAClaims2002_2017.csv' dlm=','
dsd truncover firstobs=2
;
length Claim_Number $13 Date_Received 8 Incident_Date 8 Airport_Code $3
Airport_Name $48 Claim_Type $39 Claim_Site $15 Item_Category $834
Close_Amount 8 Disposition $23 StateName $17 State $2 County $20
City $33
;
input Claim_Number -- City ;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Follow along as SAS technical trainer Dominique Weatherspoon expertly answers all your questions about SAS Libraries.
Find more tutorials on the SAS Users YouTube channel.