BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ChadGotU
Calcite | Level 5

I've been working on the TSA Case Study, and when I imported the file, columns such as State, County, and City only have one character. So I thought the next step to fix this, would be to:

 

format State $2.;

format County $20.;

format City $20.;

 

The out put of this code did not produce an error, and the output table remained the same with each column on displaying one character. I also tried the length syntax and that also did not work. 

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

SAS will define a variable (set the type and the storage length) based on when it in the code the variable first appears (with some exceptions, like RETAIN statement).

 

So in your code if those variables are in the input dataset their type and length is defined when the SET statement is seen.  Otherwise the length is set by the comparison in the IF statement.

 

If you want to modify the length of a character variable being read from an existing dataset then set the length before the SET statement.

 

So perhaps something like:

data tsa_claims_cleaned_2;
  set tsa_claims_cleaned;
  length Claim_Type Claim_Site Disposition $6  Date_Issues $12 ;

  if Claim_Type=" " then Claim_Type="Unknown";
  if Claim_Site=" " then Claim_Type="Unknown";
  if Disposition=" " then Disposition="Unknown";
  StateName=propcase(StateName);
  State=upcase(State);


  if Incident_Date=. then Date_Issues="Needs Review";
  else if Date_received=. then Date_Issues="Needs Review";
  else Date_Issues="";

  format Date_Received Incident_Date mmddyy10.;
  format _character_ ;

run;

But if Claim_Type, Claim_Site and Disposition already exist in tsa_claims_cleaned and they are shorter than 6 bytes then you need to move the LENGTH statement for them to before the SET statement.

 

Also:

1)  Try not to overwrite your source dataset by using the same dataset name in the DATA and SET statement. 

2) If you change the length of character variable be careful that it does not already have a format attached to it that will only print some of the characters stored in the variable. You can use a FORMAT statement that just lists variables without any format specification after them to clear the formats attached.

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

Show us the code you are using.

 

Show us a portion of your input data, as SAS data step code (instructions).

--
Paige Miller
ChadGotU
Calcite | Level 5
data work.tsa_claims_cleaned;
set tsa_claims_cleaned;


if Claim_Type= "" then
Claim_Type="Unknown";

if Claim_Site= "" then
Claim_Type="Unknown";

if Disposition="" then
Disposition="Unknown";

StateName=propcase(StateName);
State=upcase(State);

format Date_Received mmddyy.;
format Incident_Date mmddyy.;


format Date_Issues $12.;
Date_Issues="";
if Incident_Date= "." then
Date_Issues="Needs Review";
if Date_received= "." then
Date_Issues="Needs Review";

run;


proc print data=work.tsa_claims_cleaned (obs=100);
run;
Tom
Super User Tom
Super User

SAS will define a variable (set the type and the storage length) based on when it in the code the variable first appears (with some exceptions, like RETAIN statement).

 

So in your code if those variables are in the input dataset their type and length is defined when the SET statement is seen.  Otherwise the length is set by the comparison in the IF statement.

 

If you want to modify the length of a character variable being read from an existing dataset then set the length before the SET statement.

 

So perhaps something like:

data tsa_claims_cleaned_2;
  set tsa_claims_cleaned;
  length Claim_Type Claim_Site Disposition $6  Date_Issues $12 ;

  if Claim_Type=" " then Claim_Type="Unknown";
  if Claim_Site=" " then Claim_Type="Unknown";
  if Disposition=" " then Disposition="Unknown";
  StateName=propcase(StateName);
  State=upcase(State);


  if Incident_Date=. then Date_Issues="Needs Review";
  else if Date_received=. then Date_Issues="Needs Review";
  else Date_Issues="";

  format Date_Received Incident_Date mmddyy10.;
  format _character_ ;

run;

But if Claim_Type, Claim_Site and Disposition already exist in tsa_claims_cleaned and they are shorter than 6 bytes then you need to move the LENGTH statement for them to before the SET statement.

 

Also:

1)  Try not to overwrite your source dataset by using the same dataset name in the DATA and SET statement. 

2) If you change the length of character variable be careful that it does not already have a format attached to it that will only print some of the characters stored in the variable. You can use a FORMAT statement that just lists variables without any format specification after them to clear the formats attached.

Tom
Super User Tom
Super User

@ChadGotU wrote:

I've been working on the TSA Case Study, and when I imported the file, columns such as State, County, and City only have one character. So I thought the next step to fix this, would be to:

 

format State $2.;

format County $20.;

format City $20.;

 

The out put of this code did not produce an error, and the output table remained the same with each column on displaying one character. I also tried the length syntax and that also did not work. 

 


The FORMAT attached to a variable just impacts how the value is displayed when printed as TEXT, like in a report.  It does not change what is stored in any way.  So attaching the $20. to COUNTY will not change whether COUNTY contains 1 byte or 100 bytes, it just means that you only want it to print the first 20 bytes of the value.

 

How did you "import" the data?  What type of file was it in before?  Or did you download it from some remote database system?

 

Note that if the source is a text file, like a CSV file, then don't bother to "import" it. Just write a data step to READ it and you will have complete control over how the variables are defined, including whether or not any display formats are permanently attached to the variables.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1237 views
  • 0 likes
  • 3 in conversation