SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lbarwick
Quartz | Level 8

Routinely I export data from my company's Jira service desk. I have to export it as a .csv then save it as a .xlsx, otherwise SAS seems to truncate the number of variables that get imported. Anyway, one of the columns contains free text data like this:

 

Ticket # FTE allocation
001 100
002 100%
003 1
004 75-100
005 50%
006 50 to 100%

 

This is how the data appear in Jira and in .csv and the .xlsx prior to import. I'm using a basic import procedure:

 

proc import out=prr 
		datafile = "C:\Users\lbarwick\Desktop\Jira_Data.xlsx"
		dbms=xlsx replace;
	getnames=Yes; 
run;

What gets processed/imported is this:

 

Ticket # FTE allocation
001 100
002 1
003 1
004 75-100
005 0.5
006 50 to 100%

 

You can see that whenever the value contains %, the value is changed to a fraction of a whole number, otherwise SAS seemingly is retaining the native value as text. I know mixed option is implicit, and I've tried sorting the .xlsx differently prior to import so that explicit character data are in the first few rows to see if SAS accurately interprets all rows as character data. Nothing seems to be working! Any thoughts? I've asked my IT dept about changing registry data to update typeguessrows to 0 but it can't be done.

 

1 ACCEPTED SOLUTION

Accepted Solutions
lbarwick
Quartz | Level 8

Here is the solution I ended up going with:

 

*read-in raw Jira csv data WITHOUT names;
%let dl=C:\Users\lbarwick\Desktop;

filename csv "&dl/Jira_Data.csv" lrecl=2000000;
proc import datafile=csv out=jiranames replace dbms=csv;
 getnames=no;
 datarow=2;
run;

*transpose generic VAR names from raw Jira import from above;
proc transpose data=jiranames(obs=0) out=names ;
 var _all_;
run;


*now that we have variable names set, read-in raw Jira data once again and this time correct issues with carriage returns and line feeds so we get
	all records/values lined up with header.;
%let dl=C:\Users\lbarwick\Desktop;
filename fixed temp;
data _null_;
  infile "&dl/Jira_Data.csv" end=eof;
  file "&dl/Jira_Data_cleaned.csv" ; *output file with carriage return/line feed issues fixed;
  input;
  nq+countc(_infile_,'"');
  nq=mod(nq,2);
  put _infile_ @;
  if eof or not nq then put; 
  else put '|' @;
run;

*import the cleaned Jira data but again, we're not going to import variable names;
filename csv 'C:\Users\lbarwick\Desktop\Jira_Data_cleaned.csv';
proc import out=prr /*(where=(substr(issue_key,1,3)='PRR'))*/
		datafile = "C:\Users\lbarwick\Desktop\Jira_Data_cleaned.csv"
		dbms=csv replace;
		delimiter=',';
		guessingrows=1000;
	getnames=no; 
	datarow=2;
run;

*create temp file which will create code to rename generic variables to the cleaned up variables;
filename code temp;
data _null_;
  file code;
  set names end=eof;
  if not eof2 then set new_names1 end=eof2;
  if _n_=1 then put 'rename';
  put _name_ '=' final_new  ;
  if eof then put ';';
run;

*replace generic variable names with cleaned up variable names;
proc datasets nolist lib=work;
  modify prr;
  %include code;
run;
quit;

View solution in original post

24 REPLIES 24
Reeza
Super User
Read it in from the CSV not XLSX and fix it there. You have no control over types/formats with Excel but you do with CSV. For truncation, you likely need to specify LRECL to specify the record length if you have a really wide data set.
lbarwick
Quartz | Level 8
I've tried reading it in as .csv and SAS does not like this. The file that I get out of Jira is very messy when opened as a .csv in text editor, but for some reason is structured just fine when I save the .csv as a .xlsx.
Reeza
Super User
What does "SAS does not like this" mean?
Did it the error message say "SAS doesn't like your data"?

I suspect it means your data didn't import correctly and there's various reasons this can happen but most are easily fixed. Converting your file to XLSX will cause your more issues in the long run - like you're experiencing. You can choose which problem to fix, but in my opinion it makes more sense to fix your data import from CSV than to try and decode anything after the fact.

lbarwick
Quartz | Level 8

The import happens just fine with .csv - the issue is with how the data are structured. Example below - .csv from Jira contains many hard returns:

 

lbarwick_0-1623347966575.png

 

The result from importing into SAS is something like this where you can see values shifted and in the wrong columns (see second and last rows - that value should not be in that column):

 

lbarwick_1-1623348178790.png

 

I'm trying to avoid as much as possible having to manually manipulate files before importing into SAS.

jimbarbour
Meteorite | Level 14

Well, pick your "poison."  You can

  1. Save the .csv as an Excel file, but Excel is going to automatically butcher things like numbers with a percent sign.  You may be able to do something with Excel to handle the columns as text, but you'll have to play with it.
    jimbarbour_0-1623349169557.png
  2. Use the .csv file directly, but you'll have to do something like using the TRANSLATE function to clean out special characters.  The good news here is that you can use SAS code to clean the data; you don't have to do this manually.

So, pick your "poison."  I don't think there's a way you can do it that doesn't involve some "fiddling" in conjunction with importing the data into SAS.

 

Jim

lbarwick
Quartz | Level 8
I'm sort of coming to the same conclusion, my issue is I would need to manually manipulate my csv before importing it into SAS. In the excel, changing that column to text has the same exact effect I'm experiencing when importing the excel into SAS.
Tom
Super User Tom
Super User

@lbarwick wrote:
I'm sort of coming to the same conclusion, my issue is I would need to manually manipulate my csv before importing it into SAS. In the excel, changing that column to text has the same exact effect I'm experiencing when importing the excel into SAS.

Don't try to fix it after EXCEL has tried to convert the text that is in the file into numbers/dates/percents.  Just make sure you tell EXCEL to read everything as TEXT to begin with.

lbarwick
Quartz | Level 8
The .csv formatting issue is due to records exported from Jira that contain hard returns (e.g. a freetext field where user added a hard return) that essentially corrupts the exported file.
jimbarbour
Meteorite | Level 14

Do you know the specific Hexidecimal value?  If so, you can use a Data step to cleanse these.  Presumably, a "hard return" is '0A'x or '0D'x (probably '0A'x).  You should be able to use TRANSLATE or COMPRESS to clean these out.  You'd probably have to read the data as a Byte stream and apply COMPRESS(VARNAME, , 'kw')' which should delete any non display characters.

 

Jim

Tom
Super User Tom
Super User

It is  generally not hard to fix a delimited text file that has embedded end of line characters so that SAS can read it.

 

You might not even need to change the text file at all.  If JIRA has generated the lines with CR+LF at the end and used only LF or only CR when inserting line breaks into the middle of the values of some of the fields.  Or if the lines always end with LF and the embedded line break character is always a CR (or the reverse).  Just use the TERMSTR= option on the INFILE or FILENAME statement.

 

Otherwise if the text file is using proper quoting (or even proper escaping) of the values with embedded line breaks then it is not hard to convert the file into one that SAS can read. For example by replacing all of the embedded CR and/or LF characters with some other character.  If Excel can read the files then you should be able to write a SAS program to read it and fix it.

 

If you do want to read the file into EXCEL and then save it back out just force EXCEL to read each field as CHARACTER instead of letting it guess which cells should be considered numbers.  So instead of double-clicking the CSV file to open it in Excel instead you can first start Excel and then using the data import (or where ever they have moved that option in the current release).  That way you can tell it to treat every column as TEXT.

jimbarbour
Meteorite | Level 14

Check your SAS log.  Proc import should be generating SAS code for a DATA step behind the scenes.  What you might do is to copy that DATA step code and modify it so that the importation goes according to your rules rather than the default.  However, 100% is probably represented in Excel as a number with a format, that number being 1 in the case of 100%.  You may have to play with it.  Importing it from .csv is probably going to give you a lot more control than importing it from Excel. Your INFILE statement should probably include DSD DLM=',' for csv.

 

Jim

Tom
Super User Tom
Super User

When you have mixed cells in a single column the variable is created as CHAR.  For any cell that has a numeric value instead of a string the RAW value is translated to a text string.  So any display format the cell might have attached to in the spreadsheet is lost.  SAS does the same thing with date values.

 

Have the creator of the spreadsheet make sure to define all of the cells in the column as strings instead of numbers and you will get the percent signs.

 

lbarwick
Quartz | Level 8
Understood but the 'creator' of the spreadsheet in this case is Jira, I get the .csv as an export and then save it as .xlsx. I tried converting all of the values in this column to text and Excel will turn 100% into 1 and 50% into 0.5. The problem with this is that there could be other rows with a legitimate value of 1 (1%) so in this conversion I cannot distinguish between what was originally 100% and what was originally 1%.
ballardw
Super User

@lbarwick wrote:
Understood but the 'creator' of the spreadsheet in this case is Jira, I get the .csv as an export and then save it as .xlsx. I tried converting all of the values in this column to text and Excel will turn 100% into 1 and 50% into 0.5. The problem with this is that there could be other rows with a legitimate value of 1 (1%) so in this conversion I cannot distinguish between what was originally 100% and what was originally 1%.

So EXCEL is the culprit. This is a well know issue. Excel makes a lot of assumptions and converts stuff without giving you much option without going through a lot of work. Just yesterday I had an age range exported from the US census website that should be 1-21 that opening the exported file became a DATE of Jan 2021 because of EXCEL.

 

Read the CSV into SAS instead of that conversion step.

If the file has the same structure then you can set all the variable properties as you want and only change in the input file name and output data set for each new file.

Plus code to read CSV can be modified to read a bunch of files at one time.

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 24 replies
  • 3481 views
  • 9 likes
  • 8 in conversation