Hi, the case is that I have a data (one column csv file), but some values ending with a trailing space.
I use proc import to get it in to SAS.
Issue is that I need to use the strings with space, but it always gets imported with no space.
It guesses the max length properly, but when retrieving the data, it always trim the space.
Also, it's saved to the dataset without the trailing space.
anyone can help here?
Welcome to the SAS Community.
Use a data step to import CSV files instead. Do a Google search or search this community. Plenty of code examples out there 🙂
Thanks for welcoming me aboard:)
I tried dataset as well, and same issue, for two days i'm exploring the internet for a solution, so I decided to ask here..
below is my code,
sample data "ABDCEF" that got processed normally, but like "ABCDEF " (with trailing space) it got trimmed.
data IMPORT;
infile "/folders/myfolders/ecsql193/DM.csv"
delimiter = ','
missover
firstobs = 2
DSD
lrecl = 32767;
Length File_Name $12.;
Length Worksheet_Name $12.;
Length Start_Position $2.;
Length End_Position $4.;
Length Structure $8;
Length File_Process_Flag $8;
Length KPI_Process_Flag $31;
Length Fact_Name $103;
Length F_DIM_Category $35;
Length F_DIM_Product_Group $6;
Length F_DIM_Product $21;
Length F_DIM_Subject_Area $22;
Length DIM_Month $1;
Length DIM_RatePlan_Type $1;
Length DIM_Cust_Seg $1;
Length DIM_Traffic_Type $1;
Length DIM_BB_Spd $1;
Length DIM_BB_Technology $1;
Length DIM_TER_NW__FLAG_Y_N_ $1;
Length DIM_USG_TIME__Flag_Y_N_ $1;
Length DIM_BNDL_Flag__PAYG_Flag_Y_N_ $1;
Length DIM_REV_CAT $1;
Length DIM_ROAM_CNTY $1;
Length DIM_DATA_BUNDLE $1;
Length Measures $26;
Length Conditions $17;
input
File_Name
Worksheet_Name
Start_Position
End_Position
Structure
File_Process_Flag
KPI_Process_Flag
Fact_Name
F_DIM_Category
F_DIM_Product_Group
F_DIM_Product
F_DIM_Subject_Area
DIM_Month
DIM_RatePlan_Type
DIM_Cust_Seg
DIM_Traffic_Type
DIM_BB_Spd
DIM_BB_Technology
DIM_TER_NW__FLAG_Y_N_
DIM_USG_TIME__Flag_Y_N_
DIM_BNDL_Flag__PAYG_Flag_Y_N_
DIM_REV_CAT
DIM_ROAM_CNTY
DIM_DATA_BUNDLE
Measures
Conditions
;
Run;
Once a character variable is defined with a certain length, it will always be padded up to that length with blanks. A missing character variable contains only blanks.
So if your character column has the correct length, everything's OK.
What if data looks like:
ABC
DEF
XYZ (two trailing spaces)
How to handle this?
See this code as an example:
data test;
infile datalines dlm="," dsd;
input number string:$char12.;
test_of_string = put(string,$hex24.);
datalines4;
1,ABC
2, DEF
3,XYZ
;;;;
proc print data=test noobs;
run;
Result:
number string test_of_string 1 ABC 414243202020202020202020 2 DEF 202044454620202020202020 3 XYZ 58595A202020202020202020
As you can see, all values are padded with blanks to the defined length, and the use of the $CHARw. informat preserves leading blanks. With the standard $w. informat, leading blanks would be discarded.
Thanks man, but if you add extra space after row#3, you'll get my point exactly. spaces are trimmed, and I need to preserve them!
data test;
infile datalines dlm="," dsd;
input number string:$char12.;
test_of_string = put(string,$hex24.);
datalines4;
1,ABC
2, ABC
3,ABC
;;;;
proc print data=test noobs;
%PUT
Look at the test variable. Hex 20's ARE blanks.
Thanks buddy anyways, i got stuck and I think I should handle it out of SAS, on the data level.
@AhmedAccordBGro wrote:
Thanks man, but if you add extra space after row#3, you'll get my point exactly. spaces are trimmed, and I need to preserve them!
data test;
infile datalines dlm="," dsd;
input number string:$char12.;
test_of_string = put(string,$hex24.);
datalines4;
1,ABC
2, ABC
3,ABC
;;;;proc print data=test noobs;
%PUT
PS what is that %PUT doing in there?
yes I need to preserve the blank space,
I even executed below query, and found the space is trimmed!
Proc SQL;
Select Length(string) From test;
Quit;
Results in 3, 4, 3
Read the documentation of the length() function:
Returns the length of a non-blank character string, excluding trailing blanks, and returns 1 for a blank character string.
(emphasis by me)
Hi @AhmedAccordBGro,
I think I see your point: You need to preserve the blanks contained in your CSV file because they carry valuable information, e.g., 'XYZ', ' XYZ', 'XYZ ' and 'XYZ ' must later be recognized as different character values in the SAS dataset. Normally this would fail because of the padding (trailing) blanks which are contained in every character variable of sufficient length: You couldn't distinguish between these blanks and the original blanks from the CSV file as they would look exactly the same (characters with ASCII code [decimal] 32, i.e. hex 20).
For a possible solution you could replace the original blanks by a suitable replacement character which does not occur in the CSV file, perhaps the "non-breakable space" with ASCII code 160 (hex A0), while reading the CSV file.
Here's an example:
/* Create a text file for demonstration */
filename test temp;
data _null_;
file test;
put 'ABC' / ' DEF' / 'XYZ ' / '2 words';
run;
/* Read the text file preserving the blanks contained
(as protected blanks, hex A0) */
data want;
infile test;
input @;
_infile_=translate(_infile_,'A0'x,' ');
input c :$6.;
run;
/* Show the result */
proc print data=want;
format c $hex12.;
run;
Result:
Obs c 1 414243202020 2 A04445462020 3 58595AA0A020 4 32A0776F7264
As you can see, leading blanks (obs. 2), trailing blanks (obs. 3) and embedded blanks (obs. 4) can still be recognized in character variable C (by their hex code A0) and distinguished from those trailing blanks (hex 20) which SAS introduced as usual (such as the last blank in obs. 3 and the trailing blanks in obs. 1 and 2).
This approach should work for a comma-delimited text file as well.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.