I have a complex database that I am building for work. It requires me to join several data sources that come in either .csv or .txt files. The .txt files are nicely formatted with .sas files that help you pull them into tables. The .csv files are poorly formatted and require some pre-formatting with excel to remove extraneous lines.
There is a common unique variable "a combination of text, numbers, and symbols" that each data set shares. It would look something like 'B3245_010_2' without the quotes. I am using this to perform all of the joins for my main table.
Every file that comes from a .txt file joins appropriately. The data that comes from the .csv file acts as if there are no matching variables when I attempt to left join in SAS EG, and I get null results for each of those values.
I have confirmed that the variables are of the same data type and the same length. Furthermore, when I run the exact same code using PC SAS, the correct values show up in my results.
I've been tearing my hair out over this, and I am curious if this is a known issue. I am working on a Windows 10 computer. My SAS EG version is 7.15 (64-bit), and my PC SAS version is 9.4 (64-bit). I do PC File Server installed on my computer, and it does not run on the server to my knowledge. This is the only thing I can think of, but I wouldn't expect this to affect csv files. Also, the data looks exactly the same.
I attempted to run the datastep macro to get real data, but the connection was refused. This is probably due to IT restrictions. An example of a simple join is shown below. In my example, Total_Cost would show up as all blank despite the fact that Total_Cost and Max_Payment each have IDs that match.
If I were to run this as a Right join keeping everything else the same, Total_Cost would shop up, but the other columns would be blank.
proc sql;
Create Table Test as
Select Distinct main.ID, LS.Total_Cost, main.Max_Payment
From Main
left join LS
on Main.ID=LS.ID
Order by ID
;
quit;
I finally solved the root cause for this issue when it manifested again.
It turns out that when importing files into SAS EG, I needed to check "Fix line-ending characters for text files (between Windows and UNIX)". This is the very last check box in the Copy Files task.
Hopefully, this helps someone in the future.
SAS EG is just the client. The SAS code gets executed on the server and it's there where something must be different.
The code you've posted is so simple that it should return the same result for "any" SAS version - so something must be different in your data.
If using PC SAS: Are you executing on the same server via rsubmit or do you have a local SAS installed. If it's a local install: Do you see server Local if using EG and have you tried to execute the code there (=using the same compute server for both PC SAS and SAS EG).
"I do PC File Server installed on my computer, and it does not run on the server to my knowledge."
So what do you read into SAS? .csv or .xlsx? And how do you read these files? Are you using Proc Import? If so: Is guessingrows the same on your local PC and on the server?
Is your SAS server (the one you access with EG) also Windows or is it Unix/Linux? And if so: How do you transfer your source files to the server?
I use proc import with a data step. I have pasted an example below.
SAS EG
For Base SAS Software ...
Custom version information: 9.3_M2
PC SAS
For Base SAS Software ...
Custom version information: 9.4_M5
proc import
/*Update*/ datafile = "C:\filename.csv"
out = work.LS(rename=(
var1=ST
var2=Cnty
Var3=Org
Var4=Pln
Var5=Pln_Typ
Var6=Prem
Var7=Yrly
Var8=Drug
Var9=Add
Var10=Ben
Var11=Ctr
Var12=Pln_ID
Var13=Seg_ID
Var14=MP
Var15=Rtg
Var16=ID))
dbms = csv replace;
getnames=no;
datarow=2;
run;
data WORK.LS
(rename=(
var1=ST
var2=Cnty
Var3=Org
Var4=Pln
Var5=Pln_Typ
Var6=Prem
Var7=Yrly
Var8=Drug
Var9=Add
Var10=Ben
Var11=Ctr
Var12=Pln_ID
Var13=Seg_ID
Var14=MP
Var15=Rtg
Var16=ID));
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
/*Update*/ infile "C:\filename.csv"
delimiter = ',' MISSOVER DSD lrecl=13106 firstobs=2;
informat VAR1 $30. ;
informat VAR2 $30. ;
informat VAR3 $60. ;
informat VAR4 $80. ;
informat VAR5 $20. ;
informat VAR6 dollar10.2 ;
informat VAR7 dollar10.2 ;
informat VAR8 $15. ;
informat VAR9 $5. ;
informat VAR10 $10. ;
informat VAR11 $5. ;
informat VAR12 $3. ;
informat VAR13 $3. ;
informat VAR14 dollar10. ;
informat VAR15 $40. ;
informat VAR16 $13. ;
format VAR1 $30. ;
format VAR2 $30. ;
format VAR3 $60. ;
format VAR4 $80. ;
format VAR5 $20. ;
format VAR6 dollar10.2 ;
format VAR7 dollar10.2 ;
format VAR8 $15. ;
format VAR9 $5. ;
format VAR10 $10. ;
format VAR11 $5. ;
format VAR12 $3. ;
format VAR13 $3. ;
format VAR14 dollar10. ;
format VAR15 $40. ;
format VAR16 $13. ;
input
VAR1 $
VAR2 $
VAR3 $
VAR4 $
VAR5 $
VAR6 $
VAR7 $
VAR8 $
VAR9 $
VAR10 $
VAR11 $
VAR12 $
VAR13 $
VAR14 $
VAR15 $
VAR16 $
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
Run this data step:
data WORK.LS;
infile "C:\filename.csv"
delimiter = ','
MISSOVER
DSD
lrecl=13106
firstobs=2
;
informat
ST $30.
Cnty $30.
Org $60.
Pln $80.
Pln_Typ $20.
Prem dollar10.2
Yrly dollar10.2
Drug $15.
Add $5.
Ben $10.
Ctr $5.
Pln_ID $3.
Seg_ID $3.
MP dollar10.
Rtg $40.
ID $13.
;
format
Prem dollar10.2
Yrly dollar10.2 ;
MP dollar10.
;
input
ST
Cnty
Org
Pln
Pln_Typ
Prem
Yrly
Drug
Add
Ben
Ctr
Pln_ID
Seg_ID
MP
Rtg
ID
;
run;
in both environments (replace the infile name accordingly), and compare the logs, as those should be identical.
By comparing what you posted with what I created from it, you get how you need to adapt other import steps you may be running. Once you have made sure that the imports are consistent, the join should also give you an identical result.
"require some pre-formatting with excel to remove extraneous lines"
That's most probably where your problems start. csv files are text files, and to properly manipulate them, use a good text editor like notepad++. NEVER use Excel for that.
You also might introduce changes in the data by the way EG uploads the files to SAS, or you uploaded the files to the SAS server.
If you only need to remove certain lines, this can be done in the data step.
And do not use proc import if you expect consistent results.
I will try making my edits with Notepad++ or a data step to see if I can get them to import correctly. Manipulating it with excel seemed like a very simple solution to the problem, but I can see that it might not have been the most appropriate.
Thank you for the suggestion.
I tried running an import without making any changes to the csv file. I simply opened the file with Notepad++ to make sure I had the correct starting line. In this case it was line 12.
I then updated my data step to start reading on line 12 and create the ID column. I originally created this column by concatenating 3 other columns in the file and then value pasting them into the ID column.
As a result of these changes, my data now joins properly. I am still not sure why the data can look exactly the same in the two files without being the same, but it all works now.
In case anyone is interested, my new import looks like below.
proc import
/*Update*/ datafile = "C:\filename.csv"
out = work.LS(rename=(
var1=ST
var2=Cnty
Var3=Org
Var4=Pln
Var5=Pln_Typ
Var6=Prem
Var7=Yrly
Var8=Drug
Var9=Add
Var10=Ben
Var11=Ctr
Var12=Pln_ID
Var13=Seg_ID
Var14=MP
Var15=Rtg
/*Var16=ID*/))
dbms = csv replace;
getnames=no;
datarow=12;
run;
data WORK.LS
(rename=( var1=ST var2=Cnty Var3=Org Var4=Pln Var5=Pln_Typ Var6=Prem Var7=Yrly
Var8=Drug Var9=Add Var10=Ben Var11=Ctr Var12=Pln_ID Var13=Seg_ID Var14=MP Var15=Rtg /*Var16=ID*/));
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
/*Update*/ infile "C:\filename.csv"
delimiter = ',' MISSOVER DSD lrecl=13106 firstobs=12;
informat VAR1 $30. ;
informat VAR2 $30. ;
informat VAR3 $60. ;
informat VAR4 $80. ;
informat VAR5 $20. ;
informat VAR6 dollar10.2 ;
informat VAR7 dollar10.2 ;
informat VAR8 $15. ;
informat VAR9 $5. ;
informat VAR10 $10. ;
informat VAR11 $5. ;
informat VAR12 $3. ;
informat VAR13 $3. ;
informat VAR14 dollar10. ;
informat VAR15 $40. ;
/*informat VAR16 $13. ;*/
format VAR1 $30. ;
format VAR2 $30. ;
format VAR3 $60. ;
format VAR4 $80. ;
format VAR5 $20. ;
format VAR6 dollar10.2 ;
format VAR7 dollar10.2 ;
format VAR8 $15. ;
format VAR9 $5. ;
format VAR10 $10. ;
format VAR11 $5. ;
format VAR12 $3. ;
format VAR13 $3. ;
format VAR14 dollar10. ;
format VAR15 $40. ;
/*format VAR16 $3. ;*/
input
VAR1 $
VAR2 $
VAR3 $
VAR4 $
VAR5 $
VAR6 $
VAR7 $
VAR8 $
VAR9 $
VAR10 $
VAR11 $
VAR12 $
VAR13 $
VAR14 $
VAR15 $
/*VAR16 $*/
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
ID = catx('_',Var11,Var12,Var13);
run;
If you're dealing with "gnarly" text files, a really good SAS facility to know about is the _infile_ automatic variable. When you input data, it gets set to the contents of a line. If you want to see what your input data looks like to SAS, run the following:
Tom
data RawText;
length InputRecord $32767;
infile "C:\data.txt" lrecl=32767;
input;
InputRecord = _infile_;
run;
The difference could be whitespace characters. For example, tabs in the DATALINES statement of a DATA step can cause problems (from doc, "Note: If you insert tabs while entering data in the DATALINES statement, you might get unexpected results when using columnar input."). When I suspect that, I usually turn on "Show All Characters" (toggle button in main toolbar) in Notepad++, to see exact characters used for whitespace (tabs, spaces, line endings).
Casey
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
I finally solved the root cause for this issue when it manifested again.
It turns out that when importing files into SAS EG, I needed to check "Fix line-ending characters for text files (between Windows and UNIX)". This is the very last check box in the Copy Files task.
Hopefully, this helps someone in the future.
@DGBK wrote:
I finally solved the route cause for this issue when it manifested again.
It turns out that when importing files into SAS EG, I needed to check "Fix line-ending characters for text files (between Windows and UNIX)". This is the very last check box in the Copy Files task.
Hopefully, this helps someone in the future.
This can also be dealt with in the data step by using the termstr= option in the infile statement.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.