BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DGBK
Obsidian | Level 7

I have a complex database that I am building for work. It requires me to join several data sources that come in either .csv or .txt files. The .txt files are nicely formatted with .sas files that help you pull them into tables. The .csv files are poorly formatted and require some pre-formatting with excel to remove extraneous lines.

 

There is a common unique variable "a combination of text, numbers, and symbols" that each data set shares. It would look something like 'B3245_010_2' without the quotes. I am using this to perform all of the joins for my main table.

 

Every file that comes from a .txt file joins appropriately. The data that comes from the .csv file acts as if there are no matching variables when I attempt to left join in SAS EG, and I get null results for each of those values.

 

I have confirmed that the variables are of the same data type and the same length. Furthermore, when I run the exact same code using PC SAS, the correct values show up in my results.

 

I've been tearing my hair out over this, and I am curious if this is a known issue. I am working on a Windows 10 computer. My SAS EG version is 7.15 (64-bit), and my PC SAS version is 9.4 (64-bit). I do PC File Server installed on my computer, and it does not run on the server to my knowledge. This is the only thing I can think of, but I wouldn't expect this to affect csv files. Also, the data looks exactly the same. 

 

I attempted to run the datastep macro to get real data, but the connection was refused. This is probably due to IT restrictions. An example of a simple join is shown below. In my example, Total_Cost would show up as all blank despite the fact that Total_Cost and Max_Payment each have IDs that match.

 

If I were to run this as a Right join keeping everything else the same, Total_Cost would shop up, but the other columns would be blank.

 

proc sql;
Create Table Test as
Select Distinct main.ID, LS.Total_Cost, main.Max_Payment
	

From Main

left join LS
	on Main.ID=LS.ID

	Order by ID
	;
	quit;

 

1 ACCEPTED SOLUTION

Accepted Solutions
DGBK
Obsidian | Level 7

I finally solved the root cause for this issue when it manifested again.

 

It turns out that when importing files into SAS EG, I needed to check "Fix line-ending characters for text files (between Windows and UNIX)". This is the very last check box in the Copy Files task. 

 

Hopefully, this helps someone in the future.

View solution in original post

11 REPLIES 11
Patrick
Opal | Level 21

SAS EG is just the client. The SAS code gets executed on the server and it's there where something must be different.

The code you've posted is so simple that it should return the same result for "any" SAS version - so something must be different in your data.

 

If using PC SAS: Are you executing on the same server via rsubmit or do you have a local SAS installed. If it's a local install: Do you see server Local if using EG and have you tried to execute the code there (=using the same compute server for both PC SAS and SAS EG).

 

"I do PC File Server installed on my computer, and it does not run on the server to my knowledge."

So what do you read into SAS? .csv or .xlsx? And how do you read these files? Are you using Proc Import? If so: Is guessingrows the same on your local PC and on the server? 

 

Is your SAS server (the one you access with EG) also Windows or is it Unix/Linux? And if so: How do you transfer your source files to the server?

 

 

Reeza
Super User
How did you import the CSV files? Did you use PROC IMPORT or write the data step? What are the exact versions of SAS on each computer?

Check the versions using PROC PRODUCT_STATUS and check in the log.
DGBK
Obsidian | Level 7

I use proc import with a data step. I have pasted an example below.

 

SAS EG

For Base SAS Software ...
Custom version information: 9.3_M2

 

PC SAS

For Base SAS Software ...
Custom version information: 9.4_M5

 

proc import 
	
/*Update*/	datafile = "C:\filename.csv"
	out = work.LS(rename=(
		var1=ST
		var2=Cnty
		Var3=Org
		Var4=Pln
		Var5=Pln_Typ
		Var6=Prem
		Var7=Yrly
		Var8=Drug
		Var9=Add
		Var10=Ben
		Var11=Ctr
		Var12=Pln_ID
		Var13=Seg_ID
		Var14=MP
		Var15=Rtg
		Var16=ID))
	dbms = csv replace;
	getnames=no;
	datarow=2;
run;
 data WORK.LS 
		(rename=(
var1=ST
var2=Cnty
Var3=Org
Var4=Pln
Var5=Pln_Typ
Var6=Prem
Var7=Yrly Var8=Drug
Var9=Add
Var10=Ben
Var11=Ctr
Var12=Pln_ID
Var13=Seg_ID
Var14=MP
Var15=Rtg
Var16=ID)); %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ /*Update*/ infile "C:\filename.csv" delimiter = ',' MISSOVER DSD lrecl=13106 firstobs=2; informat VAR1 $30. ; informat VAR2 $30. ; informat VAR3 $60. ; informat VAR4 $80. ; informat VAR5 $20. ; informat VAR6 dollar10.2 ; informat VAR7 dollar10.2 ; informat VAR8 $15. ; informat VAR9 $5. ; informat VAR10 $10. ; informat VAR11 $5. ; informat VAR12 $3. ; informat VAR13 $3. ; informat VAR14 dollar10. ; informat VAR15 $40. ; informat VAR16 $13. ; format VAR1 $30. ; format VAR2 $30. ; format VAR3 $60. ; format VAR4 $80. ; format VAR5 $20. ; format VAR6 dollar10.2 ; format VAR7 dollar10.2 ; format VAR8 $15. ; format VAR9 $5. ; format VAR10 $10. ; format VAR11 $5. ; format VAR12 $3. ; format VAR13 $3. ; format VAR14 dollar10. ; format VAR15 $40. ; format VAR16 $13. ; input VAR1 $ VAR2 $ VAR3 $ VAR4 $ VAR5 $ VAR6 $ VAR7 $ VAR8 $ VAR9 $ VAR10 $ VAR11 $ VAR12 $ VAR13 $ VAR14 $ VAR15 $ VAR16 $ ; if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */ run;
Kurt_Bremser
Super User

Run this data step:

data WORK.LS;
infile "C:\filename.csv"
  delimiter = ','
  MISSOVER
  DSD
  lrecl=13106
  firstobs=2
;
informat
  ST $30.
  Cnty $30.
  Org $60.
  Pln $80.
  Pln_Typ $20.
  Prem dollar10.2
  Yrly dollar10.2
  Drug $15.
  Add $5.
  Ben $10.
  Ctr $5.
  Pln_ID $3.
  Seg_ID $3.
  MP dollar10.
  Rtg $40.
  ID $13.
;
format
  Prem dollar10.2
  Yrly dollar10.2 ;
  MP dollar10.
;
input
  ST
  Cnty
  Org
  Pln
  Pln_Typ
  Prem
  Yrly
  Drug
  Add
  Ben
  Ctr
  Pln_ID
  Seg_ID
  MP
  Rtg
  ID
;
run;

in both environments (replace the infile name accordingly), and compare the logs, as those should be identical.

By comparing what you posted with what I created from it, you get how you need to adapt other import steps you may be running. Once you have made sure that the imports are consistent, the join should also give you an identical result.

Kurt_Bremser
Super User

"require some pre-formatting with excel to remove extraneous lines"

That's most probably where your problems start. csv files are text files, and to properly manipulate them, use a good text editor like notepad++. NEVER use Excel for that.

You also might introduce changes in the data by the way EG uploads the files to SAS, or you uploaded the files to the SAS server.

If you only need to remove certain lines, this can be done in the data step.

And do not use proc import if you expect consistent results.

DGBK
Obsidian | Level 7

I will try making my edits with Notepad++ or a data step to see if I can get them to import correctly. Manipulating it with excel seemed like a very simple solution to the problem, but I can see that it might not have been the most appropriate.

DGBK
Obsidian | Level 7

Thank you for the suggestion.

 

I tried running an import without making any changes to the csv file. I simply opened the file with Notepad++ to make sure I had the correct starting line. In this case it was line 12.

 

I then updated my data step to start reading on line 12 and create the ID column. I originally created this column by concatenating 3 other columns in the file and then value pasting them into the ID column. 

 

As a result of these changes, my data now joins properly. I am still not sure why the data can look exactly the same in the two files without being the same, but it all works now. 

 

In case anyone is interested, my new import looks like below.

proc import 
	
/*Update*/	datafile = "C:\filename.csv"
	out = work.LS(rename=(
		var1=ST
		var2=Cnty
		Var3=Org
		Var4=Pln
		Var5=Pln_Typ
		Var6=Prem
		Var7=Yrly
		Var8=Drug
		Var9=Add
		Var10=Ben
		Var11=Ctr
		Var12=Pln_ID
		Var13=Seg_ID
		Var14=MP
		Var15=Rtg
		/*Var16=ID*/))
	dbms = csv replace;
	getnames=no;
	datarow=12;
run;
 data WORK.LS 
		(rename=(                 var1=ST                  var2=Cnty                  Var3=Org                  Var4=Pln                  Var5=Pln_Typ                  Var6=Prem                  Var7=Yrly
		 Var8=Drug                  Var9=Add                  Var10=Ben                  Var11=Ctr                  Var12=Pln_ID                  Var13=Seg_ID                  Var14=MP                  Var15=Rtg                  /*Var16=ID*/));
	%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
/*Update*/    infile "C:\filename.csv"
		delimiter = ',' MISSOVER DSD lrecl=13106 firstobs=12;
		informat VAR1 $30. ;
		informat VAR2 $30. ;
		informat VAR3 $60. ;
		informat VAR4 $80. ;
		informat VAR5 $20. ;
		informat VAR6 dollar10.2 ;
		informat VAR7 dollar10.2 ;
		informat VAR8 $15. ;
		informat VAR9 $5. ;
		informat VAR10 $10. ;
		informat VAR11 $5. ;
		informat VAR12 $3. ;
		informat VAR13 $3. ;
		informat VAR14 dollar10. ;
		informat VAR15 $40. ;
		/*informat VAR16 $13. ;*/
		format VAR1 $30. ;
		format VAR2 $30. ;
		format VAR3 $60. ;
		format VAR4 $80. ;
		format VAR5 $20. ;
		format VAR6 dollar10.2 ;
		format VAR7 dollar10.2 ;
        format VAR8 $15. ;
        format VAR9 $5. ;
        format VAR10 $10. ;
        format VAR11 $5. ;
        format VAR12 $3. ;
        format VAR13 $3. ;
        format VAR14 dollar10. ;
        format VAR15 $40. ;
        /*format VAR16 $3. ;*/
     input
                 VAR1  $
                 VAR2  $
                 VAR3  $
                 VAR4  $
                 VAR5  $
                 VAR6  $
                 VAR7  $
                 VAR8  $
                 VAR9  $
                 VAR10  $
                 VAR11  $
                 VAR12  $
                 VAR13  $
                 VAR14  $
                 VAR15  $
                 /*VAR16 $*/
;
if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
ID = catx('_',Var11,Var12,Var13);
run;

 

TomKari
Onyx | Level 15

If you're dealing with "gnarly" text files, a really good SAS facility to know about is the _infile_ automatic variable. When you input data, it gets set to the contents of a line. If you want to see what your input data looks like to SAS, run the following:

 

Tom

 

data RawText;
	length InputRecord $32767;
	infile "C:\data.txt" lrecl=32767;
	input;
	InputRecord = _infile_;
run;
CaseySmith
SAS Employee

The difference could be whitespace characters.  For example, tabs in the DATALINES statement of a DATA step can cause problems (from doc, "Note: If you insert tabs while entering data in the DATALINES statement, you might get unexpected results when using columnar input.").  When I suspect that, I usually turn on "Show All Characters" (toggle button in main toolbar) in Notepad++, to see exact characters used for whitespace (tabs, spaces, line endings).

 

Casey


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

DGBK
Obsidian | Level 7

I finally solved the root cause for this issue when it manifested again.

 

It turns out that when importing files into SAS EG, I needed to check "Fix line-ending characters for text files (between Windows and UNIX)". This is the very last check box in the Copy Files task. 

 

Hopefully, this helps someone in the future.

Kurt_Bremser
Super User

@DGBK wrote:

I finally solved the route cause for this issue when it manifested again.

 

It turns out that when importing files into SAS EG, I needed to check "Fix line-ending characters for text files (between Windows and UNIX)". This is the very last check box in the Copy Files task. 

 

Hopefully, this helps someone in the future.


This can also be dealt with in the data step by using the termstr= option in the infile statement.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1615 views
  • 3 likes
  • 6 in conversation