BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DavidLie
Obsidian | Level 7

Hi SAS Community,

 

I have a dataset (extension .txt) that look something like this:

 

SR Number|External System Reference No|Source|Product Category|Commercial Offer|Product Type|VAS| ... (76 variables in total)

1-6530390723||Phone|Internet|Streamyx|Streamyx Basic / Soho|| ...

1-6530396195||Phone|Internet|Streamyx|Streamyx Basic / Soho|| ...

 

From above, you can see that the delimiter is "|" and when there is a missing data, it will just show as "||". In the above example, the variable "External System Reference No" and "VAS" are missing for both first and second observation.

 

I'm currently use the following SAS Syntax to import the data:

proc import datafile="&input\Raw Data\TMIDRRPT_MONTHLY_&date.-2010.txt" out=data.&date.2010 dbms=dlm replace;
delimiter='|';
getnames=yes;
run;

Problems with above syntax:

- Some variables are not imported properly (at times, I notice some data were incorrectly placed to other variables).

- SAS often mixed up with the dates - initial format supposed to be DDMMYYYY and SAS automatically transformed this to MMDDYYYY. For example, 1st of February was transformed by SAS to 2nd of January.

 

If anyone can suggest better way of importing this data, that would be awesome. Thanks in advance and stay safe.

 

Best,
David

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You will get better guessing from PROC IMPORT if you tell it to check all of the observations before deciding how to guess the types of the columns.  Also it will attempt to convert the column headers into valid variable names if you make sure you don't have VALIDVARNAME option set to ANY.

options validvarname=v7;
proc import dbms=dlm
 datafile="&input\Raw Data\TMIDRRPT_MONTHLY_&date.-2010.txt" 
 out=data.&date.2010 replace
;
  delimiter='|';
  getnames=yes;
  guessingrows=max;
run;

But to really control how the file is read just write your own data step.

If you are really seeing what looks like misalignment of the values then perhaps the file is poorly formed.  You can do your own checking if you just read the data in as character and evaluate it yourself.

data tall ;
   infile "&input\Raw Data\TMIDRRPT_MONTHLY_&date.-2010.txt" 
    dsd dlm='|' firstobs=2 truncover 
  ;
  row+1;
  do col=1 to 76;
     input value :$200. @;
     len = lengthn(value);
     output;
  end;
run;
proc means data=tall nway ;
   class col;
    var len ;
run;

View solution in original post

4 REPLIES 4
ChrisNZ
Tourmaline | Level 20

From above, you can see that the delimiter is "|" and when there is a missing data, it will just show as "||". In the above example,
Some variables are not imported properly (at times, I notice some data were incorrectly placed to other variables).

Use the data step and the INFILE statement, with the DSD option.

 

- SAS often mixed up with the dates - initial format supposed to be DDMMYYYY and SAS automatically transformed this to MMDDYYYY.

Proc import uses the LOCALE option to determine this. Again, the easiest is to control the informats in the INFILE statement. 

 

ballardw
Super User

I often helps when posting text data to post the Text into a code box opened on the forum with the </> icon. The code box will not reformat text while the main window will.

Paste entire lines of the data.

 

We cannot tell that:

- Some variables are not imported properly (at times, I notice some data were incorrectly placed to other variables).

- SAS often mixed up with the dates - initial format supposed to be DDMMYYYY and SAS automatically transformed this to MMDDYYYY.

 

Suggestion 1: Add the GETROWS=MAX option to examine more rows of the data before variable types and formats are applied by Proc Import. That might also fix the data issue. By default SAS uses the first 20 rows of data to determine the variable types. If the dates are all such that the month and day are less then than 12 in those rows then your national language setting will set the priority for the date formats that can be confused with the other.

 

Another approach if that doesn't fix issues is that the Proc import creates data step code to read text files. You can copy that from the LOG , paste into the editor and clean up any things like log line numbers. Then you can specify the informat and format for the variables to match your data.

This is also the fix to values that are created as numeric, such as account numbers, Street numbers and such that really should not be numeric to character, or the values that should be numeric (you intend to do arithmetic with them) that get imported as character because of the presence of text values like NULL, N/A, UNKNOWN or similar.

Tom
Super User Tom
Super User

You will get better guessing from PROC IMPORT if you tell it to check all of the observations before deciding how to guess the types of the columns.  Also it will attempt to convert the column headers into valid variable names if you make sure you don't have VALIDVARNAME option set to ANY.

options validvarname=v7;
proc import dbms=dlm
 datafile="&input\Raw Data\TMIDRRPT_MONTHLY_&date.-2010.txt" 
 out=data.&date.2010 replace
;
  delimiter='|';
  getnames=yes;
  guessingrows=max;
run;

But to really control how the file is read just write your own data step.

If you are really seeing what looks like misalignment of the values then perhaps the file is poorly formed.  You can do your own checking if you just read the data in as character and evaluate it yourself.

data tall ;
   infile "&input\Raw Data\TMIDRRPT_MONTHLY_&date.-2010.txt" 
    dsd dlm='|' firstobs=2 truncover 
  ;
  row+1;
  do col=1 to 76;
     input value :$200. @;
     len = lengthn(value);
     output;
  end;
run;
proc means data=tall nway ;
   class col;
    var len ;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 810 views
  • 0 likes
  • 5 in conversation