BookmarkSubscribeRSS Feed
jc3992
Pyrite | Level 9
data GoogleAD;
infile '/folders/myfolders/preprocessedgooglestatstab.txt' firstobs=2 DLM='09'x dsd missover;
input Day_of_week $ 7. +1 Date $ 7. Impressions 4. +1 Clicks 2. @88 Revenue $ 5.2 ;
run;
proc print data=GoogleAD;
run;

Hello everyone,

I have put the raw data as attached.

Any suggestions to my code is much appreciated.

Thanks!

8 REPLIES 8
Astounding
PROC Star

First, decide whether you want REVENUE to be a character variable or a numeric variable.  Then adjust your INPUT statement accordingly.  If it's to be character:

 

@88 revenue $5.

 

If it's to be numeric:

 

@88 revenue 5.

 

While 5.2 would be a valid informat for a numeric variable (not for a character variable), it is likely to be the wrong thing to do.

 

Finally, notice that you are using a tab as a delimiter (dlm='09'x).  In that case, it is very possible that your starting position of @88 is wrong.  You might want to switch to another form of INPUT that doesn't require a starting column.

ballardw
Super User

Try something like this:

data GoogleAD;
   infile '/folders/myfolders/preprocessedgooglestatstab.txt' firstobs=2 DLM='09'x dsd  missover;
   informat Day_of_week $10.  Date date9. Impressions 4.  Clicks 2. Revenue comma10.2 ;
   format date date9. revenue dollar10.2;
   input Day_of_week  Date Impressions  Clicks Revenue ;
run;

 

When you use informats on an input statement they force reading exactly the stated number of columns in the format. So you can end up reading the delimiter when the value is shorter than expected.

 

So having informat to describe the variables, which can be longer than the expected values but will stop reading at a delimiter when encountered, works better. Note the increase in your day_of_week variable length so Wednesday doesn't get truncated, the COMMA format will read things like currency better and doesn't mind the currency indicator and assign the DOLLAR format to show the values as currency. I also show reading the date into a SAS date value with associated format. You can do a lot of things with SAS date values that strings won't let you do.

jc3992
Pyrite | Level 9
Thank you so much! But I have a question, how do you know how many space for the variables? I can only be sure that the first one(Day of week) needs 10 columns, but I cannot be sure of others. Thank you!
Tom
Super User Tom
Super User

Only two of those fields require an INFORMAT to be read properly. And only one needs a format to print in a way that humans can read.

data GoogleAD;
  infile '/folders/myfolders/preprocessedgooglestatstab.txt' 
     firstobs=2 DLM='09'x dsd truncover
  ;
  length Day_of_week $7  Date 8 Impressions 8 Clicks 8 Revenue 8 ;
  input day_of_week -- revenue ;
  informat date date. revenue comma. ;
  format date date9.;
run;

If you want to read a delimited file then make sure to use list mode INPUT statement.  If you do add informats in-line with your INPUT statement then make sure to prefix them with a colon so that SAS will still use list mode input instead of formatted input.

 

Do NOT specify a decimal place value on an informat unless you know what it means and are positive that your source text has removed the periods that indicate the location of the decimal place.  

jc3992
Pyrite | Level 9
 
data GoogleAD;
infile '/folders/myfolders/preprocessedgooglestatstab.txt' firstobs=2 DLM='09'x dsd termstr=CRLF ;

input @'Day_of_week:' Day_of_week $ @'Date:' Date: ANYDTDTE9. Impressions 5. Clicks 2. @'Revenue:' Revenue COMMA6. ; 
run;
proc print data=GoogleAD;
title 'Google_AD';
run;

What if I use this one?

I used a column modifier. Where did I go wrong ?

Tom
Super User Tom
Super User

If you don't know how long to make a character variable make it longer than you think you will ever need and then see what types of values you get.  If it is way too long you can adjust and re-read the file.  SAS stores character strings as fixed length so too long a value might have space considerations, but most of it is relieved by using the compression option on the dataset.

 

I find that it helps to put the format modifiers next the formats/informat instead of next to the variable name.  This is more important now that SAS also uses the colon as a terminal wild card when specifying variable lists.  Your original input statement was looking for exactly 7 characters following the value for the DATE that follows the keyword 'Date:' in the input stream. Is that really what you have or are there spaces between the values?  If the values of IMPRESSIONS and CLICKS are numbers then there is no need to tell SAS how many digits to read.  SAS already knows how to read numbers. 

 

 

input
  @'Day_of_week:' Day_of_week :$50.
  @'Date:' Date :ANYDTDTE9. Impressions Clicks 
  @'Revenue:' Revenue :COMMA6. 
; 

 

For Revenue you could have a couple of problems depending on the your actual data.   Your current INFILE does not have MISSOVER or TRUNCOVER option. So if the value is only 5 instead of 6 characters then SAS will go to the next line to find that 6th character.  If you used MISSOVER then you would get a missing value if the value is less than 6 characters long. That is why you should always the newer (less than 30 years old) TRUNCOVER option instead of the older less useful MISSOVER option.

 

If you add the : colon prefix you might also have a problem if the value has a space between the $ and the value.  That will make SAS read just the $ and miss the actual value.  You could try also adding the & option so that will look for two spaces.  But that might not work. 

 

If you add the TRUNCOVER option to the INFILE statement you might have trouble using the @ 'string' pointer motion control.

 

If you get really desperate you might resort to modifying the _INFILE_ buffer to remove the $ before trying to read the value.  

input @ ;
_infile_=translate(_infile_,' ','$');
input
  @'Day_of_week:' Day_of_week :$50.
  @'Date:' Date :ANYDTDTE9. Impressions Clicks 
  @'Revenue:' Revenue :COMMA6. 
; 
jc3992
Pyrite | Level 9
data GoogleAD;
infile '/folders/myfolders/preprocessedgooglestatstab.txt' firstobs=2 DLM='09'x dsd termstr=CRLF ;

input @'Day_of_week:' Day_of_week $ @'Date:' Date: ANYDTDTE9. Impressions 5. Clicks 2. @'Revenue:' Revenue COMMA6. ; 
run;
proc print data=GoogleAD;
title 'Google_AD';
run;

This is my final attempt. I still could not figure out how to let "Revenue" present in an appropriate command, and I found there are too many ways people can use for one data been run by SAS. I hope I can solve it myself with my code but after one night I still could not make it LOL

ballardw
Super User

Since your lines of data do not contain the values 'Day_of_week:' or any of the other values you put in quotes in the input statement it find s nothing.

 

input @'Day_of_week:' Day_of_week $ @'Date:' Date: ANYDTDTE9. Impressions 5. Clicks 2. @'Revenue:' Revenue COMMA6.

 

input @'Day_of_week:'  tells SAS to look at the input data file line and read the value following that text. The posted example data file does not have that string in it anywhere. Your column headings do not have the : so not found there.

 

If the file you posted is NOT like your input file then don't bother posting it.

 

DID you try the code I posted in my previous post? It read the example file you posted just fine.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1391 views
  • 4 likes
  • 4 in conversation