BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
V168
Calcite | Level 5

Hi there:

I am new sas user. I am doing a project.

Objective:

The attached data is the CRM data of a wireless company for 2 years. The wireless

company would like to investigate the customer distribution and business behaviors, and

then gain insightful understanding about the customers, and to forecast the deactivation

trends for the next 6 months.

Data:

Acctno: account number.

Actdt: account activation date

Deactdt: account deactivation date

DeactReason: reason for deactivation.

GoodCredit: customer’s credit is good or not.

RatePlan: rate plan for the customer.

DealerType: dealer type.

Age: customer age.

Province: province.

Sales: the amount of sales to a customer

 

the first three line of data is like this:

1176913194483 06/20/1999                                          0        1  A1       58    BC    $128.00
1176914599423 10/04/1999 10/15/1999     NEED        1        1 A1        45    AB     $72.00
1176951913656 07/01/2000                                          0        1 A1        57    BC     $593.00

when I try to use the code to read the file into sas table, always give me proble, I want to get help how to read the file proberly.I use sas 9.4. my code is:

data telecom;
infile '\My file path\' dlm=' ';
length Acctno $12 Actdt 10 Deactdt 10 DeactReason $8 GoodCredit 8 RatePlan $8 DealerType $2 Age 8 Province $2 Sales 8;
format Actdt Deactdt date9. Sales dollar8.2;
input Acctno Actdt mmddyy10. Deactdt mmddyy10. DeactReason GoodCredit RatePlan DealerType Age Province Sales dollar8.2;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
/*
OK. Just add one more data step after my code.
*/
data want;
retain
Acctno
Actdt
Deactdt
DeactReason
GoodCredit
RatePlan
DealerType
Age
Province
Sales
;
set telecom;
run;

View solution in original post

29 REPLIES 29
Tom
Super User Tom
Super User

When I copied your example lines and pasted it into the program editor I got this garbled file.

1176913194483 06/20/1999                                          0        1  A1       58    BC    $128.00
1176914599423 10/04/1999 10/15/1999     NEED        1        1 A1        45    AB     $72.00
1176951913656 07/01/2000                                          0        1 A1        57    BC     $593.00

So it is NOT a delimited file and NOT a fixed position file either.

 

Perhaps the forum editor just garbled the file.  Show the example lines using the POP-UP window that appears when you click on the Insert Code icon.  That will prevent the forum editor from re-flowing the lines into paragraphs.

Does your actual file look like this? With fixed positions for the columns of data?

1176913194483 06/20/1999                                          0        1  A1        58    BC    $128.00
1176914599423 10/04/1999 10/15/1999     NEED                      1        1  A1        45    AB     $72.00
1176951913656 07/01/2000                                          0        1  A1        57    BC    $593.00

If so then you will want to use COLUMN MODE or FORMATTED MODE (or a combination of both) to read the variables from the lines.

 

To see the column positions you can try using the LIST statement in a data step. You might start by just looking at the first 10 lines.

data _null_;
  infile '/My_file_directory/my_filename.my_extension' obs=10;
  input;
  list;
run;

Has some thing accidentally inserted TAB characters in place of some of the spaces in the file?  You might want to try using the EXPANDTABS option on the INFILE statement.

 

If the file does not have the data in fixed columns then you cannot use SPACE as the delimiter to treat it as a delimited file because you clearly have multiple spaces between the fields in some of the lines.  So instead you might be able to read it as a LIST MODE file, but that would required that the missing date fields have period to represent their position on the line.

So perhaps something like this?

1176913194483 06/20/1999          .        .                      0        1  A1        58    BC    $128.00
1176914599423 10/04/1999 10/15/1999     NEED                      1        1  A1        45    AB     $72.00
1176951913656 07/01/2000          .        .                      0        1  A1        57    BC    $593.00
V168
Calcite | Level 5

thanks  I run the code and get this result

NOTE: The infile 'D:\Metro College class\Advanced SAS\Mini Project\New_Wireless_Fixed.txt' is:
Filename=D:\Metro College class\Advanced SAS\Mini Project\New_Wireless_Fixed.txt,
RECFM=V,LRECL=32767,File Size (bytes)=9611970,
Last Modified=April 04, 2023 21:51:23,
Create Time=April 04, 2023 21:51:21

RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
1 1176913194483 06/20/1999 0 1 A1 58 BC $12
89 8.00 92
2 1176914599423 10/04/1999 10/15/1999 NEED 1 1 A1 45 AB $7
89 2.00 92
3 1176951913656 07/01/2000 0 1 A1 57 BC $59
89 3.00 92
4 1176954000288 05/30/2000 1 2 A1 47 ON $8
89 3.00 92
5 1176969186303 12/13/2000 1 1 C1 82 BC
89 92
6 1176991056273 08/31/1999 09/18/2000 MOVE 1 1 C1 92 QC $104
89 1.00 92
7 1176991866552 05/24/2000 1 1 A1 77 ON
89 92
8 1176992889500 11/28/2000 1 1 C1 68 AB $7
89 2.00 92
9 1177000067271 12/23/1999 0 1 B1 75 ON $13
89 4.00 92
10 1177010940613 12/09/1999 1 2 A1 42 NS $1
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
89 1.00 92
NOTE: 10 records were read from the infile 'D:\Metro College class\Advanced SAS\Mini
Project\New_Wireless_Fixed.txt'.
The minimum record length was 92.
The maximum record length was 92.
NOTE: DATA statement used (Total process time):
real time 0.48 seconds
cpu time 0.04 seconds

Tom
Super User Tom
Super User

Since you AGAIN pasted the text into the BODY of your message instead of using the features of this forum to allow you to share the un-altered text we have to guess what you are trying to show.  But since it looks like ALL of the lines are exact 92 bytes long I suspect that the fields are in in fixed columns.  So use the rulers that the LIST statement drew to help you write the INPUT statement using the right column numbers.

 

First let me GUESS how to put your lines back together and then I can show you how to COUNT.

RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9--
    1 1176913194483 06/20/1999                                               0 1 A1 58 BC  $128.00 92
    2 1176914599423 10/04/1999 10/15/1999 NEED                               1 1 A1 45 AB   $72.00 92
    3 1176951913656 07/01/2000                                               0 1 A1 57 BC  $593.00 92
    4 1176954000288 05/30/2000                                               1 2 A1 47 ON   $83.00 92
    5 1176969186303 12/13/2000                                               1 1 C1 82 BC          92
    6 1176991056273 08/31/1999 09/18/2000 MOVE                               1 1 C1 92 QC $1041.00 92
    7 1176991866552 05/24/2000                                               1 1 A1 77 ON          92
    8 1176992889500 11/28/2000                                               1 1 C1 68 AB   $72.00 92
    9 1177000067271 12/23/1999                                               0 1 B1 75 ON  $134.00 92
   10 1177010940613 12/09/1999                                               1 2 A1 42 NS   $11.00 92
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9--

So you appear to have 6 fields at the end of the line, one at the start of the line and some unknown number in between.  Looks like the first two of those middle fields have DATE values in (the potentially confusing) MDY order.  Then there is at least one field that has strings like MOVE or NEED.

You would need to look at more of the data to decide. Or better read the DESCRIPTION you got with the data that explains what the fields are and what columns they occupy.

 

input id $ 1-13 @15 date1 mmddyy10. @26 date2 mmddyy10.  36-40 $ motion 
      72 num1 74 num2 76-77 $ char1 79-80 num3 82-83 $ char2 @84 money comma9.
;

Once you look at your real file instead of my attempt to clean up the lines that pasting into the body of a message destroyed you probably need to adjust the column numbers.

 

Notice that for the date fields and the money field you have to use FORMATTED mode input.  You cannot use a special informat with COLUMN MODE input.

 

You will need to attach a date type format to the date variables so that SAS will print the dates in a human readable style.  I recommend using either the DATE9. format or the YYMMDD10. format.  Displaying dates in either MDY or DMY order is confusing.  Which ever one you pick half of the audience will be confused by it.

V168
Calcite | Level 5
NOTE: The infile 'D:\Metro College class\Advanced SAS\Mini Project\New_Wireless_Fixed.txt' is:
      Filename=D:\Metro College class\Advanced SAS\Mini Project\New_Wireless_Fixed.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=9611970,
      Last Modified=April 04, 2023 21:51:23,
      Create Time=April 04, 2023 21:51:21

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
1         1176913194483 06/20/1999                            0        1  A1       58    BC    $12
      89  8.00 92
2         1176914599423 10/04/1999 10/15/1999     NEED        1        1  A1       45    AB     $7
      89  2.00 92
3         1176951913656 07/01/2000                            0        1  A1       57    BC    $59
      89  3.00 92
4         1176954000288 05/30/2000                            1        2  A1       47    ON     $8
      89  3.00 92
5         1176969186303 12/13/2000                            1        1  C1       82    BC
      89       92
6         1176991056273 08/31/1999 09/18/2000     MOVE        1        1  C1       92    QC   $104
      89  1.00 92
7         1176991866552 05/24/2000                            1        1  A1       77    ON
      89       92
8         1176992889500 11/28/2000                            1        1  C1       68    AB     $7
      89  2.00 92
9         1177000067271 12/23/1999                            0        1  B1       75    ON    $13
      89  4.00 92
10        1177010940613 12/09/1999                            1        2  A1       42    NS     $1
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
      89  1.00 92
NOTE: 10 records were read from the infile 'D:\Metro College class\Advanced SAS\Mini
      Project\New_Wireless_Fixed.txt'.
      The minimum record length was 92.
      The maximum record length was 92.
NOTE: DATA statement used (Total process time):
      real time           0.48 seconds
      cpu time            0.04 seconds


Sorry maybe this time is the unulrtered text code

V168
Calcite | Level 5

Thanks for your fast reply. Here I attached the split file, can you tell me the code to read the file.

Tom
Super User Tom
Super User

@V168 wrote:

Thanks for your fast reply. Here I attached the split file, can you tell me the code to read the file.


You can do it yourself.  The file appears to have exactly the fields I identified before.  Only the actual column numbers are slightly different.

----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9--
1176914599423 10/04/1999 10/15/1999     NEED        1        1  A1       45    AB     $72.00

So the first field is in columns 1 to 13.  It looks like an identifier, so read it has CHARACTER even though it only contains digits.

Then there are 2 date fields that are each 10 characters long at columns 15 and 26.

Then that status field or whatever it is that is in columns 41 to 44.

Then there are one character fields in column 53 and 62 and two character fields in columns 65-66, 74-75 and 81-82.  Some of them only have digits so they could be read as numbers.  You will need to know what the strings mean to decide if it makes sense to treat them as a number or a string. 

And finally the dollar amount at the end in columns 82 to 92.  So read it from column 82 using COMMA informat with a width of 11.

V168
Calcite | Level 5

Thank you so much, I will try to figure it out tomorrow

ballardw
Super User

You need to read your log. If you do not understand what the log tells you then copy the text from the log and paste into a text box opened on the forum with the </> icon that appears above the message window. Copy the code and all messages related to data step or procedure. The text box is important to preserve the diagnostic codes that SAS often provides.

 

Brief example using your code

342  data telecom;
343  /*infile '\My file path\' dlm=' ';*/
344  length Acctno $12 Actdt 10 Deactdt 10 DeactReason $8 GoodCredit 8 RatePlan $8 DealerType $2
                             --         --
                             352        352
344! Age 8 Province $2 Sales 8;
ERROR 352-185: The length of numeric variables is 3-8.

345  format Actdt Deactdt date9. Sales dollar8.2;
346  input Acctno Actdt mmddyy10. Deactdt mmddyy10. DeactReason GoodCredit RatePlan DealerType Age
346!  Province Sales dollar8.2;
347  datalines;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TELECOM may be incomplete.  When this step was stopped there were 0
         observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

351  run;

Please paste example data from a text file into a text box as well. The message window has reformatted your text, or any edits you did, so that the rows you show now have TAB characters in them and possibly other artifacts.

 

The first problem is that you are using illegal syntax on the LENGTH statement. SAS finds that sort of error and stops.

I notice when I count characters that your Acctno variable shows 13 characters in your example: 1176913194483. So the length you have assigned will truncate the last digit.

 

Your INFILE options are not set correctly to deal with missing values.

You are attempting to use list input but two of your three example lines Deactdt and DeactReason are not present. So when the variables are not present they attempt to read other values. When you fix your LENGTH statement you should get a lot of invalid data notes because of this. The option that you likely need to add to your INFILE statement is DSD. That will treat two successive delimiters as indicating a missing value so the spaces are for missing are treated correctly.

 

Next, when you specific an informat on the Input statement such as Deactdt mmddyy10. with a delimited file it is a good idea to use the : modifier, as in Deactdt :mmddyy10. so that short values don't cause problems.

 

And lastly you might have an issue with the Dollar8.2 informat. If any of your values are missing the decimal place then the behavior SAS has for such is to imply a decimal point. Just use Dollar8. and things will be fine.

 

Here is an example of this behavior. You can copy this code into your SAS editor and run it. You will see that the first 4 rows are treated as likely intended because there is a decimal in the value even if there is nothing to the right of it. But the last row is read as 1.23 because it has no decimal and the Informat you specified as Dollar8.2 implied one. Change the Informat to Dollar8. and the behavior is as expected.

data junk;
   input amount dollar8.2;
   format amount dollar8.2;
datalines;
$123.45
$123.00
123.0
123.
$123
;

This actually is a hold over feature from the days of punch cards for data entry. The limited number of columns meant that fixed decimal places, such as in currency did not have to actually punch that character saving space.

 

 

 

 

 

V168
Calcite | Level 5
NOTE: The infile 'D:\Metro College class\Advanced SAS\Mini Project\New_Wireless_Fixed.txt' is:
      Filename=D:\Metro College class\Advanced SAS\Mini Project\New_Wireless_Fixed.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=9611970,
      Last Modified=April 04, 2023 21:51:23,
      Create Time=April 04, 2023 21:51:21

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
1         1176913194483 06/20/1999                            0        1  A1       58    BC    $12
      89  8.00 92
2         1176914599423 10/04/1999 10/15/1999     NEED        1        1  A1       45    AB     $7
      89  2.00 92
3         1176951913656 07/01/2000                            0        1  A1       57    BC    $59
      89  3.00 92
4         1176954000288 05/30/2000                            1        2  A1       47    ON     $8
      89  3.00 92
5         1176969186303 12/13/2000                            1        1  C1       82    BC
      89       92
6         1176991056273 08/31/1999 09/18/2000     MOVE        1        1  C1       92    QC   $104
      89  1.00 92
7         1176991866552 05/24/2000                            1        1  A1       77    ON
      89       92
8         1176992889500 11/28/2000                            1        1  C1       68    AB     $7
      89  2.00 92
9         1177000067271 12/23/1999                            0        1  B1       75    ON    $13
      89  4.00 92
10        1177010940613 12/09/1999                            1        2  A1       42    NS     $1
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
      89  1.00 92
NOTE: 10 records were read from the infile 'D:\Metro College class\Advanced SAS\Mini
      Project\New_Wireless_Fixed.txt'.
      The minimum record length was 92.
      The maximum record length was 92.
NOTE: DATA statement used (Total process time):
      real time           0.48 seconds
      cpu time            0.04 seconds


thanks for your fast reply and this the code I run and get some info for the file.  I also attached the split sorce data file

Ksharp
Super User
data telecom;
infile 'c:\temp\xaa.txt' dlm=' ' termstr=crlf truncover;
input 
Acctno $ 1-14 
_Actdt $ 15-25
_Deactdt  $ 26-36
DeactReason $ 41-52
GoodCredit 53-61
RatePlan 62-64
DealerType $ 65-73
Age 74-79
Province $ 80-82
_Sales $ 83-93
;
Actdt=input(_Actdt,mmddyy12.);
Deactdt=input(_Deactdt,mmddyy12.);
Sales=input(_Sales,dollar32.);
format Actdt Deactdt mmddyy10. Sales dollar8.2;
drop _:;
run;

Ksharp_0-1682336562821.png

 

V168
Calcite | Level 5

Thank you so much for this code! it gives me the result, but the sequence of the column is not the original sequence, can you tell me why. Can you show me the code according to this sequence?

Data:

Acctno: account number.

Actdt: account activation date

Deactdt: account deactivation date

DeactReason: reason for deactivation.

GoodCredit: customer’s credit is good or not.

RatePlan: rate plan for the customer.

DealerType: dealer type.

Age: customer age.

Province: province.

Sales: the amount of sales to a customer

 
Ksharp
Super User
/*
OK. Just add one more data step after my code.
*/
data want;
retain
Acctno
Actdt
Deactdt
DeactReason
GoodCredit
RatePlan
DealerType
Age
Province
Sales
;
set telecom;
run;
V168
Calcite | Level 5

Thank you so much, it is woks!

Tom
Super User Tom
Super User

There is no need to make the temporary character variables.

Just read the fields that need special informats using FORMATTED MODE.

input 
Acctno $ 1-14 
@15 Actdt mmddyy10.
@26 Deactdt  mmddyy10.
DeactReason $ 41-52
GoodCredit 53-61
RatePlan 62-64
DealerType $ 65-73
Age 74-79
Province $ 80-82
@83 Sales comma10.
;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 29 replies
  • 2759 views
  • 0 likes
  • 4 in conversation