BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PS1214
Calcite | Level 5

 

Hello,

 

I have an invalid data problem when I am trying to generate sample datasets from this link as I go through the advanced prep guide book.

https://support.sas.com/content/dam/SAS/support/en/books/data/sampledata.txt

For example sasuser.budget2 dataset generates invalid data because the column numbers in the datalines don't match the input mapping. It looks like it has a mixed combination of tab and space delimiters.  I could manually just fix the datalines by rewriting the datalines or remove the blanks, but that shouldn't be the case, and I might have done something wrong. I can't figure it out with my limited knowledge.

I have also attached a log that shows my problem.

I am using SAS Studio UNIVERSITY EDITION 2.8 9.4 M6.

 

Thank you for your time.

 

1 ACCEPTED SOLUTION

Accepted Solutions
sabisw
SAS Employee

@PS1214, You can replace the Sasuser.budget code in your sample data with the following code. Sometimes, because of Enterprise Guide and SAS Studio this sampledata program may cause extra spaces or errors. We are working on coming out with a sample data program that is a lot cleaner so no matter what version of SAS you are using it should run your sample data with no issue.

 

data SASUSER.BUDGET2;
  infile datalines dsd truncover;
  input Manager:$8. Job_Type:32. WageCat:$2. WageRate:32. Yearly_Salary:32. Payroll:32.;
datalines4;
Coxe,3,S,3392,40710,40710
Coxe,50,S,3420,41040,41040
Coxe,50,S,6862,82350,123390
Coxe,240,H,13.65,27300.0,27300.0
Coxe,240,S,4522.50,54270.0,81570.0
Delgado,240,S,2960.00,35520.0,35520.0
Delgado,240,S,5260.00,63120.0,98640.0
Delgado,420,S,1572.50,18870.0,18870.0
Delgado,420,S,3819.20,45830.0,64700.4
Delgado,440,S,1813.30,21759.6,21759.6
Overby,1,S,6855.90,82270.8,82270.8
Overby,5,S,4045.80,48550.2,48549.6
Overby,10,S,4480.50,53766.0,53766.0
Overby,20,S,5910.80,70929.0,70929.0
Overby,20,S,9073.80,108850.0,179815.2
;;;;
run;

View solution in original post

10 REPLIES 10
Tom
Super User Tom
Super User

I am not sure if the tabs are there on purpose as some sort of test of your skills?

I quick and easy way to convert the tabs to spaces is to use the EXPANDTABS options on INFILE statement.

943   data _null_;
944     if eof then put n=;
945     infile 'c:\downloads\sampledata.txt' end=eof;
946     input;
947     if index(_infile_,'09'x) then n+1;
948   run;

NOTE: The infile 'c:\downloads\sampledata.txt' is:
      Filename=c:\downloads\sampledata.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=427585,
      Last Modified=14May2019:09:53:31,
      Create Time=14May2019:09:53:29

n=109
NOTE: 9784 records were read from the infile 'c:\downloads\sampledata.txt'.
      The minimum record length was 0.
      The maximum record length was 132.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


949   data _null_;
950     if eof then put n=;
951     infile 'c:\downloads\sampledata.txt' expandtabs end=eof;
952     input;
953     if index(_infile_,'09'x) then n+1;
954   run;

NOTE: The infile 'c:\downloads\sampledata.txt' is:
      Filename=c:\downloads\sampledata.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=427585,
      Last Modified=14May2019:09:53:31,
      Create Time=14May2019:09:53:29

n=0
NOTE: 9784 records were read from the infile 'c:\downloads\sampledata.txt'.
      The minimum record length was 0.
      The maximum record length was 132.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

So you could use that to fix the file.

data _null_;
  infile 'c:\downloads\sampledata.txt' expandtabs ;
  file 'c:\downloads\sampledata_notabs.txt';
  input;
  put _infile_;
run;
PS1214
Calcite | Level 5

Thank you, that seemed to fix the log errors for invalid data, but some datasets still have incorrect/missing data.

For example:

 

data budget2;
        input Manager $1-7 Job_Type 9-11 WageCat $13-14 WageRate 16-21 Yearly_Salary 24-31 Payroll 33-43;
        datalines;
Coxe    3       S       3392.50 40710.0         40710.0
Coxe    50      S       3420.00 41040.0         41040.0
Coxe    50      S       6862.50 82350.0         123390.0
Coxe    240     H       13.65   27300.0         27300.0
Coxe    240     S       4522.50 54270.0         81570.0
Delgado 240     S       2960.00 35520.0         35520.0
Delgado 240     S       5260.00 63120.0         98640.0
Delgado 420     S       1572.50 18870.0         18870.0
Delgado 420     S       3819.20 45830.0         64700.4
Delgado 440     S       1813.30 21759.6         21759.6
Overby  1       S       6855.90 82270.8         82270.8
Overby  5       S       4045.80 48550.2         48549.6
Overby  10      S       4480.50 53766.0         53766.0
Overby  20      S       5910.80 70929.0         70929.0
Overby  20      S       9073.80 108850.0        179815.2
;
run;

I guess there isn't a fix all line of code, and I would just have to live with it.

Also according to what @sabisw pointed out the inherent flaw of the sampledata when used in SAS Studio.

 

sabisw
SAS Employee

@PS1214, You can replace the Sasuser.budget code in your sample data with the following code. Sometimes, because of Enterprise Guide and SAS Studio this sampledata program may cause extra spaces or errors. We are working on coming out with a sample data program that is a lot cleaner so no matter what version of SAS you are using it should run your sample data with no issue.

 

data SASUSER.BUDGET2;
  infile datalines dsd truncover;
  input Manager:$8. Job_Type:32. WageCat:$2. WageRate:32. Yearly_Salary:32. Payroll:32.;
datalines4;
Coxe,3,S,3392,40710,40710
Coxe,50,S,3420,41040,41040
Coxe,50,S,6862,82350,123390
Coxe,240,H,13.65,27300.0,27300.0
Coxe,240,S,4522.50,54270.0,81570.0
Delgado,240,S,2960.00,35520.0,35520.0
Delgado,240,S,5260.00,63120.0,98640.0
Delgado,420,S,1572.50,18870.0,18870.0
Delgado,420,S,3819.20,45830.0,64700.4
Delgado,440,S,1813.30,21759.6,21759.6
Overby,1,S,6855.90,82270.8,82270.8
Overby,5,S,4045.80,48550.2,48549.6
Overby,10,S,4480.50,53766.0,53766.0
Overby,20,S,5910.80,70929.0,70929.0
Overby,20,S,9073.80,108850.0,179815.2
;;;;
run;
PS1214
Calcite | Level 5

Thank you.

 

Would I have to just manually rewrite the code for each affected dataset then?

sabisw
SAS Employee

@PS1214 , you shouldn't have to. It depends on what is causing the issue. If you use my code you shouldn't have any missing data set for Sasuser.Budget2

PS1214
Calcite | Level 5

There are a bunch of datasets with the same issue.

I am not sure how to figure out what causes the problem either.

I simply just copied the contents of the link and followed the instructions to generate the data.

sabisw
SAS Employee

@PS1214, looking at the log you provided these are the other ones:

  • Sasuser.visit
  • Sasuser.demog
  • Sasuser.y02jan
  • Sasuser.review2010

Can you verify these are the ones? 

sabisw
SAS Employee

Here is the rewritten version of the sample data sets that are causing errors.

 

Sasuser.Visit

 

data Sasuser.Visit;
  infile datalines dsd truncover;
  input ID:$4. Age:32. Visit:32. SysBP:32. DiasBP:32. Weight:32. Date:$8.;
datalines4;
A008,26,1,126,80,182,05/22/09
A005,33,1,136,76,174,02/27/09
A005,31,2,132,78,175,07/11/09
A005,29,3,134,78,176,04/16/09
A004,26,1,143,86,204,03/30/09
A003,38,1,118,68,125,08/12/09
A003,41,2,112,65,123,08/21/09
A002,22,1,121,75,168,04/14/09
A001,23,1,140,85,195,11/05/09
A001,38,2,138,90,198,10/13/09
A001,35,3,145,95,200,07/04/09
;;;;
run;

Sasuser.Demog

data Sasuser.Demog;
  infile datalines dsd truncover;
  input ID:$4. Age:32. Sex:$2. Date:$10.;
datalines4;
A007,39,M,11/11/05
A005,44,F,02/24/05
A004,,,01/27/06
A003,24,F,08/17/07
A002,32,M,06/15/06
A001,21,M,05/22/07
;;;;
run;

Sort Sasuser.Demog and Sasuser.Visit

proc sort data=sasuser.visit out=sasuser.visit; 
	by ID;
run;
proc sort data=sasuser.demog out=sasuser.demog;
	by ID;
run;

Sasuser.y02jan

data sasuser.y02jan;
    infile datalines dsd truncover;
	input Week:32. Sale:32. Day $9.;
	datalines;
1,1869.33,Monday
1,1689.01,Tuesday
1,2655.00,Wednesday
1,1556.23,Thursday
1,3341.11,Friday
2,2212.63,Monday
2,1701.85,Tuesday
2,1005.46,Wednesday
2,1990.86,Thursday
2,3642.53,Friday
3,1775.34,Monday
3,1639.72,Tuesday
3,2335.69,Wednesday
3,2863.82,Thursday
3,3010.17,Friday
4,1398.22,Monday
4,1330.58,Tuesday
4,1458.67,Wednesday
4,1623.42,Thursday
4,2336.00,Friday
5,2034.97,Monday
5,1803.04,Tuesday
5,1953.38,Wednesday
5,2064.67,Thursday
5,2336.44,Friday
6,1046.25,Monday
6,1334.85,Tuesday
6,1455.88,Wednesday
6,2288.30,Thursday
6,3401.68,Friday
7,1652.73,Monday
7,1987.24,Tuesday
7,1773.12,Wednesday
7,2468.81,Thursday
7,3014.25,Friday
8,1996.77,Monday
8,1843.54,Tuesday
8,1268.59,Wednesday
8,1663.84,Thursday
8,2657.44,Friday
9,1699.74,Monday
9,1798.32,Tuesday
9,1973.16,Wednesday
9,2634.84,Thursday
9,3219.98,Friday
10,1883.47,Monday
10,1432.83,Tuesday
10,1803.44,Wednesday
10,2137.49,Thursday
10,2750.70,Friday
;;;
run;

Sasuser.review2010

data sasuser.review2010;
   infile datalines dsd truncover;
   input Site:$11. Day:32. Rate:$2. Name:$11.;
datalines;
Westin,12,A2,'Mitchell, K'
Stockton,4,A5,'Worton M'
Center City,17,B1,'Smith, A'
;
run;

 

PS1214
Calcite | Level 5

Sorry for the late response.

 

I did another quick look through and it looks like it is just those four.

Did not realize I was worried about just needing to fix four.

Thank you very much.

 

sabisw
SAS Employee

@PS1214 No worries. Glad I could help!

 

I will add these to the this edition's errata. 

Welcome to the Certification Community

 

This is a knowledge-sharing community for SAS Certified Professionals and anyone who wants to learn more about becoming SAS Certified. Ask questions and get answers fast. Share with others who are interested in certification and who are studying for certifications.To get the most from your community experience, use these getting-started resources:

Community Do's and Don'ts
How to add SAS syntax to your post
How to get fast, helpful answers

 

Why Get SAS Certified.jpg

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2366 views
  • 0 likes
  • 3 in conversation