Hello,
I have an invalid data problem when I am trying to generate sample datasets from this link as I go through the advanced prep guide book.
https://support.sas.com/content/dam/SAS/support/en/books/data/sampledata.txt
For example sasuser.budget2 dataset generates invalid data because the column numbers in the datalines don't match the input mapping. It looks like it has a mixed combination of tab and space delimiters. I could manually just fix the datalines by rewriting the datalines or remove the blanks, but that shouldn't be the case, and I might have done something wrong. I can't figure it out with my limited knowledge.
I have also attached a log that shows my problem.
I am using SAS Studio UNIVERSITY EDITION 2.8 9.4 M6.
Thank you for your time.
@PS1214, You can replace the Sasuser.budget code in your sample data with the following code. Sometimes, because of Enterprise Guide and SAS Studio this sampledata program may cause extra spaces or errors. We are working on coming out with a sample data program that is a lot cleaner so no matter what version of SAS you are using it should run your sample data with no issue.
data SASUSER.BUDGET2;
infile datalines dsd truncover;
input Manager:$8. Job_Type:32. WageCat:$2. WageRate:32. Yearly_Salary:32. Payroll:32.;
datalines4;
Coxe,3,S,3392,40710,40710
Coxe,50,S,3420,41040,41040
Coxe,50,S,6862,82350,123390
Coxe,240,H,13.65,27300.0,27300.0
Coxe,240,S,4522.50,54270.0,81570.0
Delgado,240,S,2960.00,35520.0,35520.0
Delgado,240,S,5260.00,63120.0,98640.0
Delgado,420,S,1572.50,18870.0,18870.0
Delgado,420,S,3819.20,45830.0,64700.4
Delgado,440,S,1813.30,21759.6,21759.6
Overby,1,S,6855.90,82270.8,82270.8
Overby,5,S,4045.80,48550.2,48549.6
Overby,10,S,4480.50,53766.0,53766.0
Overby,20,S,5910.80,70929.0,70929.0
Overby,20,S,9073.80,108850.0,179815.2
;;;;
run;
I am not sure if the tabs are there on purpose as some sort of test of your skills?
I quick and easy way to convert the tabs to spaces is to use the EXPANDTABS options on INFILE statement.
943 data _null_; 944 if eof then put n=; 945 infile 'c:\downloads\sampledata.txt' end=eof; 946 input; 947 if index(_infile_,'09'x) then n+1; 948 run; NOTE: The infile 'c:\downloads\sampledata.txt' is: Filename=c:\downloads\sampledata.txt, RECFM=V,LRECL=32767,File Size (bytes)=427585, Last Modified=14May2019:09:53:31, Create Time=14May2019:09:53:29 n=109 NOTE: 9784 records were read from the infile 'c:\downloads\sampledata.txt'. The minimum record length was 0. The maximum record length was 132. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 949 data _null_; 950 if eof then put n=; 951 infile 'c:\downloads\sampledata.txt' expandtabs end=eof; 952 input; 953 if index(_infile_,'09'x) then n+1; 954 run; NOTE: The infile 'c:\downloads\sampledata.txt' is: Filename=c:\downloads\sampledata.txt, RECFM=V,LRECL=32767,File Size (bytes)=427585, Last Modified=14May2019:09:53:31, Create Time=14May2019:09:53:29 n=0 NOTE: 9784 records were read from the infile 'c:\downloads\sampledata.txt'. The minimum record length was 0. The maximum record length was 132. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
So you could use that to fix the file.
data _null_;
infile 'c:\downloads\sampledata.txt' expandtabs ;
file 'c:\downloads\sampledata_notabs.txt';
input;
put _infile_;
run;
Thank you, that seemed to fix the log errors for invalid data, but some datasets still have incorrect/missing data.
For example:
data budget2;
input Manager $1-7 Job_Type 9-11 WageCat $13-14 WageRate 16-21 Yearly_Salary 24-31 Payroll 33-43;
datalines;
Coxe 3 S 3392.50 40710.0 40710.0
Coxe 50 S 3420.00 41040.0 41040.0
Coxe 50 S 6862.50 82350.0 123390.0
Coxe 240 H 13.65 27300.0 27300.0
Coxe 240 S 4522.50 54270.0 81570.0
Delgado 240 S 2960.00 35520.0 35520.0
Delgado 240 S 5260.00 63120.0 98640.0
Delgado 420 S 1572.50 18870.0 18870.0
Delgado 420 S 3819.20 45830.0 64700.4
Delgado 440 S 1813.30 21759.6 21759.6
Overby 1 S 6855.90 82270.8 82270.8
Overby 5 S 4045.80 48550.2 48549.6
Overby 10 S 4480.50 53766.0 53766.0
Overby 20 S 5910.80 70929.0 70929.0
Overby 20 S 9073.80 108850.0 179815.2
;
run;
I guess there isn't a fix all line of code, and I would just have to live with it.
Also according to what @sabisw pointed out the inherent flaw of the sampledata when used in SAS Studio.
@PS1214, You can replace the Sasuser.budget code in your sample data with the following code. Sometimes, because of Enterprise Guide and SAS Studio this sampledata program may cause extra spaces or errors. We are working on coming out with a sample data program that is a lot cleaner so no matter what version of SAS you are using it should run your sample data with no issue.
data SASUSER.BUDGET2;
infile datalines dsd truncover;
input Manager:$8. Job_Type:32. WageCat:$2. WageRate:32. Yearly_Salary:32. Payroll:32.;
datalines4;
Coxe,3,S,3392,40710,40710
Coxe,50,S,3420,41040,41040
Coxe,50,S,6862,82350,123390
Coxe,240,H,13.65,27300.0,27300.0
Coxe,240,S,4522.50,54270.0,81570.0
Delgado,240,S,2960.00,35520.0,35520.0
Delgado,240,S,5260.00,63120.0,98640.0
Delgado,420,S,1572.50,18870.0,18870.0
Delgado,420,S,3819.20,45830.0,64700.4
Delgado,440,S,1813.30,21759.6,21759.6
Overby,1,S,6855.90,82270.8,82270.8
Overby,5,S,4045.80,48550.2,48549.6
Overby,10,S,4480.50,53766.0,53766.0
Overby,20,S,5910.80,70929.0,70929.0
Overby,20,S,9073.80,108850.0,179815.2
;;;;
run;
Thank you.
Would I have to just manually rewrite the code for each affected dataset then?
@PS1214 , you shouldn't have to. It depends on what is causing the issue. If you use my code you shouldn't have any missing data set for Sasuser.Budget2
There are a bunch of datasets with the same issue.
I am not sure how to figure out what causes the problem either.
I simply just copied the contents of the link and followed the instructions to generate the data.
@PS1214, looking at the log you provided these are the other ones:
Can you verify these are the ones?
Here is the rewritten version of the sample data sets that are causing errors.
Sasuser.Visit
data Sasuser.Visit;
infile datalines dsd truncover;
input ID:$4. Age:32. Visit:32. SysBP:32. DiasBP:32. Weight:32. Date:$8.;
datalines4;
A008,26,1,126,80,182,05/22/09
A005,33,1,136,76,174,02/27/09
A005,31,2,132,78,175,07/11/09
A005,29,3,134,78,176,04/16/09
A004,26,1,143,86,204,03/30/09
A003,38,1,118,68,125,08/12/09
A003,41,2,112,65,123,08/21/09
A002,22,1,121,75,168,04/14/09
A001,23,1,140,85,195,11/05/09
A001,38,2,138,90,198,10/13/09
A001,35,3,145,95,200,07/04/09
;;;;
run;
Sasuser.Demog
data Sasuser.Demog;
infile datalines dsd truncover;
input ID:$4. Age:32. Sex:$2. Date:$10.;
datalines4;
A007,39,M,11/11/05
A005,44,F,02/24/05
A004,,,01/27/06
A003,24,F,08/17/07
A002,32,M,06/15/06
A001,21,M,05/22/07
;;;;
run;
Sort Sasuser.Demog and Sasuser.Visit
proc sort data=sasuser.visit out=sasuser.visit;
by ID;
run;
proc sort data=sasuser.demog out=sasuser.demog;
by ID;
run;
Sasuser.y02jan
data sasuser.y02jan;
infile datalines dsd truncover;
input Week:32. Sale:32. Day $9.;
datalines;
1,1869.33,Monday
1,1689.01,Tuesday
1,2655.00,Wednesday
1,1556.23,Thursday
1,3341.11,Friday
2,2212.63,Monday
2,1701.85,Tuesday
2,1005.46,Wednesday
2,1990.86,Thursday
2,3642.53,Friday
3,1775.34,Monday
3,1639.72,Tuesday
3,2335.69,Wednesday
3,2863.82,Thursday
3,3010.17,Friday
4,1398.22,Monday
4,1330.58,Tuesday
4,1458.67,Wednesday
4,1623.42,Thursday
4,2336.00,Friday
5,2034.97,Monday
5,1803.04,Tuesday
5,1953.38,Wednesday
5,2064.67,Thursday
5,2336.44,Friday
6,1046.25,Monday
6,1334.85,Tuesday
6,1455.88,Wednesday
6,2288.30,Thursday
6,3401.68,Friday
7,1652.73,Monday
7,1987.24,Tuesday
7,1773.12,Wednesday
7,2468.81,Thursday
7,3014.25,Friday
8,1996.77,Monday
8,1843.54,Tuesday
8,1268.59,Wednesday
8,1663.84,Thursday
8,2657.44,Friday
9,1699.74,Monday
9,1798.32,Tuesday
9,1973.16,Wednesday
9,2634.84,Thursday
9,3219.98,Friday
10,1883.47,Monday
10,1432.83,Tuesday
10,1803.44,Wednesday
10,2137.49,Thursday
10,2750.70,Friday
;;;
run;
Sasuser.review2010
data sasuser.review2010;
infile datalines dsd truncover;
input Site:$11. Day:32. Rate:$2. Name:$11.;
datalines;
Westin,12,A2,'Mitchell, K'
Stockton,4,A5,'Worton M'
Center City,17,B1,'Smith, A'
;
run;
Sorry for the late response.
I did another quick look through and it looks like it is just those four.
Did not realize I was worried about just needing to fix four.
Thank you very much.
This is a knowledge-sharing community for SAS Certified Professionals and anyone who wants to learn more about becoming SAS Certified. Ask questions and get answers fast. Share with others who are interested in certification and who are studying for certifications.To get the most from your community experience, use these getting-started resources:
Community Do's and Don'ts
How to add SAS syntax to your post
How to get fast, helpful answers
Ready to level-up your skills? Choose your own adventure.