dthompsonada
Obsidian | Level 7

Hello SAS Communities, 

I am having intermittent problems loading CSV-formatted data sets in the SAS windowing environment (SAS (r) Proprietary Software 9.4 (TS1M6)). The data come from a different raw data set each time. I get errors in the log and am told that the import failed, though it *appears* that all of the rows and columns load. In the example below, the data set had 33 variables and 3632 rows, and the data appeared to match the CSV file. However, I had chronic problems running subsequent code and was forced to resave the file as an Excel document to get my data cleaning code to run.

In some cases this works. In others, I find that it corrupts the data, and I have to manually move the data into new Excel files, which I would like to avoid at all costs.

Because this happens only some of the time, I am wondering whether the community has seen it before. What is the issue with the raw CSV data? Is there something I can request with my raw data, or some alteration I can make prior to import, to avoid this in the future?

Here’s an example where the import struggled:

A sample of my data:

id  AGG_M     AGG_P     HV  HF  inc  BMI       HBP  DIA  LIV  KID  FRP  FP  TCH  TRIG  LDL  DYS  CESD  SM  DK  HER  IDU  LEU       VL        ADH  RACE  EDB  hiv  age  ART  eART  yr  hd
1   44.9071   52.52557  1   NA  4    24.71756  1    1    1    1    1    1   133  176   62   2    14    3   0   1    2    104.1659  102013    NA   1     4    1    52   0    0     0   1
1   58.20754  41.29347  1   4   4    26.06801  1    1    1    1    1    1   131  107   66   1    2     3   3   1    1    262.0061  27        2    1     4    1    53   1    1     1   0
1   59.65136  48.54453  1   0   4    27.16421  1    1    1    1    1    1   180  233   86   2    1     3   0   1    1    345.401   60        1    1     4    1    54   1    1     2   0
1   56.80657  46.73991  1   0   5    25.71786  1    1    1    1    1    1   171  139   96   1    18    3   1   1    1    292.3271  9         1    1     4    1    55   1    1     3   0
2   46.3419   27.92331  1   0   2    26.66936  2    9    1    9    2    9   125  NA    NA   2    20    3   0   1    2    257.8278  8121      NA   3     2    1    54   0    0     0   1
2   48.71791  38.03807  2   0   1    25.96576  2    9    1    9    1    9   NA   NA    NA   9    18    3   3   1    1    459.4562  21        1    3     2    1    55   1    1     1   0
2   45.41483  37.32204  1   0   2    26.96037  2    9    1    9    1    1   134  NA    NA   4    18    3   2   1    1    263.0693  48        1    3     2    1    56   1    1     2   0
2   48.05706  31.80529  2   0   1    27.56218  2    1    1    2    1    1   105  104   44   1    21    3   1   1    1    238.6691  20        1    3     2    1    60   1    1     6   0
2   48.53362  42.05479  1   0   9    27.5511   2    1    1    2    1    1   141  162   73   2    23    3   0   1    1    253.8643  32        1    3     2    1    61   1    1     7   0
2   58.60481  34.22034  2   0   2    27.08387  2    1    1    2    1    1   153  127   84   1    17    3   1   1    1    246.149   70        1    3     2    1    62   1    1     8   0
3   40.22337  60.0697   1   4   6    28.59085  9    9    1    1    1    1   170  NA    NA   2    18    2   0   1    2    563.1223  4001.556  NA   1     7    1    47   0    0     0   1
3   44.42011  62.71705  1   4   6    28.3532   9    9    1    1    1    1   170  NA    NA   2    22    2   0   1    2    488.91    2020      1    1     7    1    48   1    1     1   1
3   41.70079  58.5145   2   4   6    28.1851   1    1    1    1    1    1   180  82    127  2    23    2   0   1    2    405.1816  27.50917  1    1     7    1    49   1    1     2   1
3   50.03523  51.93553  1   0   NA   26.76923  1    9    1    2    1    1   180  NA    NA   9    14    2   1   1    2    366.3389  4.219373  2    1     7    1    51   1    1     4   1
3   55.93561  56.14367  2   0   7    30.42388  1    9    2    1    1    1   162  NA    NA   2    1     2   2   1    2    416.9268  6.452357  1    1     7    1    52   1    1     5   1
3   58.07309  52.45267  1   0   7    31.26218  1    9    1    1    1    1   137  NA    NA   2    1     2   1   1    1    327.5391  6.726179  1    1     7    1    53   1    1     6   0
3   59.91793  54.91597  1   0   7    27.51856  1    9    1    1    1    1   132  NA    NA   2    1     2   1   1    1    256.2515  3.959164  2    1     7    1    54   1    1     7   0
3   58.09464  57.69027  2   0   NA   30.03093  1    9    1    2    1    1   165  NA    NA   2    0     2   2   1    1    317.7432  3.726179  1    1     7    1    55   1    1     8   0

 

The Import:

/*Set up Library*/
%LET CourseRoot = C:\LOCAL FILE;
LIBNAME ProjectA "&CourseRoot\Project A\Code";

/*Import Data*/
proc import
   out=ProjectA.rawdata   /* libref must match the LIBNAME statement above */
   datafile='C:DataFilePath.csv'
   dbms=csv replace;
run;

The Log/Error:

22   /*Set up Library*/

23   %LET CourseRoot = C:LOCALFILE;

24   LIBNAME Project1 "&CourseRoot\Project A\Code";

NOTE: Libref PROJECTA was successfully assigned as follows:

      Engine:        V9

      Physical Name: C:\BIOS 6623 SAS\Project A\Code

25

26   /*Import Data*/

27   proc import

28   out=ProjectA.rawdata

29   datafile='C:\FILELOCATION.csv'

30   dbms=csv replace;

31   run;

 

32    /**********************************************************************

33    *   PRODUCT:   SAS

34    *   VERSION:   9.4

35    *   CREATOR:   External File Interface

36    *   DATE:      25OCT20

37    *   DESC:      Generated SAS Datastep Code

38    *   TEMPLATE SOURCE:  (None Specified.)

39    ***********************************************************************/

40       data PROJECTA.RAWDATA    ;

41       %let _EFIERR_ = 0; /* set the ERROR detection macro variable */

42       infile 'C:\FILELOCATION.csv' delimiter = ',' MISSOVER

42 ! DSD lrecl=32767 firstobs=2 ;

43          informat newid best32. ;

44          informat AGG_MENT best32. ;

45          informat AGG_PHYS best32. ;

46          informat HASHV best32. ;

47          informat HASHF $2. ;

48          informat income $2. ;

49          informat BMI best32. ;

50          informat HBP best32. ;

51          informat DIAB best32. ;

52          informat LIV34 best32. ;

53          informat KID best32. ;

54          informat FRP best32. ;

55          informat FP best32. ;

56          informat TCHOL $3. ;

57          informat TRIG $3. ;

58          informat LDL $3. ;

59          informat DYSLIP best32. ;

60          informat CESD best32. ;

61          informat SMOKE best32. ;

62          informat DKGRP best32. ;

63          informat HEROPIATE best32. ;

64          informat IDU best32. ;

65          informat LEU3N best32. ;

66          informat VLOAD best32. ;

67          informat ADH $2. ;

68          informat RACE best32. ;

69          informat EDUCBAS best32. ;

70          informat hivpos best32. ;

71          informat age best32. ;

72          informat ART best32. ;

73          informat everART best32. ;

74          informat years best32. ;

75          informat hard_drugs best32. ;

76          format newid best12. ;

77          format AGG_MENT best12. ;

78          format AGG_PHYS best12. ;

79          format HASHV best12. ;

80          format HASHF $2. ;

81          format income $2. ;

82          format BMI best12. ;

83          format HBP best12. ;

84          format DIAB best12. ;

85          format LIV34 best12. ;

86          format KID best12. ;

87          format FRP best12. ;

88          format FP best12. ;

89          format TCHOL $3. ;

90          format TRIG $3. ;

91          format LDL $3. ;

92          format DYSLIP best12. ;

93          format CESD best12. ;

94          format SMOKE best12. ;

95          format DKGRP best12. ;

96          format HEROPIATE best12. ;

97          format IDU best12. ;

98          format LEU3N best12. ;

99          format VLOAD best12. ;

100         format ADH $2. ;

101         format RACE best12. ;

102         format EDUCBAS best12. ;

103         format hivpos best12. ;

104         format age best12. ;

105         format ART best12. ;

106         format everART best12. ;

107         format years best12. ;

108         format hard_drugs best12. ;

109      input

110                  newid

111                  AGG_MENT

112                  AGG_PHYS

113                  HASHV

114                  HASHF  $

115                  income  $

116                  BMI

117                  HBP

118                  DIAB

119                  LIV34

120                  KID

121                  FRP

122                  FP

123                  TCHOL  $

124                  TRIG  $

125                  LDL  $

126                  DYSLIP

127                  CESD

128                  SMOKE

129                  DKGRP

130                  HEROPIATE

131                  IDU

132                  LEU3N

133                  VLOAD

134                  ADH  $

135                  RACE

136                  EDUCBAS

137                  hivpos

138                  age

139                  ART

140                  everART

141                  years

142                  hard_drugs

143      ;

144      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */

145      run;

 

NOTE: The infile 'C:\FILEPATH.csv' is:

      Filename=C:\FILEPATH.csv,

      RECFM=V,LRECL=32767,File Size (bytes)=425108,

      Last Modified=26Sep2020:17:20:45,

      Create Time=26Sep2020:17:20:44

 

NOTE: Invalid data for HASHV in line 23 27-28.

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-

23        4,50.79010982,57.02273438,NA,0,3,21.1071036,1,3,1,1,1,1,234,NA,NA,4,9,3,1,1,1,313.4508

      87  076,29,2,1,5,1,47,1,1,3,0 111

newid=4 AGG_MENT=50.79010982 AGG_PHYS=57.02273438 HASHV=. HASHF=0 income=3 BMI=21.1071036 HBP=1

DIAB=3 LIV34=1 KID=1 FRP=1 FP=1 TCHOL=234 TRIG=NA LDL=NA DYSLIP=4 CESD=9 SMOKE=3 DKGRP=1

HEROPIATE=1 IDU=1 LEU3N=313.4508076 VLOAD=29 ADH=2 RACE=1 EDUCBAS=5 hivpos=1 age=47 ART=1

everART=1 years=3 hard_drugs=0 _ERROR_=1 _N_=22

NOTE: Invalid data for HASHV in line 30 27-28.

NOTE: Invalid data for HEROPIATE in line 30 77-78.

30        5,56.21993476,30.47054901,NA,3,2,24.97865422,1,9,1,9,1,9,204,192,NA,9,0,3,1,NA,2,92.66

      87  33977,30389,1,1,7,1,54,1,1,1,1 116

newid=5 AGG_MENT=56.21993476 AGG_PHYS=30.47054901 HASHV=. HASHF=3 income=2 BMI=24.97865422 HBP=1

DIAB=9 LIV34=1 KID=9 FRP=1 FP=9 TCHOL=204 TRIG=192 LDL=NA DYSLIP=9 CESD=0 SMOKE=3 DKGRP=1

HEROPIATE=. IDU=2 LEU3N=92.6633977 VLOAD=30389 ADH=1 RACE=1 EDUCBAS=7 hivpos=1 age=54 ART=1

everART=1 years=1 hard_drugs=1 _ERROR_=1 _N_=29

NOTE: Invalid data for HEROPIATE in line 44 75-76.

44        8,49.47258363,55.23964561,1,2,4,28.85804285,9,9,1,1,1,1,NA,NA,NA,9,10,3,3,NA,2,552.919

      87  9764,326013,NA,1,4,1,30,0,0,0,1 117

newid=8 AGG_MENT=49.47258363 AGG_PHYS=55.23964561 HASHV=1 HASHF=2 income=4 BMI=28.85804285 HBP=9

DIAB=9 LIV34=1 KID=1 FRP=1 FP=1 TCHOL=NA TRIG=NA LDL=NA DYSLIP=9 CESD=10 SMOKE=3 DKGRP=3

HEROPIATE=. IDU=2 LEU3N=552.9199764 VLOAD=326013 ADH=NA RACE=1 EDUCBAS=4 hivpos=1 age=30 ART=0

everART=0 years=0 hard_drugs=1 _ERROR_=1 _N_=43

NOTE: Invalid data for DKGRP in line 52 72-73.

52        9,37.85690949,54.04586028,1,0,7,22.9624659,2,9,1,1,1,1,NA,NA,NA,9,16,2,NA,1,2,434.8355

      87  223,4.726178748,2,1,7,1,52,1,1,5,1 120

newid=9 AGG_MENT=37.85690949 AGG_PHYS=54.04586028 HASHV=1 HASHF=0 income=7 BMI=22.9624659 HBP=2

DIAB=9 LIV34=1 KID=1 FRP=1 FP=1 TCHOL=NA TRIG=NA LDL=NA DYSLIP=9 CESD=16 SMOKE=2 DKGRP=.

HEROPIATE=1 IDU=2 LEU3N=434.8355223 VLOAD=4.726178748 ADH=2 RACE=1 EDUCBAS=7 hivpos=1 age=52

ART=1 everART=1 years=5 hard_drugs=1 _ERROR_=1 _N_=51

NOTE: Invalid data for DKGRP in line 55 75-76.

55        10,40.42751974,51.17592208,2,0,1,21.07192056,1,9,1,9,1,1,133,NA,NA,9,15,2,NA,1,1,69.36

      87  626023,424291,2,3,2,1,42,1,1,2,0 118

newid=10 AGG_MENT=40.42751974 AGG_PHYS=51.17592208 HASHV=2 HASHF=0 income=1 BMI=21.07192056

HBP=1 DIAB=9 LIV34=1 KID=9 FRP=1 FP=1 TCHOL=133 TRIG=NA LDL=NA DYSLIP=9 CESD=15 SMOKE=2 DKGRP=.

HEROPIATE=1 IDU=1 LEU3N=69.36626023 VLOAD=424291 ADH=2 RACE=3 EDUCBAS=2 hivpos=1 age=42 ART=1

everART=1 years=2 hard_drugs=0 _ERROR_=1 _N_=54

NOTE: Invalid data for BMI in line 59 35-36.

59        11,35.70460086,44.29373175,1,0,NA,NA,9,1,1,9,1,9,97,36,63,2,26,3,1,2,1,334.1167999,188

      87  14.65713,3,3,2,1,34,1,1,2,1 113

newid=11 AGG_MENT=35.70460086 AGG_PHYS=44.29373175 HASHV=1 HASHF=0 income=NA BMI=. HBP=9 DIAB=1

LIV34=1 KID=9 FRP=1 FP=9 TCHOL=97 TRIG=36 LDL=63 DYSLIP=2 CESD=26 SMOKE=3 DKGRP=1 HEROPIATE=2

IDU=1 LEU3N=334.1167999 VLOAD=18814.65713 ADH=3 RACE=3 EDUCBAS=2 hivpos=1 age=34 ART=1 everART=1

years=2 hard_drugs=1 _ERROR_=1 _N_=58

NOTE: Invalid data for BMI in line 79 34-35.

79        14,41.35498177,36.92849307,2,1,2,NA,1,9,1,1,2,9,280,NA,NA,9,17,3,0,1,2,265.1174831,13,

      87  2,3,4,1,48,1,1,8,1 104

newid=14 AGG_MENT=41.35498177 AGG_PHYS=36.92849307 HASHV=2 HASHF=1 income=2 BMI=. HBP=1 DIAB=9

LIV34=1 KID=1 FRP=2 FP=9 TCHOL=280 TRIG=NA LDL=NA DYSLIP=9 CESD=17 SMOKE=3 DKGRP=0 HEROPIATE=1

IDU=2 LEU3N=265.1174831 VLOAD=13 ADH=2 RACE=3 EDUCBAS=4 hivpos=1 age=48 ART=1 everART=1 years=8

hard_drugs=1 _ERROR_=1 _N_=78

NOTE: Invalid data for DKGRP in line 83 76-77.

83        15,24.29926218,39.77557131,1,3,2,20.11648424,2,2,1,2,1,1,136,144,46,1,28,3,NA,1,1,489.

      87  614708,8,2,3,4,1,53,1,1,3,0 113

newid=15 AGG_MENT=24.29926218 AGG_PHYS=39.77557131 HASHV=1 HASHF=3 income=2 BMI=20.11648424

HBP=2 DIAB=2 LIV34=1 KID=2 FRP=1 FP=1 TCHOL=136 TRIG=144 LDL=46 DYSLIP=1 CESD=28 SMOKE=3 DKGRP=.

HEROPIATE=1 IDU=1 LEU3N=489.614708 VLOAD=8 ADH=2 RACE=3 EDUCBAS=4 hivpos=1 age=53 ART=1

everART=1 years=3 hard_drugs=0 _ERROR_=1 _N_=82

NOTE: Invalid data for BMI in line 94 32-33.

94        16,39.7622484,48.4113255,1,0,4,NA,9,1,1,9,1,9,216,117,130,2,20,2,1,1,1,898.5309496,559

      87  ,2,1,6,1,58,1,1,7,0 105

newid=16 AGG_MENT=39.7622484 AGG_PHYS=48.4113255 HASHV=1 HASHF=0 income=4 BMI=. HBP=9 DIAB=1

LIV34=1 KID=9 FRP=1 FP=9 TCHOL=216 TRIG=117 LDL=130 DYSLIP=2 CESD=20 SMOKE=2 DKGRP=1 HEROPIATE=1

IDU=1 LEU3N=898.5309496 VLOAD=559 ADH=2 RACE=1 EDUCBAS=6 hivpos=1 age=58 ART=1 everART=1 years=7

hard_drugs=0 _ERROR_=1 _N_=93

NOTE: Invalid data for BMI in line 113 34-35.

113       20,41.74346997,57.74035835,2,0,2,NA,1,2,1,1,1,1,150,80,84,1,32,3,2,1,1,552.6720921,110

      87  2,NA,3,4,1,51,0,0,0,0 107

newid=20 AGG_MENT=41.74346997 AGG_PHYS=57.74035835 HASHV=2 HASHF=0 income=2 BMI=. HBP=1 DIAB=2

LIV34=1 KID=1 FRP=1 FP=1 TCHOL=150 TRIG=80 LDL=84 DYSLIP=1 CESD=32 SMOKE=3 DKGRP=2 HEROPIATE=1

IDU=1 LEU3N=552.6720921 VLOAD=1102 ADH=NA RACE=3 EDUCBAS=4 hivpos=1 age=51 ART=0 everART=0

years=0 hard_drugs=0 _ERROR_=1 _N_=112

NOTE: Invalid data for BMI in line 117 33-34.

117       21,20.6085118,63.19184148,2,4,1,NA,9,9,1,1,1,1,216,NA,NA,4,23,1,2,1,2,1030.013266,5,1,

      87  1,4,1,40,1,1,1,1 102

newid=21 AGG_MENT=20.6085118 AGG_PHYS=63.19184148 HASHV=2 HASHF=4 income=1 BMI=. HBP=9 DIAB=9

LIV34=1 KID=1 FRP=1 FP=1 TCHOL=216 TRIG=NA LDL=NA DYSLIP=4 CESD=23 SMOKE=1 DKGRP=2 HEROPIATE=1

IDU=2 LEU3N=1030.013266 VLOAD=5 ADH=1 RACE=1 EDUCBAS=4 hivpos=1 age=40 ART=1 everART=1 years=1

hard_drugs=1 _ERROR_=1 _N_=116

NOTE: Invalid data for DKGRP in line 132 77-78.

132       24,39.04482517,58.78267161,2,4,NA,29.31100018,1,1,1,1,1,1,180,82,127,2,23,2,NA,1,2,294

      87  .4764196,4.69895446,1,1,7,1,50,1,1,2,1 124

newid=24 AGG_MENT=39.04482517 AGG_PHYS=58.78267161 HASHV=2 HASHF=4 income=NA BMI=29.31100018

HBP=1 DIAB=1 LIV34=1 KID=1 FRP=1 FP=1 TCHOL=180 TRIG=82 LDL=127 DYSLIP=2 CESD=23 SMOKE=2 DKGRP=.

HEROPIATE=1 IDU=2 LEU3N=294.4764196 VLOAD=4.69895446 ADH=1 RACE=1 EDUCBAS=7 hivpos=1 age=50

ART=1 everART=1 years=2 hard_drugs=1 _ERROR_=1 _N_=131

NOTE: Invalid data for HASHV in line 134 28-29.

134       24,55.45852748,55.01564877,NA,0,NA,30.03363327,1,9,2,1,1,1,162,NA,NA,2,1,2,1,1,2,393.3

      87  508032,5.452357495,1,1,7,1,53,1,1,5,1 123

newid=24 AGG_MENT=55.45852748 AGG_PHYS=55.01564877 HASHV=. HASHF=0 income=NA BMI=30.03363327

HBP=1 DIAB=9 LIV34=2 KID=1 FRP=1 FP=1 TCHOL=162 TRIG=NA LDL=NA DYSLIP=2 CESD=1 SMOKE=2 DKGRP=1

HEROPIATE=1 IDU=2 LEU3N=393.3508032 VLOAD=5.452357495 ADH=1 RACE=1 EDUCBAS=7 hivpos=1 age=53

ART=1 everART=1 years=5 hard_drugs=1 _ERROR_=1 _N_=133

NOTE: Invalid data for DKGRP in line 137 73-74.

137       24,58.20620548,58.4404159,1,0,7,29.21812397,1,9,1,2,1,1,165,NA,NA,2,0,2,NA,1,1,318.716

      87  4998,2.972775712,1,1,7,1,56,1,1,8,0 121

newid=24 AGG_MENT=58.20620548 AGG_PHYS=58.4404159 HASHV=1 HASHF=0 income=7 BMI=29.21812397 HBP=1

DIAB=9 LIV34=1 KID=2 FRP=1 FP=1 TCHOL=165 TRIG=NA LDL=NA DYSLIP=2 CESD=0 SMOKE=2 DKGRP=.

HEROPIATE=1 IDU=1 LEU3N=318.7164998 VLOAD=2.972775712 ADH=1 RACE=1 EDUCBAS=7 hivpos=1 age=56

ART=1 everART=1 years=8 hard_drugs=0 _ERROR_=1 _N_=136

NOTE: Invalid data for HEROPIATE in line 139 76-77.

139       25,56.73641306,31.1083321,1,3,2,24.46095074,1,9,1,9,1,9,204,192,NA,9,0,3,2,NA,2,83.575

      87  86756,30384,2,1,7,1,55,1,1,1,1 116

newid=25 AGG_MENT=56.73641306 AGG_PHYS=31.1083321 HASHV=1 HASHF=3 income=2 BMI=24.46095074 HBP=1

DIAB=9 LIV34=1 KID=9 FRP=1 FP=9 TCHOL=204 TRIG=192 LDL=NA DYSLIP=9 CESD=0 SMOKE=3 DKGRP=2

HEROPIATE=. IDU=2 LEU3N=83.57586756 VLOAD=30384 ADH=2 RACE=1 EDUCBAS=7 hivpos=1 age=55 ART=1

everART=1 years=1 hard_drugs=1 _ERROR_=1 _N_=138

NOTE: Invalid data for HEROPIATE in line 150 76-77.

150       27,52.99936528,53.90043722,1,2,4,30.83560493,9,9,1,1,1,1,NA,NA,NA,9,10,3,0,NA,2,561.41

      87  0269,326007,NA,1,4,1,29,0,0,0,1 117

newid=27 AGG_MENT=52.99936528 AGG_PHYS=53.90043722 HASHV=1 HASHF=2 income=4 BMI=30.83560493

HBP=9 DIAB=9 LIV34=1 KID=1 FRP=1 FP=1 TCHOL=NA TRIG=NA LDL=NA DYSLIP=9 CESD=10 SMOKE=3 DKGRP=0

HEROPIATE=. IDU=2 LEU3N=561.410269 VLOAD=326007 ADH=NA RACE=1 EDUCBAS=4 hivpos=1 age=29 ART=0

everART=0 years=0 hard_drugs=1 _ERROR_=1 _N_=149

NOTE: Invalid data for DKGRP in line 157 77-78.

157       28,21.72158133,56.13800099,1,0,NA,24.34540024,2,1,1,1,1,1,184,85,114,1,21,2,NA,1,2,426

      87  .4473977,4.192148387,1,1,7,1,51,1,1,4,1 125

newid=28 AGG_MENT=21.72158133 AGG_PHYS=56.13800099 HASHV=1 HASHF=0 income=NA BMI=24.34540024

HBP=2 DIAB=1 LIV34=1 KID=1 FRP=1 FP=1 TCHOL=184 TRIG=85 LDL=114 DYSLIP=1 CESD=21 SMOKE=2 DKGRP=.

HEROPIATE=1 IDU=2 LEU3N=426.4473977 VLOAD=4.192148387 ADH=1 RACE=1 EDUCBAS=7 hivpos=1 age=51

ART=1 everART=1 years=4 hard_drugs=1 _ERROR_=1 _N_=156

NOTE: Invalid data for BMI in line 182 34-35.

182       32,41.17376252,38.73491783,2,1,2,NA,1,9,1,1,2,9,280,NA,NA,9,17,3,1,1,2,258.9808542,9,2

      87  ,3,4,1,49,1,1,8,1 103

newid=32 AGG_MENT=41.17376252 AGG_PHYS=38.73491783 HASHV=2 HASHF=1 income=2 BMI=. HBP=1 DIAB=9

LIV34=1 KID=1 FRP=2 FP=9 TCHOL=280 TRIG=NA LDL=NA DYSLIP=9 CESD=17 SMOKE=3 DKGRP=1 HEROPIATE=1

IDU=2 LEU3N=258.9808542 VLOAD=9 ADH=2 RACE=3 EDUCBAS=4 hivpos=1 age=49 ART=1 everART=1 years=8

hard_drugs=1 _ERROR_=1 _N_=181

NOTE: Invalid data for BMI in line 190 34-35.

190       33,39.04321297,50.08230628,1,0,4,NA,9,1,1,9,1,9,216,117,130,2,20,2,1,1,1,897.9483274,5

      87  68,2,1,6,1,55,1,1,7,0 107

newid=33 AGG_MENT=39.04321297 AGG_PHYS=50.08230628 HASHV=1 HASHF=0 income=4 BMI=. HBP=9 DIAB=1

LIV34=1 KID=9 FRP=1 FP=9 TCHOL=216 TRIG=117 LDL=130 DYSLIP=2 CESD=20 SMOKE=2 DKGRP=1 HEROPIATE=1

IDU=1 LEU3N=897.9483274 VLOAD=568 ADH=2 RACE=1 EDUCBAS=6 hivpos=1 age=55 ART=1 everART=1 years=7

hard_drugs=0 _ERROR_=1 _N_=189

NOTE: Invalid data for HASHV in line 201 28-29.

WARNING: Limit set by ERRORS= option reached.  Further errors of this type will not be printed.

201       35,55.72507427,49.92695567,NA,0,3,21.44340399,1,1,1,1,1,2,177,66,127,2,11,3,1,1,1,426.

      87  1807571,4,1,3,3,1,64,1,1,4,0 114

newid=35 AGG_MENT=55.72507427 AGG_PHYS=49.92695567 HASHV=. HASHF=0 income=3 BMI=21.44340399

HBP=1 DIAB=1 LIV34=1 KID=1 FRP=1 FP=2 TCHOL=177 TRIG=66 LDL=127 DYSLIP=2 CESD=11 SMOKE=3 DKGRP=1

HEROPIATE=1 IDU=1 LEU3N=426.1807571 VLOAD=4 ADH=1 RACE=3 EDUCBAS=3 hivpos=1 age=64 ART=1

everART=1 years=4 hard_drugs=0 _ERROR_=1 _N_=200

NOTE: 3632 records were read from the infile 'C:\BIOS 6623 SAS\Project 1\Raw

      Data\hiv_dataset.csv'.

      The minimum record length was 88.

      The maximum record length was 127.

NOTE: The data set PROJECT1.RAWDATA has 3632 observations and 33 variables.

NOTE: DATA statement used (Total process time):

      real time           0.79 seconds

      cpu time            0.71 seconds

Errors detected in submitted DATA step. Examine log.

3632 rows created in PROJECTA.RAWDATA from C:\FILE PATH.csv.

 

ERROR: Import unsuccessful.  See SAS Log for details.

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE IMPORT used (Total process time):

      real time           1.26 seconds

      cpu time            1.00 seconds

 

Thank you,

Danielle

1 ACCEPTED SOLUTION

ballardw
Super User

If all of these files, or even many of them, share the same layout, I would suggest copying the generated data step code from the log and modifying it for one file until it reads correctly.

One thing you need to do is decide whether the variables that contain NA should be numeric or character, and modify the informat accordingly.

What you show is NOT a "CSV" file but apparently a spreadsheet rendition of one, and that could actually be part of your problem. Spreadsheets have been known to do nasty things to CSV files, sometimes in columns you were not concerned with (at first).

Did anyone give you any documentation about what these files should look like: column types, layouts, and such? That may be a place to start when modifying the code to read versions of the CSV that Excel has not corrupted.

 

An example of @jimbarbour's suggestion for cutting down on the errors caused by NA in numeric variables:

data example;
   informat x best16.;
   /* ?? suppresses the invalid-data note and keeps _ERROR_ at 0; NA is read as missing */
   input x ??;
   format x best16.;
datalines;
123
3
456.789033433
NA
0.0000044
;
run;
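
If the affected columns should stay numeric, another option besides the ?? modifier is a user-defined informat that maps the text NA to missing, so that no invalid-data notes are raised in the first place. This is a hedged sketch, not something proposed in the thread: the informat name `na_num`, the data set name, and the three-column layout are assumptions for illustration. In practice the informat would be assigned to the relevant variables in the data step copied from the log.

```sas
proc format;
   /* na_num is a hypothetical name. UPCASE lets it match NA, na, or Na. */
   invalue na_num (upcase)
      'NA'  = .        /* the text NA becomes numeric missing */
      other = [32.];   /* everything else is read with the standard numeric informat */
run;

data work.example;
   /* DSD treats commas as delimiters, as in the code PROC IMPORT generated */
   infile datalines dsd truncover;
   informat id 32. bmi na_num. tchol na_num.;
   input id bmi tchol;
datalines;
1,24.71756,133
2,NA,125
3,28.59085,NA
;
run;
```

Unlike ??, this approach only silences the value it names: any other non-numeric text in those columns would still be reported in the log, which may be preferable when the raw files are not fully trusted.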

 

 


4 REPLIES
jimbarbour
Meteorite | Level 14

@dthompsonada,

 

Danielle,

 

The errors I'm seeing in your SAS log are caused by the text "NA" being present in numeric fields. SAS writes an "Invalid data" note, sets _ERROR_ to 1 (which is why PROC IMPORT reports the import as unsuccessful), and then sets the value to missing, which is probably what you want. The data is probably good at this point. You can suppress the invalid data messages with the double question mark modifier (??), but you'd have to edit the data step code generated by the proc.

 

As for why the downstream data is getting corrupted, I would need to see a specific example: the raw data, the import program and log, the downstream code and log, and the final, corrupted output.

 

Jim

andreas_lds
Jade | Level 19

This is a common problem when using proc import instead of a data step. The procedure scans the first rows of the file, sees numbers, and guesses that the variable should be numeric; when it later meets the value "NA", reading the variable fails. So you have to decide whether the variables should be numeric (the "NA" must be replaced by missing while reading the data) or alphanumeric. If you don't have to use the variable in calculations of any kind, I would use alphanumeric.
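
The guessing can also be widened with the GUESSINGROWS statement of PROC IMPORT; this is a hedged sketch rather than something suggested in the thread. By default only the first 20 rows are scanned, which fits the log above: TCHOL, TRIG, and LDL were typed as character, while HASHV, whose first NA was reported on line 23 of the file, was typed as numeric. The temporary file, the fileref `demo`, and the data set `work.demo` are made up for illustration.

```sas
/* Build a small throwaway CSV in which NA first appears after row 20 */
filename demo temp;

data _null_;
   file demo;
   put 'id,bmi';
   do id = 1 to 25;
      put id +(-1) ',24.5';   /* +(-1) removes the blank PUT adds after id */
   end;
   put '26,NA';
run;

/* GUESSINGROWS=MAX makes PROC IMPORT scan every row before choosing types, */
/* so bmi should be created as character and no invalid-data notes appear   */
proc import datafile=demo out=work.demo dbms=csv replace;
   guessingrows=max;
run;
```

With bmi read as character, NA survives as text and can be converted to missing afterwards. The cost is a full scan of the file, which should be negligible at 3632 rows.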

dthompsonada
Obsidian | Level 7

Thanks for that, BallardW.

 

The data was, indeed, provided in an MS Excel .csv format. Sadly, I am not going to be able to dictate this going forward and it is the only file that I have received in this layout.  I did convert all of the NAs to numeric "." after import...once I got there.

I was given a codebook for variables but no other guidance on layout.

Thanks again!
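
For the columns that PROC IMPORT did read as character (TCHOL, TRIG, and LDL in the log above), the post-import conversion mentioned in this reply could be sketched as follows. This is an assumption about one way to do it, not the code that was actually used: the data set names and the `_n` suffix are hypothetical, and the first step only builds a three-row stand-in for the imported data using values from the sample.

```sas
/* Stand-in for the imported data: three character columns, as in the log */
data work.rawdata;
   infile datalines dsd truncover;
   input TCHOL :$3. TRIG :$3. LDL :$3.;
datalines;
133,176,62
125,NA,NA
NA,NA,NA
;
run;

data work.clean;
   set work.rawdata;
   array c[3] $ TCHOL TRIG LDL;       /* character originals */
   array n[3] TCHOL_n TRIG_n LDL_n;   /* new numeric copies (hypothetical names) */
   do i = 1 to 3;
      /* ?? keeps the log quiet; NA or any other non-number becomes missing */
      n[i] = input(c[i], ?? 32.);
   end;
   drop i;
run;
```

Keeping the character originals next to the numeric copies makes it easy to check which values were turned into missing before the originals are dropped.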
