BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Zaichik17
Calcite | Level 5

Hi, I'm new to SAS programming and I'm working on a homework project.  Here is a sample of the data from the csv file:

 

id,trtgroup,sex,race,age,smoker,sites,attachbase,attach1year,pdbase,pd1year
101,4,2,5,44.57221082,1,162,2.432098765,2.577639752,3.24691358,3.407407407
102,5,2,5,35.57289528,1,162,2.543209877,NA,3.00617284,NA
103,2,2,5,47.94524298,1,144,2.881944444,3.076388889,3.118055556,3.125
.

.

120,3,2,5,41.83709788,1,150,3.886666667,3.92,4.2,3.88

 

Age is importing correctly as the decimial is the same length, but the other variables have varying lengths.

 

Here is my code:

 

*Import data into SAS *;

DATA pdstudy;
INFILE "/home/u59281128/Project 1/project1data.csv" DLM = ',' MISSOVER FIRSTOBS = 2;
INPUT id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ;
RUN;

 

The values look correct in the output data, but the log shows a warning, here is the log:

Can someone help me understand what I am missing?

 

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 DATA pdstudy;
70 INFILE "/home/u59281128/Project 1/project1data.csv" DLM = ',' MISSOVER FIRSTOBS = 2;
71 INPUT id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ;
72
73 RUN;
 
NOTE: The infile "/home/u59281128/Project 1/project1data.csv" is:
Filename=/home/u59281128/Project 1/project1data.csv,
Owner Name=u59281128,Group Name=oda,
Access Permission=-rw-r--r--,
Last Modified=29Aug2022:20:10:52,
File Size (bytes)=9309
 
NOTE: Invalid data for attach1year in line 3 41-42.
NOTE: Invalid data for pd1year in line 3 55-56.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
3 102,5,2,5,35.57289528,1,162,2.543209877,NA,3.00617284,NA 56
id=102 trtgroup=5 sex=2 race=5 age=35.57289528 smoker=1 sites=162 attachbase=2.543209877 attach1year=. pdbase=3.00617284 pd1year=.
_ERROR_=1 _N_=2
NOTE: Invalid data for attach1year in line 12 41-42.
NOTE: Invalid data for pd1year in line 12 56-57.
12 111,3,2,2,39.03353867,1,156,2.019230769,NA,3.301282051,NA 57
id=111 trtgroup=3 sex=2 race=2 age=39.03353867 smoker=1 sites=156 attachbase=2.019230769 attach1year=. pdbase=3.301282051 pd1year=.
_ERROR_=1 _N_=11
NOTE: Invalid data for attach1year in line 15 41-42.
NOTE: Invalid data for pd1year in line 15 56-57.
15 114,4,2,5,40.71731691,1,168,1.946428571,NA,3.422619048,NA 57
id=114 trtgroup=4 sex=2 race=5 age=40.71731691 smoker=1 sites=168 attachbase=1.946428571 attach1year=. pdbase=3.422619048 pd1year=.
_ERROR_=1 _N_=14
NOTE: Invalid data for attach1year in line 24 41-42.
NOTE: Invalid data for pd1year in line 24 56-57.
24 123,1,2,5,55.99726215,1,156,2.051282051,NA,3.134615385,NA 57
id=123 trtgroup=1 sex=2 race=5 age=55.99726215 smoker=1 sites=156 attachbase=2.051282051 attach1year=. pdbase=3.134615385 pd1year=.
_ERROR_=1 _N_=23
NOTE: Invalid data for age in line 49 11-12.
49 148,1,2,5,NA,0,138,3.644927536,3.101449275,3.239130435,2.905797101 66
id=148 trtgroup=1 sex=2 race=5 age=. smoker=0 sites=138 attachbase=3.644927536 attach1year=3.101449275 pdbase=3.239130435
pd1year=2.905797101 _ERROR_=1 _N_=48
NOTE: Invalid data for attach1year in line 54 41-42.
NOTE: Invalid data for pd1year in line 54 50-51.
54 153,5,2,5,66.83641342,0,144,2.708333333,NA,3.375,NA 51
id=153 trtgroup=5 sex=2 race=5 age=66.83641342 smoker=0 sites=144 attachbase=2.708333333 attach1year=. pdbase=3.375 pd1year=.
_ERROR_=1 _N_=53
NOTE: Invalid data for attach1year in line 57 41-42.
NOTE: Invalid data for pd1year in line 57 56-57.
57 156,4,2,5,37.45379877,0,168,2.833333333,NA,3.077380952,NA 57
id=156 trtgroup=4 sex=2 race=5 age=37.45379877 smoker=0 sites=168 attachbase=2.833333333 attach1year=. pdbase=3.077380952 pd1year=.
_ERROR_=1 _N_=56
NOTE: Invalid data for attach1year in line 60 41-42.
NOTE: Invalid data for pd1year in line 60 56-57.
60 159,3,2,5,40.73648186,0,168,1.648809524,NA,2.970238095,NA 57
id=159 trtgroup=3 sex=2 race=5 age=40.73648186 smoker=0 sites=168 attachbase=1.648809524 attach1year=. pdbase=2.970238095 pd1year=.
_ERROR_=1 _N_=59
NOTE: Invalid data for attach1year in line 61 41-42.
NOTE: Invalid data for pd1year in line 61 56-57.
61 160,1,2,5,41.45379877,0,162,1.135802469,NA,3.037037037,NA 57
id=160 trtgroup=1 sex=2 race=5 age=41.45379877 smoker=0 sites=162 attachbase=1.135802469 attach1year=. pdbase=3.037037037 pd1year=.
_ERROR_=1 _N_=60
NOTE: Invalid data for attach1year in line 62 41-42.
NOTE: Invalid data for pd1year in line 62 56-57.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
62 162,3,2,5,71.90143737,0,144,2.416666667,NA,2.520833333,NA 57
id=162 trtgroup=3 sex=2 race=5 age=71.90143737 smoker=0 sites=144 attachbase=2.416666667 attach1year=. pdbase=2.520833333 pd1year=.
_ERROR_=1 _N_=61
NOTE: Invalid data for attach1year in line 71 34-35.
NOTE: Invalid data for pd1year in line 71 42-43.
71 201,2,1,5,38.58179329,1,150,2.62,NA,3.74,NA 43
id=201 trtgroup=2 sex=1 race=5 age=38.58179329 smoker=1 sites=150 attachbase=2.62 attach1year=. pdbase=3.74 pd1year=. _ERROR_=1
_N_=70
NOTE: Invalid data for attach1year in line 72 41-42.
NOTE: Invalid data for pd1year in line 72 56-57.
72 202,5,1,5,44.08761123,1,168,2.011904762,NA,2.869047619,NA 57
id=202 trtgroup=5 sex=1 race=5 age=44.08761123 smoker=1 sites=168 attachbase=2.011904762 attach1year=. pdbase=2.869047619 pd1year=.
_ERROR_=1 _N_=71
NOTE: Invalid data for attach1year in line 74 34-35.
NOTE: Invalid data for pd1year in line 74 49-50.
74 204,4,1,5,55.24161533,1,150,2.04,NA,3.133333333,NA 50
id=204 trtgroup=4 sex=1 race=5 age=55.24161533 smoker=1 sites=150 attachbase=2.04 attach1year=. pdbase=3.133333333 pd1year=.
_ERROR_=1 _N_=73
NOTE: Invalid data for attach1year in line 75 41-42.
NOTE: Invalid data for pd1year in line 75 56-57.
75 205,3,1,5,56.12867899,1,162,2.166666667,NA,2.845679012,NA 57
id=205 trtgroup=3 sex=1 race=5 age=56.12867899 smoker=1 sites=162 attachbase=2.166666667 attach1year=. pdbase=2.845679012 pd1year=.
_ERROR_=1 _N_=74
NOTE: Invalid data for attach1year in line 78 41-42.
NOTE: Invalid data for pd1year in line 78 56-57.
78 208,2,1,5,46.55167693,1,168,1.547619048,NA,3.029761905,NA 57
id=208 trtgroup=2 sex=1 race=5 age=46.55167693 smoker=1 sites=168 attachbase=1.547619048 attach1year=. pdbase=3.029761905 pd1year=.
_ERROR_=1 _N_=77
NOTE: Invalid data for attach1year in line 80 41-42.
NOTE: Invalid data for pd1year in line 80 56-57.
80 210,4,1,5,54.48323066,1,156,2.064102564,NA,3.326923077,NA 57
id=210 trtgroup=4 sex=1 race=5 age=54.48323066 smoker=1 sites=156 attachbase=2.064102564 attach1year=. pdbase=3.326923077 pd1year=.
_ERROR_=1 _N_=79
NOTE: Invalid data for attach1year in line 83 41-42.
NOTE: Invalid data for pd1year in line 83 56-57.
83 213,5,1,5,53.25941136,1,138,2.420289855,NA,3.101449275,NA 57
id=213 trtgroup=5 sex=1 race=5 age=53.25941136 smoker=1 sites=138 attachbase=2.420289855 attach1year=. pdbase=3.101449275 pd1year=.
_ERROR_=1 _N_=82
NOTE: Invalid data for attach1year in line 89 41-42.
NOTE: Invalid data for pd1year in line 89 56-57.
89 219,5,1,5,34.12183436,1,168,1.744047619,NA,3.208333333,NA 57
id=219 trtgroup=5 sex=1 race=5 age=34.12183436 smoker=1 sites=168 attachbase=1.744047619 attach1year=. pdbase=3.208333333 pd1year=.
_ERROR_=1 _N_=88
NOTE: Invalid data for attach1year in line 103 41-42.
NOTE: Invalid data for pd1year in line 103 56-57.
103 242,2,1,5,51.61943874,0,156,1.217948718,NA,2.935897436,NA 57
id=242 trtgroup=2 sex=1 race=5 age=51.61943874 smoker=0 sites=156 attachbase=1.217948718 attach1year=. pdbase=2.935897436 pd1year=.
_ERROR_=1 _N_=102
NOTE: Invalid data for attach1year in line 104 41-42.
NOTE: Invalid data for pd1year in line 104 56-57.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
104 243,4,1,5,57.90006845,0,168,1.023809524,NA,2.428571429,NA 57
id=243 trtgroup=4 sex=1 race=5 age=57.90006845 smoker=0 sites=168 attachbase=1.023809524 attach1year=. pdbase=2.428571429 pd1year=.
_ERROR_=1 _N_=103
NOTE: 130 records were read from the infile "/home/u59281128/BIOS6623/Project 1/project1data.csv".
The minimum record length was 41.
The maximum record length was 76.
NOTE: The data set WORK.PDSTUDY has 130 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 898.09k
OS Memory 24744.00k
Timestamp 09/01/2022 01:20:44 AM
Step Count 134 Switch Count 2
Page Faults 0
Page Reclaims 132
Page Swaps 0
Voluntary Context Switches 19
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 280
 
 
74
75 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
85
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

I don't think has anything to do with number of decimal places.

 

The issue is the CSV file has CHARACTER STRINGS in some of the fields, but you are telling the data step they should be read as NUMBERS.

 

I suspect that the NA text is supposed to indicate a MISSING value.

 

As long as there are no fields where NA could be a VALID value (perhaps the symbol use for Sodium in Chemistry),  then you could try removing them from the data line before reading the values

data pdstudy;
  infile "/home/u59281128/Project 1/project1data.csv" dsd truncover firstobs=2;
  _infile_ = cats(',',_infile_,',');
  _infile_ = tranwrd(_infile_,',NA,',',,');
  input @2 id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ;
run;

Example:

filename csv temp;
options parmcards=csv;
parmcards;
id,trtgroup,sex,race,age,smoker,sites,attachbase,attach1year,pdbase,pd1year
101,4,2,5,44.57221082,1,162,2.432098765,2.577639752,3.24691358,3.407407407
102,5,2,5,35.57289528,1,162,2.543209877,NA,3.00617284,NA
103,2,2,5,47.94524298,1,144,2.881944444,3.076388889,3.118055556,3.125
;

data pdstudy;
  infile CSV dsd truncover firstobs=2;
  input @;
  _infile_ = cats(',',_infile_,',');
  _infile_ = tranwrd(_infile_,',NA,',',,');
  input @2 id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ;
run;

proc print;
run;

Log:

1339  data pdstudy;
1340    infile CSV dsd truncover firstobs=2;
1341    input @;
1342    _infile_ = cats(',',_infile_,',');
1343    _infile_ = tranwrd(_infile_,',NA,',',,');
1344    input @2 id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ;
1345  run;

NOTE: The infile CSV is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 3 records were read from the infile (system-specific pathname).
      The minimum record length was 56.
      The maximum record length was 74.
NOTE: The data set WORK.PDSTUDY has 3 observations and 11 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Results

OBS     id    trtgroup    sex    race      age      smoker    sites    attachbase    attach1year     pdbase    pd1year

 1     101        4        2       5     44.5722       1       162       2.43210       2.57764      3.24691    3.40741
 2     102        5        2       5     35.5729       1       162       2.54321        .           3.00617     .
 3     103        2        2       5     47.9452       1       144       2.88194       3.07639      3.11806    3.12500

View solution in original post

2 REPLIES 2
Tom
Super User Tom
Super User

I don't think has anything to do with number of decimal places.

 

The issue is the CSV file has CHARACTER STRINGS in some of the fields, but you are telling the data step they should be read as NUMBERS.

 

I suspect that the NA text is supposed to indicate a MISSING value.

 

As long as there are no fields where NA could be a VALID value (perhaps the symbol use for Sodium in Chemistry),  then you could try removing them from the data line before reading the values

data pdstudy;
  infile "/home/u59281128/Project 1/project1data.csv" dsd truncover firstobs=2;
  _infile_ = cats(',',_infile_,',');
  _infile_ = tranwrd(_infile_,',NA,',',,');
  input @2 id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ;
run;

Example:

filename csv temp;
options parmcards=csv;
parmcards;
id,trtgroup,sex,race,age,smoker,sites,attachbase,attach1year,pdbase,pd1year
101,4,2,5,44.57221082,1,162,2.432098765,2.577639752,3.24691358,3.407407407
102,5,2,5,35.57289528,1,162,2.543209877,NA,3.00617284,NA
103,2,2,5,47.94524298,1,144,2.881944444,3.076388889,3.118055556,3.125
;

data pdstudy;
  infile CSV dsd truncover firstobs=2;
  input @;
  _infile_ = cats(',',_infile_,',');
  _infile_ = tranwrd(_infile_,',NA,',',,');
  input @2 id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ;
run;

proc print;
run;

Log:

1339  data pdstudy;
1340    infile CSV dsd truncover firstobs=2;
1341    input @;
1342    _infile_ = cats(',',_infile_,',');
1343    _infile_ = tranwrd(_infile_,',NA,',',,');
1344    input @2 id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ;
1345  run;

NOTE: The infile CSV is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 3 records were read from the infile (system-specific pathname).
      The minimum record length was 56.
      The maximum record length was 74.
NOTE: The data set WORK.PDSTUDY has 3 observations and 11 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Results

OBS     id    trtgroup    sex    race      age      smoker    sites    attachbase    attach1year     pdbase    pd1year

 1     101        4        2       5     44.5722       1       162       2.43210       2.57764      3.24691    3.40741
 2     102        5        2       5     35.5729       1       162       2.54321        .           3.00617     .
 3     103        2        2       5     47.9452       1       144       2.88194       3.07639      3.11806    3.12500
Zaichik17
Calcite | Level 5

Thank you for your quick reply, NA were indeed missing values and that solution worked perfectly! 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 415 views
  • 0 likes
  • 2 in conversation