Hi, I'm new to SAS programming and I'm working on a homework project. Here is a sample of the data from the csv file:
id,trtgroup,sex,race,age,smoker,sites,attachbase,attach1year,pdbase,pd1year
101,4,2,5,44.57221082,1,162,2.432098765,2.577639752,3.24691358,3.407407407
102,5,2,5,35.57289528,1,162,2.543209877,NA,3.00617284,NA
103,2,2,5,47.94524298,1,144,2.881944444,3.076388889,3.118055556,3.125
.
.
120,3,2,5,41.83709788,1,150,3.886666667,3.92,4.2,3.88
Age is importing correctly as the decimial is the same length, but the other variables have varying lengths.
Here is my code:
*Import data into SAS *;
DATA pdstudy;
INFILE "/home/u59281128/Project 1/project1data.csv" DLM = ',' MISSOVER FIRSTOBS = 2;
INPUT id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ;
RUN;
The values look correct in the output data, but the log shows a warning, here is the log:
Can someone help me understand what I am missing?
I don't think has anything to do with number of decimal places.
The issue is the CSV file has CHARACTER STRINGS in some of the fields, but you are telling the data step they should be read as NUMBERS.
I suspect that the NA text is supposed to indicate a MISSING value.
As long as there are no fields where NA could be a VALID value (perhaps the symbol use for Sodium in Chemistry), then you could try removing them from the data line before reading the values
data pdstudy;
infile "/home/u59281128/Project 1/project1data.csv" dsd truncover firstobs=2;
_infile_ = cats(',',_infile_,',');
_infile_ = tranwrd(_infile_,',NA,',',,');
input @2 id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ;
run;
Example:
filename csv temp;
options parmcards=csv;
parmcards;
id,trtgroup,sex,race,age,smoker,sites,attachbase,attach1year,pdbase,pd1year
101,4,2,5,44.57221082,1,162,2.432098765,2.577639752,3.24691358,3.407407407
102,5,2,5,35.57289528,1,162,2.543209877,NA,3.00617284,NA
103,2,2,5,47.94524298,1,144,2.881944444,3.076388889,3.118055556,3.125
;
data pdstudy;
infile CSV dsd truncover firstobs=2;
input @;
_infile_ = cats(',',_infile_,',');
_infile_ = tranwrd(_infile_,',NA,',',,');
input @2 id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ;
run;
proc print;
run;
Log:
1339 data pdstudy; 1340 infile CSV dsd truncover firstobs=2; 1341 input @; 1342 _infile_ = cats(',',_infile_,','); 1343 _infile_ = tranwrd(_infile_,',NA,',',,'); 1344 input @2 id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ; 1345 run; NOTE: The infile CSV is: (system-specific pathname), (system-specific file attributes) NOTE: 3 records were read from the infile (system-specific pathname). The minimum record length was 56. The maximum record length was 74. NOTE: The data set WORK.PDSTUDY has 3 observations and 11 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
Results
OBS id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year 1 101 4 2 5 44.5722 1 162 2.43210 2.57764 3.24691 3.40741 2 102 5 2 5 35.5729 1 162 2.54321 . 3.00617 . 3 103 2 2 5 47.9452 1 144 2.88194 3.07639 3.11806 3.12500
I don't think has anything to do with number of decimal places.
The issue is the CSV file has CHARACTER STRINGS in some of the fields, but you are telling the data step they should be read as NUMBERS.
I suspect that the NA text is supposed to indicate a MISSING value.
As long as there are no fields where NA could be a VALID value (perhaps the symbol use for Sodium in Chemistry), then you could try removing them from the data line before reading the values
data pdstudy;
infile "/home/u59281128/Project 1/project1data.csv" dsd truncover firstobs=2;
_infile_ = cats(',',_infile_,',');
_infile_ = tranwrd(_infile_,',NA,',',,');
input @2 id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ;
run;
Example:
filename csv temp;
options parmcards=csv;
parmcards;
id,trtgroup,sex,race,age,smoker,sites,attachbase,attach1year,pdbase,pd1year
101,4,2,5,44.57221082,1,162,2.432098765,2.577639752,3.24691358,3.407407407
102,5,2,5,35.57289528,1,162,2.543209877,NA,3.00617284,NA
103,2,2,5,47.94524298,1,144,2.881944444,3.076388889,3.118055556,3.125
;
data pdstudy;
infile CSV dsd truncover firstobs=2;
input @;
_infile_ = cats(',',_infile_,',');
_infile_ = tranwrd(_infile_,',NA,',',,');
input @2 id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ;
run;
proc print;
run;
Log:
1339 data pdstudy; 1340 infile CSV dsd truncover firstobs=2; 1341 input @; 1342 _infile_ = cats(',',_infile_,','); 1343 _infile_ = tranwrd(_infile_,',NA,',',,'); 1344 input @2 id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year ; 1345 run; NOTE: The infile CSV is: (system-specific pathname), (system-specific file attributes) NOTE: 3 records were read from the infile (system-specific pathname). The minimum record length was 56. The maximum record length was 74. NOTE: The data set WORK.PDSTUDY has 3 observations and 11 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
Results
OBS id trtgroup sex race age smoker sites attachbase attach1year pdbase pd1year 1 101 4 2 5 44.5722 1 162 2.43210 2.57764 3.24691 3.40741 2 102 5 2 5 35.5729 1 162 2.54321 . 3.00617 . 3 103 2 2 5 47.9452 1 144 2.88194 3.07639 3.11806 3.12500
Thank you for your quick reply, NA were indeed missing values and that solution worked perfectly!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.