lih428
Calcite | Level 5

Hi,

I started working with SAS University edition in my graduate class.

I have run into a problem.

When I create the library and save a SAS data set, the data read from the CSV file is missing rows.

The original CSV file has 100 observations, but when the data set is opened in SAS it shows only 50. All the variables are included in the SAS data set.

This is the code that I am using.

libname lib2021 '/folders/myfolders';

data lib2021.ambEEG1;
infile '/folders/myfolders/Ambulatory/AmbulatorySASquestion.csv' dsd
dlm=',' firstobs=2;
input FRST$ SECMD$ FUweeksnew CENSORED OUTCOME2 SEX AgeSUS AgeSUS Agegroups2$ FAMHISEPI$ FEBRSZ$ STROKE$ HTN$ BRAINTMR$ HEADTRM$ Devlpmtldisab$ PSYCH$ CONFUS$ INCONTNCE$ NOCTURSZ$ TNGBITT$ AUTOMATS$ IMAGINE$ AMB$;
run;

 

Please find attached the CSV dataset and a picture of the SAS table properties.

QuestionSAScommunity.png

 

 

 

 

Accepted Solution
Reeza
Super User
Make sure to delete the duplicate AgeSUS from the INPUT statement as well, otherwise I'm not even sure what SAS would do with your data at that point.


8 REPLIES
ballardw
Super User

Please show the LOG entry for the data step in question. Copy the entire text including all messages and notes. Paste into a text box opened on the forum with the </> icon to preserve formatting of text and any diagnostic messages.

 

If possible, copy the first 5 to 10 lines from the CSV file and paste them into a text box as well. When doing this, do not copy from spreadsheet software. Open the file with a plain text program such as Notepad or even the SAS editor.
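If opening the file in a text editor is awkward, one way to see the raw text is to have SAS echo the first few lines to the log. This is a minimal sketch that assumes the file path from the original post; change OBS= to show more or fewer lines.

```sas
/* Echo the first 5 raw lines of the file to the log, unparsed. */
data _null_;
  infile '/folders/myfolders/Ambulatory/AmbulatorySASquestion.csv' obs=5;
  input;          /* load the line into the input buffer without reading any variables */
  put _infile_;   /* write the buffer contents to the log */
run;
```

The log output can then be copied straight into a forum text box, delimiters included.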

 

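If any of your records do not have a value for every variable on the INPUT statement, you need to tell SAS that may happen, for example with the TRUNCOVER option on the INFILE statement. If you get a note in the log such as "SAS went to a new line when INPUT statement reached past the end of a line", then that can cause incomplete data, though usually one or more of your variables will also have values that are inconsistent.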

 

With one of your variables duplicated on the INPUT statement, you may have more names on the INPUT statement than columns of data, which would cause the problem mentioned above. At a minimum, this reads two sequential values into a single variable, and only the second value is kept in the data.

input FRST$ SECMD$ FUweeksnew CENSORED OUTCOME2 SEX AgeSUS AgeSUS Agegroups2$ FAMHISEPI$ FEBRSZ$ STROKE$ HTN$ BRAINTMR$ HEADTRM$ Devlpmtldisab$ PSYCH$ CONFUS$ INCONTNCE$ NOCTURSZ$ TNGBITT$ AUTOMATS$ IMAGINE$ AMB$;

 

Tom
Super User

The INPUT statement reads 24 values into 23 variables. Each line has only 23 values, so without the TRUNCOVER option SAS moves to the next line to find the 24th value and consumes two lines for every observation. That is why 100 records produce only 50 observations.
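The effect can be reproduced on a toy example. The sketch below is not the poster's data: the data set name flowover_demo and the variables a, b and c are made up for illustration. Each line has three values, but the INPUT statement lists four names because b is repeated.

```sas
/* Toy data: 4 lines of 3 values each, but INPUT lists 4 names (b twice). */
data flowover_demo;
  infile datalines dsd;   /* default FLOWOVER behavior, no TRUNCOVER */
  input a b b c;          /* the second b overwrites the first; c finds nothing left on the line */
  datalines;
1,2,3
4,5,6
7,8,9
10,11,12
;
run;

proc print data=flowover_demo;
run;
```

With these four lines I would expect two observations rather than four, because c is filled from the start of the following line and the rest of that line is discarded. The log should also carry the note about SAS going to a new line.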

 

Just removing the duplicate variable name should fix the issue.

But adding TRUNCOVER to the INFILE statement is also a good idea in case there are any short lines.
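As a small illustration of what TRUNCOVER does with a short line, here is a sketch on made-up data (the names truncover_demo, a, b and c are hypothetical). The second line has only two of the three expected values.

```sas
/* Toy data: the second line is short (2 of 3 values). */
data truncover_demo;
  infile datalines dsd truncover;  /* a short line gives missing values instead of a jump to the next line */
  input a b c;
  datalines;
1,2,3
4,5
7,8,9
;
run;

proc print data=truncover_demo;
run;
```

With TRUNCOVER each line should become its own observation, with c missing on the short one. Removing TRUNCOVER from the INFILE statement and rerunning is a quick way to compare the two behaviors.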

 

I am not sure why you are reading all of the variables as character strings. The values all look like numbers to me.

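/* Point to the CSV file and to a temporary file that will hold generated code */
filename csv 'c:\downloads\AmbulatorySASquestion.csv' ;
filename header temp;

/* Read the names on the header line and write them out as an INPUT statement */
data _null_;
  infile csv dsd obs=1 ;
  file header ;
  if _n_ = 1 then put 'input ' @;
  input name :$32. @@ ;
  put name @;
run;

/* Read the data lines using the generated INPUT statement */
data ambEEG1;
  infile csv dsd firstobs=2 truncover ;
%include header / source2;
  ;
run;

proc print data=ambEEG1;
run;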

Log:

103   data ambEEG1;
104     infile csv dsd firstobs=2 truncover ;
105   %include header / source2;
NOTE: %INCLUDE (level 1) file HEADER is file ...\#LN00061.
106  +input FRST SECMD FUweeksnew CENSORED OUTCOME2 SEX AgeSUS Agegroups2 FAMHISEPI FEBRSZ STROKE HTN
106 !+BRAINTMR HEADTRM Devlpmtldisab PSYCH CONFUS INCONTNCE NOCTURSZ TNGBITT AUTOMATS Imagine AMB
NOTE: %INCLUDE (level 1) ending.
107     ;
108   run;
NOTE: 100 records were read from the infile CSV.
      The minimum record length was 46.
      The maximum record length was 48.
NOTE: The data set WORK.AMBEEG1 has 100 observations and 23 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
lih428
Calcite | Level 5

THANK YOU!!!!!

I added "truncover" after "dsd" and I have the full dataset!!!!!!!!

Reeza
Super User
Make sure to delete the duplicate AgeSUS from the INPUT statement as well, otherwise I'm not even sure what SAS would do with your data at that point.
Shmuel
Garnet | Level 18

1) Please post the full log of running code and the csv file as is.

2) You assigned dlm=',' but the csv file you posted is either tab delimited or an Excel view.

3) Try adding the truncover option to the infile statement.

Tom
Super User

@Shmuel wrote:

1) Please post the full log of running code and the csv file as is.

2) You assigned dlm=',' but the csv file you posted is either tab delimited or an Excel view.

3) Try adding the truncover option to the infile statement.


The file is a CSV file, but the viewer that this forum uses for CSV files does not show the text of the file. Instead it displays it in a spreadsheet layout.

lih428
Calcite | Level 5

Hi Tom,

The "truncover" option was exactly what fixed the problem.

However, the values of the last variable, "AMB$", are not displayed in SAS.

Any other recommendations?

Tom
Super User

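The INPUT statement reads 24 values into 23 variables. Each line has only 23 values, so with TRUNCOVER the last name on the statement, AMB, finds nothing left to read and is set to missing. Every variable after the duplicate AgeSUS also receives the value that belongs to the next column.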

Remove the duplicate variable name from the INPUT statement.

input FRST SECMD FUweeksnew CENSORED OUTCOME2 SEX 
      AgeSUS Agegroups2 FAMHISEPI FEBRSZ STROKE HTN
      BRAINTMR HEADTRM Devlpmtldisab PSYCH CONFUS INCONTNCE
      NOCTURSZ TNGBITT AUTOMATS Imagine AMB 
;
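Putting the pieces together, the original data step might look like the sketch below. It keeps the library and file path from the question and assumes, as the log above suggests, that every column can be read as numeric. If some columns really do hold text, the $ would need to go back after those names.

```sas
libname lib2021 '/folders/myfolders';

data lib2021.ambEEG1;
  /* TRUNCOVER guards against short lines; DSD already implies a comma delimiter */
  infile '/folders/myfolders/Ambulatory/AmbulatorySASquestion.csv'
         dsd firstobs=2 truncover;
  /* AgeSUS is listed once, so 23 names match the 23 values on each line */
  input FRST SECMD FUweeksnew CENSORED OUTCOME2 SEX
        AgeSUS Agegroups2 FAMHISEPI FEBRSZ STROKE HTN
        BRAINTMR HEADTRM Devlpmtldisab PSYCH CONFUS INCONTNCE
        NOCTURSZ TNGBITT AUTOMATS Imagine AMB;
run;
```

After running it, the log note for the data set should report 100 observations and 23 variables, matching the log shown earlier in the thread.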
