BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
mayasak
Quartz | Level 8

Hi,

 

I'm trying to input the following data sets. I have some errors in the resulted data (please check below:

 

data Disease_Inca;
input Person_Id Incident_ID MRN $ Last_Name $ First_Name $ Gender $;
cards;

122345 56788 M776 Doe John M
34567 12344 L87765 Mouse Mickey
90876 36476 009875 Mouse Minnie F
777654 227364 26376 Duck Donald M
56761 23676 M765 Pippin Mary F
667785 3624554 L97960 Smith Oliver M
345564 34567 9432 Johns Karen F
79908 123540 M78765 Doe Kevin M
677654 378754 L8544 Wayne Bruce M
run;

 

data Disease_NHS;
input MRN $ Last_Name $ First_Name $ Gender $;
datalines;
M776 Doe John M
L87765 Mouse Mickey M
26376 Duck Donald M
98332 Roe M
9432 Johns Karen F
M000911 Mill Lora F
L8544 Wayne Bruce M
S78524 Hill Jane F
run;

 

mayasak_1-1659052751995.png

 

 

mayasak_0-1659052683620.png

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@mayasak wrote:
I'm asking to know how to fix this in the input statement. And yes people with a space in their names is another issue that may happen.

The first thing you could try is to add TRUNCOVER option on an INFILE statement to prevent it from hunting to the next line when the line does not contain enough values.

data Disease_NHS;
  infile datalines truncover;
  input MRN $ Last_Name $ First_Name $ Gender $;
datalines;
M776 Doe John M
L87765 Mouse Mickey M
26376 Duck Donald M
98332 Roe M
9432 Johns Karen F
M000911 Mill Lora F
L8544 Wayne Bruce M
S78524 Hill Jane F
;

But if your data is ambiguous there is no way to write a program to read that will not involve some level of guessing.  So you could make sure to indicate any missing values with a period in the data line. So fix that one line to be one of these:

98332 Roe M .
98332 Roe . M
98332 . Roe M

But that will not really handle the issue of spaces in names.  So it would be best to use a format that is clearer.  For example you could use a delimiter that is not in the data (think CSV file) instead of simple spaces between the fields. 

data Disease_NHS;
  infile datalines dsd dlm=',' truncover;
  input MRN $ Last_Name $ First_Name $ Gender $;
datalines;
M776,Doe,Mary Jane,F
98332,Roe,,M
;

Or add double spaces after any field that could contain spaces and use the & modifier on the INPUT statement.

data Disease_NHS;
  infile datalines truncover;
  input MRN $ Last_Name &$ First_Name &$ Gender $;
datalines;
M776 Doe   Mary Jane   F
98332 Roe   .   M
;

View solution in original post

5 REPLIES 5
Astounding
PROC Star
So what do you think should happen when SAS looks for Mickey Mouse's GENDER, but there are no more characters left to read?
Tom
Super User Tom
Super User

What is your question exactly?  Are you asking how to fix the lines of text to work with the INPUT statement? Or asking how to fix the data step to read those lines of text?  Also what do you want to do with people that have spaces in either the FIRST name or the LAST name?

mayasak
Quartz | Level 8
I'm asking to know how to fix this in the input statement. And yes people with a space in their names is another issue that may happen.
Tom
Super User Tom
Super User

@mayasak wrote:
I'm asking to know how to fix this in the input statement. And yes people with a space in their names is another issue that may happen.

The first thing you could try is to add TRUNCOVER option on an INFILE statement to prevent it from hunting to the next line when the line does not contain enough values.

data Disease_NHS;
  infile datalines truncover;
  input MRN $ Last_Name $ First_Name $ Gender $;
datalines;
M776 Doe John M
L87765 Mouse Mickey M
26376 Duck Donald M
98332 Roe M
9432 Johns Karen F
M000911 Mill Lora F
L8544 Wayne Bruce M
S78524 Hill Jane F
;

But if your data is ambiguous there is no way to write a program to read that will not involve some level of guessing.  So you could make sure to indicate any missing values with a period in the data line. So fix that one line to be one of these:

98332 Roe M .
98332 Roe . M
98332 . Roe M

But that will not really handle the issue of spaces in names.  So it would be best to use a format that is clearer.  For example you could use a delimiter that is not in the data (think CSV file) instead of simple spaces between the fields. 

data Disease_NHS;
  infile datalines dsd dlm=',' truncover;
  input MRN $ Last_Name $ First_Name $ Gender $;
datalines;
M776,Doe,Mary Jane,F
98332,Roe,,M
;

Or add double spaces after any field that could contain spaces and use the & modifier on the INPUT statement.

data Disease_NHS;
  infile datalines truncover;
  input MRN $ Last_Name &$ First_Name &$ Gender $;
datalines;
M776 Doe   Mary Jane   F
98332 Roe   .   M
;
mayasak
Quartz | Level 8
Thank you Tom, you made my day.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 832 views
  • 1 like
  • 3 in conversation