sanyam13
Fluorite | Level 6

A raw data file is listed below:

RANCH,1250,2,1,Sheppard Avenue,"$64,000"
SPLIT,1190,1,1,Rand Street,"$65,850"
CONDO,1400,2,1.5,Market Street,"80,050"
TWOSTORY,1810,4,3,Garris Street,"$107,250"
RANCH,1500,3,3,Kemble Avenue,"$86,650"
SPLIT,1615,4,3,West Drive,"94,450"
SPLIT,1305,3,1.5,Graham Avenue,"$73,650"

The following SAS program is submitted using the raw data file as input:

data work.condo_ranch;
  infile 'file-specification' dsd;
  input style $ @;
  if style = 'CONDO' or style = 'RANCH';
  input sqfeet bedrooms baths street $ price : dollar10.;
run;

 

Can you please help me out with this case? Why are there only 3 observations in the output data set? The input has 5 variables, so I expected 5 observations.

Cynthia_sas
SAS Super FREQ
Hi:
If you look at the data, you'll see that only 3 rows meet your subsetting IF condition. Read about the subsetting IF in the documentation. BEFORE the second INPUT statement, you have this:
if style='CONDO' or style='RANCH';

What did you think this statement was doing? Any data row that meets the criteria in the IF will go forward in the logic to the second INPUT statement. Any data row that does NOT meet the condition does NOT get parsed or get output.

Only the rows that meet the subsetting IF condition get parsed by the second INPUT statement and get OUTPUT by the implied output at the end of the DATA step program.

Cynthia
Nipun22
Obsidian | Level 7

I think the "if style" statement must have a "then" keyword at the end, as I suspect you got this question from a dump somewhere on the web.

Kurt_Bremser
Super User

@Nipun22 wrote:

I think the "if style" statement must have a "then" keyword at the end, as I suspect you got this question from a dump somewhere on the web.


The code is valid. What you see in it is a Subsetting IF.

Tom
Super User

As others have said, the reason you only get 3 observations from the 7 lines in the text file is that the subsetting IF

if style = 'CONDO' or style = 'RANCH';

is only allowing 3 of the data step iterations to get to the implied OUTPUT at the end of the step and be written to the output dataset.
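To reproduce this without an external file, here is a minimal sketch that reads the same seven lines from in-stream DATALINES. The switch to `infile datalines` and the PROC PRINT step are assumptions added for illustration; the rest is the program from the question. Only the CONDO line and the two RANCH lines pass the subsetting IF, so the log should report 3 observations.

```sas
data work.condo_ranch;
  infile datalines dsd;                     /* in-stream data instead of 'file-specification' */
  input style $ @;                          /* read the first field and hold the line */
  if style = 'CONDO' or style = 'RANCH';    /* subsetting IF: other rows end the iteration here */
  input sqfeet bedrooms baths street $ price : dollar10.;
datalines;
RANCH,1250,2,1,Sheppard Avenue,"$64,000"
SPLIT,1190,1,1,Rand Street,"$65,850"
CONDO,1400,2,1.5,Market Street,"80,050"
TWOSTORY,1810,4,3,Garris Street,"$107,250"
RANCH,1500,3,3,Kemble Avenue,"$86,650"
SPLIT,1615,4,3,West Drive,"94,450"
SPLIT,1305,3,1.5,Graham Avenue,"$73,650"
;
run;

proc print data=work.condo_ranch;           /* show the rows that reached the implied OUTPUT */
run;
```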

 

In other cases you might see a mismatch between the number of lines read and the number of observations written for code like that because of the default FLOWOVER option on the INFILE statement. In that case, any line that did not have the full number of fields (or had a missing value for the last field) would cause the INPUT statement to go to the next line to look for values to satisfy all of the variables being read. It would be safer to include the TRUNCOVER option.

infile 'file-specification' dsd truncover;

NOTE: You almost never want the older MISSOVER option, since it will cause trouble if your INPUT statement is using FORMATTED MODE input and there are not enough characters on the line to fully meet the width the informat needs. In that case MISSOVER will ignore those short values at the end of the line. Using TRUNCOVER allows the INPUT statement to read them, so your result better matches the source text.
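A small sketch for comparing the two INFILE behaviors side by side. The three-line sample and the data set names `work.flow_demo` and `work.trunc_demo` are made up for illustration; the second line is deliberately missing its last field. Run both steps and compare the observation counts and notes in the log.

```sas
/* Hypothetical sample: the second line has only two of the three fields. */
data work.flow_demo;
  infile datalines dsd;             /* default FLOWOVER: INPUT may move on to the next line */
  input style $ sqfeet bedrooms;
datalines;
RANCH,1250,2
SPLIT,1190
CONDO,1400,2
;
run;

data work.trunc_demo;
  infile datalines dsd truncover;   /* TRUNCOVER: stay on the current line, leave bedrooms missing */
  input style $ sqfeet bedrooms;
datalines;
RANCH,1250,2
SPLIT,1190
CONDO,1400,2
;
run;
```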

 

Nipun22
Obsidian | Level 7
Hi Tom,
Actually, I wanted to know: if we place a "then" keyword at the end of the "if style" statement, the output data set would have 7 observations. Can you tell me why?
Tom
Super User

@Nipun22 wrote:
Hi Tom,
Actually, I wanted to know: if we place a "then" keyword at the end of the "if style" statement, the output data set would have 7 observations. Can you tell me why?

Adding THEN will change these two statements like this:

if <condition>;
<statement>;

into two statements like this:

if <condition> then <statement1>;
<statement2>;

So the IF/THEN statement basically does nothing: when the condition is FALSE it moves to the next statement, and when it is TRUE it executes an empty statement before moving to the next statement. The second statement (in your case the INPUT statement) executes in either case.

 

In the original program, when the condition in the subsetting IF statement is FALSE, it immediately stops the current data step iteration. That means only the cases where the condition is TRUE make it to the end of the data step code, where SAS executes the implied OUTPUT it adds to any data step that does not have an explicit OUTPUT statement.

 

Note that if you also remove the semicolon you end up with just one statement in this form:

if <condition> then <statement>;

In your case that will cause the other 5 variables on the line to be populated for only 3 of the 7 lines of input text. The other observations will have missing values for those variables.
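The two THEN variants described above can be sketched as follows, again assuming in-stream DATALINES in place of the external file. The data set names `work.then_empty` and `work.then_input` are made up for illustration. Both steps should write 7 observations; in the second one, the SPLIT and TWOSTORY rows keep only STYLE and have missing values for the remaining variables.

```sas
/* Variant 1: THEN followed by an empty statement. The INPUT always executes. */
data work.then_empty;
  infile datalines dsd;
  input style $ @;
  if style = 'CONDO' or style = 'RANCH' then;   /* null statement: no effect */
  input sqfeet bedrooms baths street $ price : dollar10.;
datalines;
RANCH,1250,2,1,Sheppard Avenue,"$64,000"
SPLIT,1190,1,1,Rand Street,"$65,850"
CONDO,1400,2,1.5,Market Street,"80,050"
TWOSTORY,1810,4,3,Garris Street,"$107,250"
RANCH,1500,3,3,Kemble Avenue,"$86,650"
SPLIT,1615,4,3,West Drive,"94,450"
SPLIT,1305,3,1.5,Graham Avenue,"$73,650"
;
run;

/* Variant 2: semicolon removed, so the INPUT is conditional on the IF. */
data work.then_input;
  infile datalines dsd;
  input style $ @;
  if style = 'CONDO' or style = 'RANCH' then
    input sqfeet bedrooms baths street $ price : dollar10.;
datalines;
RANCH,1250,2,1,Sheppard Avenue,"$64,000"
SPLIT,1190,1,1,Rand Street,"$65,850"
CONDO,1400,2,1.5,Market Street,"80,050"
TWOSTORY,1810,4,3,Garris Street,"$107,250"
RANCH,1500,3,3,Kemble Avenue,"$86,650"
SPLIT,1615,4,3,West Drive,"94,450"
SPLIT,1305,3,1.5,Graham Avenue,"$73,650"
;
run;
```

In neither variant does anything stop the iteration early, so every line reaches the implied OUTPUT at the end of the step.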

 
