DATA Step, Macro, Functions and more

Can multiple row entries be created from single dataline?

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Can multiple row entries be created from single dataline?

[ Edited ]

Our teacher asked us to form different row values from a single dataline.

The data is:

12m13f14m15f

 

Now the result to be needed is like:

id     id2     id3

1       2       m

1       3       f

1       4       m

1       5       f

 

Now I am using this code:

data check_pointer;                                                                                                                     
input @1 (id) (1.+3)                                                                                                                      
@2 (id1) (1.+3) @3 (id2) ($1.+3); datalines; 12m13f14m15f ; run;

I know that column pointer are to be used here but I am not able to tell how and what's wrong with my approach. The issue is that it's just giving first line as output and that's it.

Is there a way to get the required output?


Accepted Solutions
Solution
‎02-26-2017 05:46 AM
Super User
Super User
Posts: 6,500

Re: Can multiple row entries be created from single dataline?

[ Edited ]

The problem is caused by the fact that when you read in-line records using DATALINES ( also known as CARDS) statement then SAS will pad the lines to be a multiple of 80 columns.  So that they look like 80 column IBM punch cards.  

 

Normally SAS will stop your DATA step when it reads past the end of the input data (INPUT statement or SET/MERGE/UPDATE/MODIFY statement).  But since you are reading a fixed number of characters per observation you do not attempt to read the second line until you have read more than 80 characters.

 

Note that this also explains the LOST CARD message.  Because 80 is not an even multiple of 3, when you read columns 79 and 80 and then SAs tries to go to the next card to find the 81st character, but there is no next line, hence the LOST CARD.

 

The answer that DATA_NULL_ posted takes advantage that the fact that when you use PARMCARDS statement instead of DATALINE (cards) statement that SAS does not pad the lines.

 

The tricks that Art posted are attempts to either eliminate the extra blank observations or calculate when to stop the data step.

 

Here is another way of stopping that uses the COL= infile option to create variable that indicates where you are on the line. That way you can issue an INPUT statement to skip the blanks at the end of the line.

 

data check_pointer;
  infile cards col=cc ;
  if cc > length(_infile_) then input;
  input (id id2 id3) (1. 1. $1.) @@ ;
cards;
12m13f14m15f
;;;;

 

 

View solution in original post


All Replies
PROC Star
Posts: 7,363

Re: Can multiple row entries be created from single dataline?

You are missing a double trailing @@. There is probably a way to stop the process when no more values are found but, at the moment, I can't think of what that might be.

 

Both of the following two methods will work:

 

data check_pointer;
  input id 1. id2 1. id3 $1. @@;
  if not missing(id);
  datalines;
12m13f14m15f
;

data check_pointer;
  input (id) (1.+3)
        (id1) (1.+3)
        (id2) ($1.+3) @@;
  if not missing(id);
  datalines;
12m13f14m15f
;

HTH,

Art, CEO, AnalystFinder.com

New Contributor
Posts: 2

Re: Can multiple row entries be created from single dataline?

Your answer does solve the problem and thanks a lot for it sir. But I have few questions regarding this approah.

 

First that SAS does solve this but with an error. As per the log:

NOTE: LOST CARD.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
15         ;
id=. id1=. id2=  _ERROR_=1 _N_=27
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

What does it mean and why is it giving an error?

 

Secondly, if we are using- if not missing(id) here then why isn't SAS stopped right after fourth row as the next id is not available. Why does it goes to 27th line.

 

Third question is that I did use @@ when I was trying to solve it, but my laptop stopped working completely. My code didn't include if not missing(id) command though. So what's the reason for that and how is this command stopping this infinite loop kind of thing that crashed my laptop.

 

And last is that if there was space in between the characters then I could had easily used @@ and it had worked but this one didn't have one. Is there a way where I can just use each next value as a new input. I mean to say making SAS believe that there is no delimiter and accept each value as into new next variable?

 

It'll be great if you could answer these sir.

PROC Star
Posts: 7,363

Re: Can multiple row entries be created from single dataline?

Not sure if I can answer your questions, but I'll try. However, they will only be guesses, as I really don't know the true answer.

 

Most significantly, you asked about the note, and mentioned that "if there was a space between each variable", and that led me to a solution. The default behavior of the input statement is to NOT pad each record. However, with all of my code suggestions (escept for the one where I explicitly delclared the record length), SAS was continuing to process until it had surpassed 80 characters.

 

As such, I tried a solution that intionally trimmed the record of all spaces.:

 

data check_pointer;
  input @;
  _infile_=trim(_infile_);
  input id 1. id2 1. id3 $1. @@;
  datalines;
12m13f14m15f
;

That still produces the note (it's just a note .. NOT an error) about reading past the end of the line, but doesn't create the additional 22 records with all missing values.

 

As such, my guess is that while the documentation says that it doesn't pad the records, it is clearly trying to read 80 characters of data. Trimming the record apparently forces it to try and read one extra set (of id id2 and id3), which it attempts by going to the next, non-existent line.  It also goes to the next line after reading 26 sets (i.e., 78 characters).

 

I don't know why it crashed your laptop, but I was able to get SAS University Edition to go into an infinite loop after I tried some infile optioins (like stopover, trunover, pad, nopad and the like).  Interestingly, while those options are only supposed to be in effect through the datastep in which they are used, I had to reboot SAS in order to get it to run correctly. Hopefully, someone at SAS will see this thread and provide the true explanation.

 

Finally, I get the NOTE about going to the next line regardless of whether there are or aren't spaces between each variable.

 

Art, CEO, AnalystFinder.com

 

 

 

PROC Star
Posts: 7,363

Re: Can multiple row entries be created from single dataline?

Here is a third way, but you have to count the number of characters being read and include that number in the linesize option:

 

data check_pointer;
  infile datalines linesize=12;
  input id 1. id2 1. id3 $1. @@;
  datalines;
12m13f14m15f
;

Art, CEO, AnalystFinder.com

 

PROC Star
Posts: 7,363

Re: Can multiple row entries be created from single dataline?

I've asked some of my colleagues if there might be a better way, but I haven't heard back from them yet.

 

The following accomplishes the task, avoiding the note appearing in the log:

 

data check_pointer (drop=len);
  infile datalines column=current;
  retain len;
  input @;
  _infile_=trim(_infile_);
  len=length(_infile_);
  input id 1. id2 1. id3 $1. @@;
  if current eq len+1 then do;
    output;
    input @1;
  end;
  else output;
  datalines;
11m12f13m14f
21m22f23m24f
31m32f33m34f35m36f37m38f
;

I'll let you know if I discover an easier way.

 

Art, CEO, AnalystFinder.com

 

Respected Advisor
Posts: 3,777

Re: Can multiple row entries be created from single dataline?

filename FT15F001 temp;
data check_pointer;      
   infile FT15F001; 
   input (id id2 id3) (2*1. $1.) @@;                                                                                                                    ;
   parmcards4;                                                                                                                              
12m13f14m15f                                                                                                                            
;;;;                                                                                                                                      
   run;
proc print;
   run;
PROC Star
Posts: 7,363

Re: Can multiple row entries be created from single dataline?

@data_null__: Definitely simpler and much appreciated! However, where my last attempt avoided the note in the log, your method does produce the "SAS went to a new line when INPUT statement reached past the end of a line." note.

 

Art, CEO, AnalystFinder.com

 

 

Solution
‎02-26-2017 05:46 AM
Super User
Super User
Posts: 6,500

Re: Can multiple row entries be created from single dataline?

[ Edited ]

The problem is caused by the fact that when you read in-line records using DATALINES ( also known as CARDS) statement then SAS will pad the lines to be a multiple of 80 columns.  So that they look like 80 column IBM punch cards.  

 

Normally SAS will stop your DATA step when it reads past the end of the input data (INPUT statement or SET/MERGE/UPDATE/MODIFY statement).  But since you are reading a fixed number of characters per observation you do not attempt to read the second line until you have read more than 80 characters.

 

Note that this also explains the LOST CARD message.  Because 80 is not an even multiple of 3, when you read columns 79 and 80 and then SAs tries to go to the next card to find the 81st character, but there is no next line, hence the LOST CARD.

 

The answer that DATA_NULL_ posted takes advantage that the fact that when you use PARMCARDS statement instead of DATALINE (cards) statement that SAS does not pad the lines.

 

The tricks that Art posted are attempts to either eliminate the extra blank observations or calculate when to stop the data step.

 

Here is another way of stopping that uses the COL= infile option to create variable that indicates where you are on the line. That way you can issue an INPUT statement to skip the blanks at the end of the line.

 

data check_pointer;
  infile cards col=cc ;
  if cc > length(_infile_) then input;
  input (id id2 id3) (1. 1. $1.) @@ ;
cards;
12m13f14m15f
;;;;

 

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 212 views
  • 10 likes
  • 4 in conversation