BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jfaruqui
Obsidian | Level 7

I have a raw data file that needs to be read into a dataset.

it looks like this:

 

----+----1----+----2----+----3----+----4----+----5

Harry    Mary      123 Vermont Ave.
Joe      3
Lisa     2
Felix    Laura     6 Baltic Drive
Karen    12
Thomas   Felicia   424 Suffolk place
Edward   Bella     666 Vampire Lane
Jeff     9
Jared    7

Tim      11

 

The variables are Father, Mother, Address followed by Child (if any) and Child Age.

I need to create a dataset that will have one observation per child with repeating Father, Mother and Address information. If no children then Child and age value should be blank. Please Help .. Thanks

 

Desired Result would look like this:

 

Father   Mother    Address             Child    Age

---------------------------------------------------

Harry    Mary      123 Vermont Ave.    Joe      3
Harry    Mary      123 Vermont Ave.    Lisa     2
Felix    Laura     6 Baltic Drive      Karen    12
Thomas   Felicia   424 Suffolk place            .
Edward   Bella     666 Vampire Lane    Jeff     9
Edward   Bella     666 Vampire Lane    Jared    7
Edward   Bella     666 Vampire Lane    Tim      11

 



1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

How about this?

data want;                                                                                                                              
  infile cards truncover eof=done;                                                                                                      
  input @1 father $9. mother $10. address $40.;                                                                                         
  do _N_=1 by 1 until(0);                                                                                                               
    input child $9. age ?? 3.0 @@;                                                                                                      
    if missing(age) then leave;                                                                                                         
    output;                                                                                                                             
    input;                                                                                                                              
    end;                                                                                                                                
  done: /* This is to avoid the "missing card" message in log */                                                                        
  if _N_=1 then do;  /* output when no children read */                                                                                 
    call missing(Child);  /* Father variable has been read as Child */                                                                                                              
    output;                                                                                                                             
    end;                                                                                                                                
cards;                                                                                                                                  
Harry    Mary      123 Vermont Ave.                                                                                                     
Joe      3                                                                                                                              
Lisa     2                                                                                                                              
Felix    Laura     6 Baltic Drive                                                                                                       
Karen    12                                                                                                                             
Thomas   Felicia   424 Suffolk place                                                                                                    
Edward   Bella     666 Vampire Lane                                                                                                     
Jeff     9                                                                                                                              
Jared    7                                                                                                                              
Tim      11                                                                                                                             
;                                                                                                                                       
run;

View solution in original post

11 REPLIES 11
PeterClemmensen
Tourmaline | Level 20

Is this a .txt file?

jfaruqui
Obsidian | Level 7
yeah ... its a dat file ... i put the ruler up there just for convenience .. its not part of the file
Shmuel
Garnet | Level 18

Assuming you have no delimiters between variables and

each name contains one string only without blanks embedded

then try next code:

   

data test;
   length father $10 mother $10 address $40; /* addapt length to max expected */
   retain father mother address no_of_childs;
   infile datalines;
   input a_line $80.
   
   /* check is there an address */
   if scan(a_line,3) ne ' ' then do;
      if _N_ > 1 then do;   
	     if no_of_childs = 0 then do;
		    child= ' ';
			child_age = .;
			output;
	     end;
         a_line = compbl(a_line);
         father = scan(a_line,1);
	     mother = scan(a_line,2);
	     add_start = index(a_line, scan(a_line, 3));
	     address = substr(a_line, add_start);
	     no_of_childs = 0;
	  end;
   end;
   else do;
      input child $ child_age;
	  output;
	  no_of_childs +1;
   end;
datalines;
Harry    Mary      123 Vermont Ave.
Joe      3
Lisa     2
Felix    Laura     6 Baltic Drive
Karen    12
Thomas   Felicia   424 Suffolk place
Edward   Bella     666 Vampire Lane
Jeff     9
Jared    7
Tim      11
;
run;
jfaruqui
Obsidian | Level 7

Hi Shmuel

 

I ran your program verbatim (had to add the ";" for first input statement) ... sorry, but not getting the required result.

 

I have edited the initial question text to include the desired result .. 

Shmuel
Garnet | Level 18

Though solution is already selected, please find attached alternative code debugged and fixed

with hope you find some interest in it:

 

data test;
   length father  $10 mother  $10 child $10 tst $3
          address $40; /* addapt length to max expected */
   retain father mother address no_of_childs;
   infile datalines truncover;
   input a_line $80.; 
   
   /* check is there an address */
   tst = scan(a_line,3);
   if tst ne ' ' then do;
      if _N_ > 1 then do; 
	 if no_of_childs = 0 then do;
	    child= ' ';
	    child_age = .;
	    output;
        end;
     end;
     a_line = compbl(a_line);
     father = scan(a_line,1);
     mother = scan(a_line,2);
     add_start = index(a_line, scan(a_line, 3));
     address = substr(a_line, add_start);
     no_of_childs = 0;
     return;
   end;
   child = scan(a_line,1);
   child_age = scan(a_line,2);
   output;
   no_of_childs +1;
    
   drop a_line no_of_childs add_start;
datalines;
Harry    Mary      123 Vermont Ave.
Joe      3
Lisa     2
Felix    Laura     6 Baltic Drive
Karen    12
Thomas   Felicia   424 Suffolk place
Edward   Bella     666 Vampire Lane
Jeff     9
Jared    7
Tim      11
;
run;

 

jfaruqui
Obsidian | Level 7
yessir !! that did it too .. thanks
s_lassen
Meteorite | Level 14

How about this?

data want;                                                                                                                              
  infile cards truncover eof=done;                                                                                                      
  input @1 father $9. mother $10. address $40.;                                                                                         
  do _N_=1 by 1 until(0);                                                                                                               
    input child $9. age ?? 3.0 @@;                                                                                                      
    if missing(age) then leave;                                                                                                         
    output;                                                                                                                             
    input;                                                                                                                              
    end;                                                                                                                                
  done: /* This is to avoid the "missing card" message in log */                                                                        
  if _N_=1 then do;  /* output when no children read */                                                                                 
    call missing(Child);  /* Father variable has been read as Child */                                                                                                              
    output;                                                                                                                             
    end;                                                                                                                                
cards;                                                                                                                                  
Harry    Mary      123 Vermont Ave.                                                                                                     
Joe      3                                                                                                                              
Lisa     2                                                                                                                              
Felix    Laura     6 Baltic Drive                                                                                                       
Karen    12                                                                                                                             
Thomas   Felicia   424 Suffolk place                                                                                                    
Edward   Bella     666 Vampire Lane                                                                                                     
Jeff     9                                                                                                                              
Jared    7                                                                                                                              
Tim      11                                                                                                                             
;                                                                                                                                       
run;
jfaruqui
Obsidian | Level 7
can you please elaborate on what this line does ??
"" do _N_=1 by 1 until(0); ""
This seems to be the kicker in the solution but i cant understand the logic
s_lassen
Meteorite | Level 14

Sorry, I overdid that a bit. "do _N_=1 by 1;" is enough. It is just that I started out with "do until(0);" (meaning "do forever", or at least until you reach a LEAVE or DELETE statement), but then I found I that I needed the loop counter (_N_) to identify couples with no children. And I forgot to remove the "until(0)" bit.

 

"do _N_=1 by 1;" means the same, but with a counter, like "do _N_=1 to infinity;".

 

If your infile is a real file, and not a Cards statement, the DO UNTIL construct can be used like this (CARDS statement does not accept the END= option):

data want;                                                                                                                              
  infile <filename here!> truncover end=done;                                                                                                    
  input @1 father $9. mother $10. address $40.;                                                                                         
  do _N_=1 by 1 until(done);                                                                                                            
    input child $9. age ?? 3.0 @@;                                                                                                      
    if missing(age) then leave;                                                                                                         
    output;                                                                                                                             
    input;                                                                                                                              
    end;                                                                                                                                
  if _N_=1 then do;  /* output when no children read */                                                                                 
    call missing(Child);                                                                                                                
    output;                                                                                                                             
    end;                                                                                                                                
run;

Which will make the program a little bit simpler.

jfaruqui
Obsidian | Level 7
Got it ...
thank you very much !!!
you da man !!!
Ksharp
Super User
data want;                                                                                                                              
input;
length Father   Mother    Address $ 100;
retain Father   Mother    Address ;
temp=scan(_infile_,2,' ');
if not missing(temp) and notdigit(strip(temp)) then do;
Father=scan(_infile_,1,' ');   
Mother=scan(_infile_,2,' ');
call scan(_infile_,2,p,l,' ');
Address=left(substr(_infile_,p+l));
end; 
else do;
Child=scan(_infile_,1,' ');  
Age=scan(_infile_,2,' '); 
end;
drop temp p l;
cards;                                                                                                                                  
Harry    Mary      123 Vermont Ave.                                                                                                     
Joe      3                                                                                                                              
Lisa     2                                                                                                                              
Felix    Laura     6 Baltic Drive                                                                                                       
Karen    12                                                                                                                             
Thomas   Felicia   424 Suffolk place                                                                                                    
Edward   Bella     666 Vampire Lane                                                                                                     
Jeff     9                                                                                                                              
Jared    7                                                                                                                              
Tim      11                                                                                                                             
;                                                                                                                                       
run;
data final_want;
 set want;
 by Father   Mother    Address notsorted;
 if missing(child) and not (first.Address and last.Address) then delete;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2237 views
  • 1 like
  • 5 in conversation