How to read multiple rows of data with missing values

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

How to read multiple rows of data with missing values

data sales_staff2;
input #1 Employee_ID 6. @21 Last_Name $18.
#2 Job_title $20. @22 Hire_date mmddyy10. @33 Salary dollar8./
#3 Gender $1. @3 Birth_Date mmddyy10. @14 Country $2.;
datalines;
120102 Tom Zhou
Sales Manager 06/01/1993 $108,255
M 08/11/1973 AU
120103 Wilson Dawes
Sales Manager 01/01/1978 $87,975
120121 Irenie Elvish
F 08/02/1948 AU
120122 Christina Ngan
Sales Rep. II 07/01/1982 $27,475
F 07/27/1958 AU
;
run;

proc print data=sales_staff2;
run;


Accepted Solutions
Solution
‎03-14-2017 12:47 PM
Super User
Posts: 11,114

Re: How to read multiple rows of data with missing values

[ Edited ]

Something like this may work for you:

data sales_staff2;
   infile datalines truncover   ;
   informat Employee_ID 6. Last_Name $18.
      Job_title $20. Hire_date mmddyy10. Salary comma9.
      Gender $1. Birth_Date mmddyy10. Country $2.;
   format Hire_date Birth_date mmddyy10. Salary dollar8.;
input #1 Employee_ID  @21 Last_Name 
#2 test $ @  ;
   /* test that the second line does not start with the value F or M*/
   if  test not in ('F' 'M') then do;
      input Job_title 1-20  @22 Hire_date mmddyy10. @"$" Salary
   /* test that the next line to read does start with F or M*/ 
      #3 test2 $ @;
         if test2 in ('F' 'M') then do;
            input @1 Gender $1. @3 Birth_Date mmddyy10. @14 Country $2.;
            output;
         end;

         else do;
            output;
            input @1 @@;
         end;
      end;
   else do;
      input @1 Gender $1. @3 Birth_Date mmddyy10. @14 Country $2.;
      output;
   end;
   drop test test2;
datalines;
120102 Tom          Zhou
Sales Manager        06/01/1993 $108,255
M 08/11/1973 AU
120103 Wilson       Dawes
Sales Manager        01/01/1978 $87,975
120121 Irenie       Elvish
F 08/02/1948 AU
120122 Christina    Ngan
Sales Rep. II        07/01/1982 $27,475
F 07/27/1958 AU
;
run;

But I would really go to the data source an see if there is a possiblity of getting this in an actual data interchange format such as CSV with one row / record per person. Or at least consistent row positions.

 

My install of SAS does not have DOLLAR as an informat so used comma for reading.

Note the above code will fail if any of your job titles are something like: F manager or anything that starts with a single F or M

and if gender is not upper case (fix with UPCASE statement on Test2) or something other than F and M, which could be added to the comparison for Test2. But see the note about job titles.

This kind of data skipping is very fragile and any change can get extremely difficult to manage.

View solution in original post


All Replies
Super User
Posts: 11,114

Re: How to read multiple rows of data with missing values

It really helps to explicitly state your question. In this case I am guessing that you want to not read when you do not have the row of data that should have gender, birth date and country. Is that correct? Or could you be missing other rows as well?

 

You should post code in a code box using the {i} icon. It appears that the code may have been reformatted by the message so that your input statements are non-sensical: None of the last names start in column 21, none of the hire dates are in column 22, salary is not at column 33 on any of the example data.

Solution
‎03-14-2017 12:47 PM
Super User
Posts: 11,114

Re: How to read multiple rows of data with missing values

[ Edited ]

Something like this may work for you:

data sales_staff2;
   infile datalines truncover   ;
   informat Employee_ID 6. Last_Name $18.
      Job_title $20. Hire_date mmddyy10. Salary comma9.
      Gender $1. Birth_Date mmddyy10. Country $2.;
   format Hire_date Birth_date mmddyy10. Salary dollar8.;
input #1 Employee_ID  @21 Last_Name 
#2 test $ @  ;
   /* test that the second line does not start with the value F or M*/
   if  test not in ('F' 'M') then do;
      input Job_title 1-20  @22 Hire_date mmddyy10. @"$" Salary
   /* test that the next line to read does start with F or M*/ 
      #3 test2 $ @;
         if test2 in ('F' 'M') then do;
            input @1 Gender $1. @3 Birth_Date mmddyy10. @14 Country $2.;
            output;
         end;

         else do;
            output;
            input @1 @@;
         end;
      end;
   else do;
      input @1 Gender $1. @3 Birth_Date mmddyy10. @14 Country $2.;
      output;
   end;
   drop test test2;
datalines;
120102 Tom          Zhou
Sales Manager        06/01/1993 $108,255
M 08/11/1973 AU
120103 Wilson       Dawes
Sales Manager        01/01/1978 $87,975
120121 Irenie       Elvish
F 08/02/1948 AU
120122 Christina    Ngan
Sales Rep. II        07/01/1982 $27,475
F 07/27/1958 AU
;
run;

But I would really go to the data source an see if there is a possiblity of getting this in an actual data interchange format such as CSV with one row / record per person. Or at least consistent row positions.

 

My install of SAS does not have DOLLAR as an informat so used comma for reading.

Note the above code will fail if any of your job titles are something like: F manager or anything that starts with a single F or M

and if gender is not upper case (fix with UPCASE statement on Test2) or something other than F and M, which could be added to the comparison for Test2. But see the note about job titles.

This kind of data skipping is very fragile and any change can get extremely difficult to manage.

New Contributor
Posts: 2

Re: How to read multiple rows of data with missing values

Thank you so much.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 147 views
  • 1 like
  • 2 in conversation