Solved: Hierarchical data file

Lynn23 · Posted 11-01-2020 01:17 AM

I got a hierarchical data file:

I want to generate one observation per detail record and my code is:

data Household;
infile datalines dlm=',';
input Household_ID : $15. Type$;
input DOB : date11. Indicator$ Gender$ Marital_Status$ Education Employment_Status$ Income$;
Second_field=scan(_infile_,2,',');
Do while (Second_field ='Y'or'N');
output;
input DOB--Income;
end;
datalines;
A1234567BC012,A
15/FEB/1980,Y,Male,Married,3,FT,55000
3/JUN/1982,N,Female,Married,3,UE,0
24/JAN/2005,N,Male,Unknown,2,NA,0
D135EG0234678,B
19/OCT/1950,Y,Female,Divorced,0,PT,5000
X123A567F9,A
B2345234CC,A
21/May/1975,N,Male,Married,2,FT,30000
30/JUN/1978,Y,Female,Married,1,PT,10000
;
run;

But my output is like this:

How could I generate one observation per detail record?

Kurt_Bremser · Posted 11-01-2020 03:07 AM

Use retained variables, and start with a "blind input":

data household;
infile datalines dlm=',';
length
  household $15
  type $1
  dob 4
  indicator $1
  gender $6
  marital_status $8
  education 3
  employment_status $2
  income 8
;
format dob yymmdd10.;
retain household type;
input @;
if countw(_infile_,',') = 2
then input household type;
else do;
  input DOB :date11. Indicator Gender Marital_Status Education Employment_Status Income;
  output;
end;
datalines;

Untested, posted from my tablet.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

ballardw · Posted 11-01-2020 03:02 AM

@Lynn23 wrote:

I got a hierarchical data file:

I want to generate one observation per detail record and my code is:

data Household;
infile datalines dlm=',';
input Household_ID : $15. Type$;
input DOB : date11. Indicator$ Gender$ Marital_Status$ Education Employment_Status$ Income$;
Second_field=scan(_infile_,2,',');
Do while (Second_field ='Y'or'N');
output;
input DOB--Income;
end;
datalines;
A1234567BC012,A
15/FEB/1980,Y,Male,Married,3,FT,55000
3/JUN/1982,N,Female,Married,3,UE,0
24/JAN/2005,N,Male,Unknown,2,NA,0
D135EG0234678,B
19/OCT/1950,Y,Female,Divorced,0,PT,5000
X123A567F9,A
B2345234CC,A
21/May/1975,N,Male,Married,2,FT,30000
30/JUN/1978,Y,Female,Married,1,PT,10000
;
run;

But my output is like this:

How could I generate one observation per detail record?

Before even attempting to place "one observation per detail record" you should describe what you meant. As in what you expect the output to look like.

Maybe you want something like this to read the data so that you have the household and "type" attached to each of the other lines of data.

data Household;
   infile datalines dlm=',';
   informat Household_ID $15. Type $1.
      DOB date11. indicator $1. Gender $6.
      Marital_Status $15. Education best.
      Employment_Status $5. Income best.

   ;
   retain Household_ID Type houseflag;
   input @;
   if anyalpha(_infile_) = 1 AND houseflag then do;
      output;
      input @1 Household_ID  Type;
      houseflag=1;
   end;
   Else if anyalpha(_infile_) = 1 AND Not(houseflag) then do;
      input @1 Household_ID  Type;
      houseflag=1;
   end;
   else do;
   input DOB  Indicator Gender Marital_Status Education 
        Employment_Status Income;
        houseflag=0;
      output;
   end;
   drop houseflag;
   format DOB date9.;
datalines;
A1234567BC012,A
15/FEB/1980,Y,Male,Married,3,FT,55000
3/JUN/1982,N,Female,Married,3,UE,0
24/JAN/2005,N,Male,Unknown,2,NA,0
D135EG0234678,B
19/OCT/1950,Y,Female,Divorced,0,PT,5000
X123A567F9,A
B2345234CC,A
21/May/1975,N,Male,Married,2,FT,30000
30/JUN/1978,Y,Female,Married,1,PT,10000
;

You really have to watch some timing issues. First you need to identify when the record you are reading is a household. Second you have to identify when when the household is following a household with no data so that the record with no data can be written. Third, you have to make sure the household information is attached to each record.

The last bit is accomplished by Retaining the two household values so they are available.

The first two are accomplished by setting and retaining a flag for set when the current line is a household record. Notice the difference in timing of the OUTPUT statements.

The input @; loads the current line of data (limit 32K characters) and holds the input at that line. So the values can be tested for conditions. I am using a rule of "if the first character of the record is a letter then this is a household identifier line". If you have any stupid data with a DOB value of Jan 1980, or Unknown, this will not work and you will have to be very explicit on creating and describing a rule that will separate all your DOB values from a household identifier. (and need to make sure you have an informat that will read those properly)

If the current value is a household and the line immediately prior was also, then the OUTPUT is performed before actually inputing the household information so the household appears in the dataset. Other wise only the detail records when read get output, which because of the Retain will have the proper household information, barring bad DOB values...

I moved the informat information to an Informat statement because delimited data and infromats on an input statement often have some issues.

And for @#$# why read "income" as character????

Kurt_Bremser · Posted 11-01-2020 03:07 AM

Use retained variables, and start with a "blind input":

data household;
infile datalines dlm=',';
length
  household $15
  type $1
  dob 4
  indicator $1
  gender $6
  marital_status $8
  education 3
  employment_status $2
  income 8
;
format dob yymmdd10.;
retain household type;
input @;
if countw(_infile_,',') = 2
then input household type;
else do;
  input DOB :date11. Indicator Gender Marital_Status Education Employment_Status Income;
  output;
end;
datalines;

Untested, posted from my tablet.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 11-01-2020 04:12 AM

Kurt,
You gonna missing obs "X123A567F9,A" .

Kurt_Bremser · Posted 11-01-2020 04:20 AM

The OP wanted "one observation per detail record", so I inferred that households without details are not wanted. Seems I was right 😉

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Lynn23 · Posted 11-01-2020 04:25 AM

Yes you are right. Thank you all!!

Ksharp · Posted 11-01-2020 03:51 AM

data temp;
infile datalines length=len;
input temp $varying800. len;
datalines;
A1234567BC012,A
15/FEB/1980,Y,Male,Married,3,FT,55000
3/JUN/1982,N,Female,Married,3,UE,0
24/JAN/2005,N,Male,Unknown,2,NA,0
D135EG0234678,B
19/OCT/1950,Y,Female,Divorced,0,PT,5000
X123A567F9,A
B2345234CC,A
21/May/1975,N,Male,Married,2,FT,30000
30/JUN/1978,Y,Female,Married,1,PT,10000
;
run;
data Household;
 merge temp temp(rename=(temp=_temp) firstobs=2);
if notdigit(temp)=1 then do;
 Household_ID=scan(temp,1,','); 
 Type=scan(temp,2,',');
 retain Household_ID Type ;
 if notdigit(_temp)=1 then output;
end;
else do;
 DOB=scan(temp,1,','); 
 Indicator=scan(temp,2,','); 
 Gender=scan(temp,3,',');
 Marital_Status=scan(temp,4,','); 
 Education=scan(temp,5,',');
 Employment_Status=scan(temp,6,','); 
 Income=scan(temp,7,',');
 output;
end;
drop temp _temp;
run;

Hierarchical data file

Re: Hierarchical data file

Re: Hierarchical data file

Re: Hierarchical data file

Re: Hierarchical data file

Re: Hierarchical data file

Re: Hierarchical data file

Re: Hierarchical data file

Registration is open

SAS Training: Just a Click Away