BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Lynn23
Calcite | Level 5

I got a hierarchical data file:

Pasted Graphic.png

 

I want to generate one observation per detail record and my code is:

 

data Household;
infile datalines dlm=',';
input Household_ID : $15. Type$;
input DOB : date11. Indicator$ Gender$ Marital_Status$ Education Employment_Status$ Income$;
Second_field=scan(_infile_,2,',');
Do while (Second_field ='Y'or'N');
output;
input DOB--Income;
end;
datalines;
A1234567BC012,A
15/FEB/1980,Y,Male,Married,3,FT,55000
3/JUN/1982,N,Female,Married,3,UE,0
24/JAN/2005,N,Male,Unknown,2,NA,0
D135EG0234678,B
19/OCT/1950,Y,Female,Divorced,0,PT,5000
X123A567F9,A
B2345234CC,A
21/May/1975,N,Male,Married,2,FT,30000
30/JUN/1978,Y,Female,Married,1,PT,10000
;
run;

 

But my output is like this:

Pasted Graphic 2.png

How could I generate one observation per detail record?

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Use retained variables, and start with a "blind input":

data household;
infile datalines dlm=',';
length
  household $15
  type $1
  dob 4
  indicator $1
  gender $6
  marital_status $8
  education 3
  employment_status $2
  income 8
;
format dob yymmdd10.;
retain household type;
input @;
if countw(_infile_,',') = 2
then input household type;
else do;
  input DOB :date11. Indicator Gender Marital_Status Education Employment_Status Income;
  output;
end;
datalines;

Untested, posted from my tablet.

View solution in original post

6 REPLIES 6
ballardw
Super User

@Lynn23 wrote:

I got a hierarchical data file:

Pasted Graphic.png

 

I want to generate one observation per detail record and my code is:

 

data Household;
infile datalines dlm=',';
input Household_ID : $15. Type$;
input DOB : date11. Indicator$ Gender$ Marital_Status$ Education Employment_Status$ Income$;
Second_field=scan(_infile_,2,',');
Do while (Second_field ='Y'or'N');
output;
input DOB--Income;
end;
datalines;
A1234567BC012,A
15/FEB/1980,Y,Male,Married,3,FT,55000
3/JUN/1982,N,Female,Married,3,UE,0
24/JAN/2005,N,Male,Unknown,2,NA,0
D135EG0234678,B
19/OCT/1950,Y,Female,Divorced,0,PT,5000
X123A567F9,A
B2345234CC,A
21/May/1975,N,Male,Married,2,FT,30000
30/JUN/1978,Y,Female,Married,1,PT,10000
;
run;

 

But my output is like this:

Pasted Graphic 2.png

How could I generate one observation per detail record?


Before even attempting to place "one observation per detail record" you should describe what you meant. As in what you expect the output to look like.

 

Maybe you want something like this to read the data so that you have the household and "type" attached to each of the other lines of data.

data Household;
   infile datalines dlm=',';
   informat Household_ID $15. Type $1.
      DOB date11. indicator $1. Gender $6.
      Marital_Status $15. Education best.
      Employment_Status $5. Income best.

   ;
   retain Household_ID Type houseflag;
   input @;
   if anyalpha(_infile_) = 1 AND houseflag then do;
      output;
      input @1 Household_ID  Type;
      houseflag=1;
   end;
   Else if anyalpha(_infile_) = 1 AND Not(houseflag) then do;
      input @1 Household_ID  Type;
      houseflag=1;
   end;
   else do;
   input DOB  Indicator Gender Marital_Status Education 
        Employment_Status Income;
        houseflag=0;
      output;
   end;
   drop houseflag;
   format DOB date9.;
datalines;
A1234567BC012,A
15/FEB/1980,Y,Male,Married,3,FT,55000
3/JUN/1982,N,Female,Married,3,UE,0
24/JAN/2005,N,Male,Unknown,2,NA,0
D135EG0234678,B
19/OCT/1950,Y,Female,Divorced,0,PT,5000
X123A567F9,A
B2345234CC,A
21/May/1975,N,Male,Married,2,FT,30000
30/JUN/1978,Y,Female,Married,1,PT,10000
;

You really have to watch some timing issues. First you need to identify when the record you are reading is a household. Second you have to identify when when the household is following a household with no data so that the record with no data can be written. Third, you have to make sure the household information is attached to each record.

 

The last bit is accomplished by Retaining the two household values so they are available.

The first two are accomplished by setting and retaining a flag for set when the current line is a household record. Notice the difference in timing of the OUTPUT statements.

 

The input @; loads the current line of data (limit 32K characters) and holds the input at that line. So the values can be tested for conditions. I am using a rule of "if the first character of the record is a letter then this is a household identifier line". If you have any stupid data with a DOB value of Jan 1980, or Unknown, this will not work and you will have to be very explicit on creating and describing a rule that will separate all your DOB values from a household identifier. (and need to make sure you have an informat that will read those properly)

If the current value is a household and the line immediately prior was also, then the OUTPUT is performed before actually inputing the household information so the household appears in the dataset. Other wise only the detail records when read get output, which because of the Retain will have the proper household information, barring bad DOB values...

 

I moved the informat information to an Informat statement because delimited data and infromats on an input statement often have some issues.

 

And for @#$# why read "income" as character????

 

 

Kurt_Bremser
Super User

Use retained variables, and start with a "blind input":

data household;
infile datalines dlm=',';
length
  household $15
  type $1
  dob 4
  indicator $1
  gender $6
  marital_status $8
  education 3
  employment_status $2
  income 8
;
format dob yymmdd10.;
retain household type;
input @;
if countw(_infile_,',') = 2
then input household type;
else do;
  input DOB :date11. Indicator Gender Marital_Status Education Employment_Status Income;
  output;
end;
datalines;

Untested, posted from my tablet.

Ksharp
Super User
Kurt,
You gonna missing obs "X123A567F9,A" .
Lynn23
Calcite | Level 5
Yes you are right. Thank you all!!
Ksharp
Super User
data temp;
infile datalines length=len;
input temp $varying800. len;
datalines;
A1234567BC012,A
15/FEB/1980,Y,Male,Married,3,FT,55000
3/JUN/1982,N,Female,Married,3,UE,0
24/JAN/2005,N,Male,Unknown,2,NA,0
D135EG0234678,B
19/OCT/1950,Y,Female,Divorced,0,PT,5000
X123A567F9,A
B2345234CC,A
21/May/1975,N,Male,Married,2,FT,30000
30/JUN/1978,Y,Female,Married,1,PT,10000
;
run;
data Household;
 merge temp temp(rename=(temp=_temp) firstobs=2);
if notdigit(temp)=1 then do;
 Household_ID=scan(temp,1,','); 
 Type=scan(temp,2,',');
 retain Household_ID Type ;
 if notdigit(_temp)=1 then output;
end;
else do;
 DOB=scan(temp,1,','); 
 Indicator=scan(temp,2,','); 
 Gender=scan(temp,3,',');
 Marital_Status=scan(temp,4,','); 
 Education=scan(temp,5,',');
 Employment_Status=scan(temp,6,','); 
 Income=scan(temp,7,',');
 output;
end;
drop temp _temp;
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1715 views
  • 1 like
  • 4 in conversation