BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
bimohkc
Calcite | Level 5

Data household (drop=HorP_indicator) ;
Infile datalines dsd ;
retain Household_identity Type No_of_members;
input HorP_indicator :$10. @ ;
if length(HorP_indicator )>1 then do;
input @1 Household_identity :$10.
@12 Type $
No_of_members
Member_no
H_Indicator $
DOB :date9.
Gender $
M_status $
Edu_level
E_Status $
M_Income;
end;
else do;
input
@1 Member_no
H_Indicator $
DOB :date9.
Gender $
M_status $
Edu_level
E_Status $
M_Income ;
end;
format DOB date9.;

datalines;

A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
run;

 .....jpg

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Don't post code/data/logs into the main window. Use the icons, {i} or SAS Running Man, on the menu bar to open a pop-up window to paste/edit code.

Here are some hints for reading a delimited file. DEFINE your variables before using them in statement like INPUT, FORMAT, INFORMAT or assignment.  Attach any required informats and/or formats to the variables. Then you can just use list mode on your INPUT statement(s).  If you define the variables in the order that appear in the source file then you can even use positional variable lists in your INPUT statement.

 

If you assume you can trust the number of members field then you do not need to use RETAIN since you can read all members of the household in one data step loop.

data want ;
  infile cards dsd truncover ;
  length Household_identity $10 Type $8 No_of_members 8
         Member_no 8 H_Indicator $8 DOB 8 Gender $1 M_status $8
         Edu_level 8 E_Status $8 M_Income 8
  ;
  informat DOB date. ;
  format DOB date9.;
  input Household_identity -- No_of_members ;
  do _n_=1 to No_of_members ;
     input Member_no -- M_income ;
     output;
  end;
cards;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
;

 

View solution in original post

14 REPLIES 14
andreas_lds
Jade | Level 19

The following step reads the datalines and creates one observation for each row in the details part of your data. Normal group-processing should be sufficient to create the required output. 

 

data have;
   length
      Household_identity $ 10
      Type $ 1
      No_of_members 8
      
      Member_no 8
      H_Indicator $ 1
      DOB 8
      Gender $ 1
      M_status $ 1
      Edu_level 8
      E_Status $ 2
      M_Income 8;     
   ;
   
   informat dob date9.;
   format dob date9.;
   
   retain Household_identity Type No_of_members;
   
   infile datalines delimiter=",";
   input @;
   
   if countc(_infile_, ',') = 2 then do;
      input Household_identity Type No_of_members;
   end;
   else do;
      input Member_no
         H_Indicator
         DOB
         Gender 
         M_status
         Edu_level
         E_Status
         M_Income
      ;
      output;
   end;
   
datalines;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
;
run;
bimohkc
Calcite | Level 5

 Thank for your editing and advice.but in fact, can you provide the code that those data can be shown similar to the photo for reference?  since I didn't know how to integrate the data into one observation per header record 

 

Jim_G
Pyrite | Level 9

Data household (drop=HorP_indicator) ;
Infile datalines dsd ;
retain Household_identity Type No_of_members;
input HorP_indicator :$10. @ ;
if length(HorP_indicator )>1 then do;
input @1 Household_identity :$10.
@12 Type $
No_of_members;
*Member_no
H_Indicator $
DOB :date9.
Gender $
M_status $
Edu_level
E_Status $
M_Income;
end;
else do;
input
@1 Member_no
H_Indicator $
DOB :date9.
Gender $
M_status $
Edu_level
E_Status $
M_Income ;
output;
end;
format DOB date9.;

datalines;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
proc print; run;
/* summary */;
data want; set household; by household_identity;
if first.household_identity then age+intck('year',dob,today())-12;
if e_status in ('FT','PT') then employed+1;
income+M_income;
if gender='M' then males+1; else females+1;
if last.household_identity then do; income=income/no_of_members;
output; employed=0; income=0; males=0; females=0; age=0;
end;
keep type males females age employed income household_identity;
proc print label split='*'; id household_identity type
males females employed age income;
label males='number of* male members'
females='number of*female members' age='age of *householder'
income='average*household*income'; run;

 

Jim

bimohkc
Calcite | Level 5

thanks for the solution! Instead of just adding a new data step, is it possible to output my desired output in one data step? sorry for any inconvenience as I am a SAS beginner

Jim_G
Pyrite | Level 9

It could be done in one data step in SAS but there are several reasons why 2 data steps are better.

 

Data step 1 is not just inputting a flat file, it is flattening a hierarchical file with 2 different record types. Data step 2 uses the BY statement with last. feature that tells we are on the last record of this household.  The by statement is not valid after the infile statement. So to put the logic to detect last household   in the first data step would complicate it unnecessarily.

 

In programming world there is maintenance. Either you or your peer will need to update this program and keeping the functions of flattening the hierarchical file and summarizing the results in different data steps keeps the logic  straight forward and makes maintenance easier.

 

It could be done but would require lots more logic.

 

Jim

 

   

bimohkc
Calcite | Level 5

Sorry for the inconvenience caused. Actually, your solution really helps me a lot. However, As I am now doing an assignment that requires me to read the above-shown in-stream data within one data step, the alternative of one data step is more suitable for my assignment requirement. Therefore, can you provide some insight or some example/crucial data step that I need to notice? 

 

Here are my assignment details:

Suppose the raw data file SURVEY.TXT (data set is not available) in ‘d:\Temp’ folder
contains the result of a recent household survey. The data file is hierarchical in
structure, consists of a header record for a household and followed by one detail
record for each member of the household. For example, a household with 3
members will have 3 detailed records exactly. There is no missing value in the data
file. Data values are separated by commas and they are arranged in the order of the
following variables:
 
 
 
Header record:
Variable                                             Format
 
Household identity                       Exactly 10 characters
 
Type of housing                            Character ‘A’ for private, and ‘B’ for public
                           
Number of members in the              Standard numeric 
household                                     
                                                     
Detail record:
Variable                                              Format
Member number of the
household                                    Standard numeric
Householder indicator                   Character ‘Y’ for yes, and ‘N’ for no
                                                      
Date of birth                                  ddmonyyyy
                                                            
Gender                                          Character ‘F’ for female, and ‘M’ for male
                                                     
 
Marital status                                Character ‘M’ for married, ‘S’ for single, and   ‘D’ for divorced
 
Achieved education level               Standard numeric with 0 for none, 1 for  primary, 2 for secondary, 3 for tertiary
 
Employment status                       Characters ‘FT’ for full time, ‘PT’ for part time,‘UE’ for employed, and ‘NA’ for not applicable
      
 
 
Monthly income                               Standard numeric
 
The first few records of the data file are displayed below:
Write a DATA step that reads the records from SURVEY.TXT and creates a SAS
data set that contains only one observation for each household regardless of the
number of members in the household. The created data set must contain only the
following variables but not necessary in the given order: household identity, type of
housing, number of male members (0 for none), number of female members (0 for
none), number of members are working (with either full time or part time job; 0 for
none), age of the householder as on 31 of December 2008 rounded to the nearest
half year, and average household income rounded to 2 decimal places.
 
The first few observations of the created data set are shown below for reference:
.....jpg
 
 
 
 
Tom
Super User Tom
Super User

@bimohkc wrote:
...
consists of a header record for a household and followed by one detail record for each member of the household.
For example, a household with 3 members will have 3 detailed records exactly. There is no missing value in the data
file.
...
 

This is normally a simple exercise. Use RETAIN statement to keep the values you read from the HOUSEHOLD record. Use OUTPUT statement to only write an observations when you are reading a DETAIL record.

 

The question does not provide any information on how you are supposed to tell the difference between the HOUSEHOLD and DETAIL records.  Is there another field that has that information? Or a COUNT on the HOUSEHOLD record that lets you know how many DETAIL record will follow?

bimohkc
Calcite | Level 5

What means the difference between household record and detail record?

The data has been listed in my previous post and listed as an in-stream data. Thanks for helping!

bimohkc
Calcite | Level 5

Thanks everyone for shortlisted my beginner stupid mistakes! But Actually, I am wondering how can I use those data to generate the output in my photo posted in the last post within one data step as  I found out it was difficult to finish it within one data step....Can anyone give some insight on how to project desired output into a dataset?

Tom
Super User Tom
Super User

Do you really need to use a data step or can you use other SAS procedure(s) to produce the report?

Are you sure you are supposed to generate a dataset or do you just need to reproduce the report?

 

For each column in the report you should first figure out what it represents.

  • How would you get that value from the dataset? 
  • Is it just a copy of an existing variable?
  • is it derived somehow? 
    • What is the definition of the derivation?

 

What did you try?  How did it not meet your needs?

 

Note: Don't generate the report in the same step that creates the data from the delimited file, that is just silly. That would only be a useful exercise if you already knew how to read the data and how to produce the report with a data step and wanted to learn how to streamline the process because you were working with extremely large data files where two passes through the data would take a extreme amount of time. Even then you could just code the read step as a data step view and then you could code two steps but only pass through the source once.

  

 

bimohkc
Calcite | Level 5

Since there is a requirement of 'Write a Data step', it means I can only generate this dataset by using the following data (which is same as the previous post)

A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000

.....jpg

I know it is silly, but i am curious how can this be done in the same dataset

 

Tom
Super User Tom
Super User

Don't post code/data/logs into the main window. Use the icons, {i} or SAS Running Man, on the menu bar to open a pop-up window to paste/edit code.

Here are some hints for reading a delimited file. DEFINE your variables before using them in statement like INPUT, FORMAT, INFORMAT or assignment.  Attach any required informats and/or formats to the variables. Then you can just use list mode on your INPUT statement(s).  If you define the variables in the order that appear in the source file then you can even use positional variable lists in your INPUT statement.

 

If you assume you can trust the number of members field then you do not need to use RETAIN since you can read all members of the household in one data step loop.

data want ;
  infile cards dsd truncover ;
  length Household_identity $10 Type $8 No_of_members 8
         Member_no 8 H_Indicator $8 DOB 8 Gender $1 M_status $8
         Edu_level 8 E_Status $8 M_Income 8
  ;
  informat DOB date. ;
  format DOB date9.;
  input Household_identity -- No_of_members ;
  do _n_=1 to No_of_members ;
     input Member_no -- M_income ;
     output;
  end;
cards;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
;

 

bimohkc
Calcite | Level 5

Arrrr.... Since i misunderstood the question, your solution is the best fit for my purpose !!!!!! Anyway, thanks for helping me,everyone!

Ksharp
Super User

data want ;
  infile cards dsd truncover ;
length  a b c $ 20;
retain  a b c ;
input aa : $20. @;
if anyalpha(aa)=1 then do;
  a=aa;
  input b : $20. c : $20.;
end;
else do;
 x1=aa;
 input (x2-x8) (: $20.);
 output;
end;
drop aa;
cards;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1502 views
  • 1 like
  • 5 in conversation