Data household (drop=HorP_indicator) ;
Infile datalines dsd ;
retain Household_identity Type No_of_members;
input HorP_indicator :$10. @ ;
if length(HorP_indicator )>1 then do;
input @1 Household_identity :$10.
@12 Type $
No_of_members
Member_no
H_Indicator $
DOB :date9.
Gender $
M_status $
Edu_level
E_Status $
M_Income;
end;
else do;
input
@1 Member_no
H_Indicator $
DOB :date9.
Gender $
M_status $
Edu_level
E_Status $
M_Income ;
end;
format DOB date9.;
datalines;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
run;
Don't post code/data/logs into the main window. Use the icons, {i} or SAS Running Man, on the menu bar to open a pop-up window to paste/edit code.
Here are some hints for reading a delimited file. DEFINE your variables before using them in statement like INPUT, FORMAT, INFORMAT or assignment. Attach any required informats and/or formats to the variables. Then you can just use list mode on your INPUT statement(s). If you define the variables in the order that appear in the source file then you can even use positional variable lists in your INPUT statement.
If you assume you can trust the number of members field then you do not need to use RETAIN since you can read all members of the household in one data step loop.
data want ;
infile cards dsd truncover ;
length Household_identity $10 Type $8 No_of_members 8
Member_no 8 H_Indicator $8 DOB 8 Gender $1 M_status $8
Edu_level 8 E_Status $8 M_Income 8
;
informat DOB date. ;
format DOB date9.;
input Household_identity -- No_of_members ;
do _n_=1 to No_of_members ;
input Member_no -- M_income ;
output;
end;
cards;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
;
The following step reads the datalines and creates one observation for each row in the details part of your data. Normal group-processing should be sufficient to create the required output.
data have;
length
Household_identity $ 10
Type $ 1
No_of_members 8
Member_no 8
H_Indicator $ 1
DOB 8
Gender $ 1
M_status $ 1
Edu_level 8
E_Status $ 2
M_Income 8;
;
informat dob date9.;
format dob date9.;
retain Household_identity Type No_of_members;
infile datalines delimiter=",";
input @;
if countc(_infile_, ',') = 2 then do;
input Household_identity Type No_of_members;
end;
else do;
input Member_no
H_Indicator
DOB
Gender
M_status
Edu_level
E_Status
M_Income
;
output;
end;
datalines;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
;
run;
Thank for your editing and advice.but in fact, can you provide the code that those data can be shown similar to the photo for reference? since I didn't know how to integrate the data into one observation per header record
Data household (drop=HorP_indicator) ;
Infile datalines dsd ;
retain Household_identity Type No_of_members;
input HorP_indicator :$10. @ ;
if length(HorP_indicator )>1 then do;
input @1 Household_identity :$10.
@12 Type $
No_of_members;
*Member_no
H_Indicator $
DOB :date9.
Gender $
M_status $
Edu_level
E_Status $
M_Income;
end;
else do;
input
@1 Member_no
H_Indicator $
DOB :date9.
Gender $
M_status $
Edu_level
E_Status $
M_Income ;
output;
end;
format DOB date9.;
datalines;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
proc print; run;
/* summary */;
data want; set household; by household_identity;
if first.household_identity then age+intck('year',dob,today())-12;
if e_status in ('FT','PT') then employed+1;
income+M_income;
if gender='M' then males+1; else females+1;
if last.household_identity then do; income=income/no_of_members;
output; employed=0; income=0; males=0; females=0; age=0;
end;
keep type males females age employed income household_identity;
proc print label split='*'; id household_identity type
males females employed age income;
label males='number of* male members'
females='number of*female members' age='age of *householder'
income='average*household*income'; run;
Jim
thanks for the solution! Instead of just adding a new data step, is it possible to output my desired output in one data step? sorry for any inconvenience as I am a SAS beginner
It could be done in one data step in SAS but there are several reasons why 2 data steps are better.
Data step 1 is not just inputting a flat file, it is flattening a hierarchical file with 2 different record types. Data step 2 uses the BY statement with last. feature that tells we are on the last record of this household. The by statement is not valid after the infile statement. So to put the logic to detect last household in the first data step would complicate it unnecessarily.
In programming world there is maintenance. Either you or your peer will need to update this program and keeping the functions of flattening the hierarchical file and summarizing the results in different data steps keeps the logic straight forward and makes maintenance easier.
It could be done but would require lots more logic.
Jim
Sorry for the inconvenience caused. Actually, your solution really helps me a lot. However, As I am now doing an assignment that requires me to read the above-shown in-stream data within one data step, the alternative of one data step is more suitable for my assignment requirement. Therefore, can you provide some insight or some example/crucial data step that I need to notice?
Here are my assignment details:
@bimohkc wrote:
...consists of a header record for a household and followed by one detail record for each member of the household.For example, a household with 3 members will have 3 detailed records exactly. There is no missing value in the datafile....
This is normally a simple exercise. Use RETAIN statement to keep the values you read from the HOUSEHOLD record. Use OUTPUT statement to only write an observations when you are reading a DETAIL record.
The question does not provide any information on how you are supposed to tell the difference between the HOUSEHOLD and DETAIL records. Is there another field that has that information? Or a COUNT on the HOUSEHOLD record that lets you know how many DETAIL record will follow?
What means the difference between household record and detail record?
The data has been listed in my previous post and listed as an in-stream data. Thanks for helping!
Thanks everyone for shortlisted my beginner stupid mistakes! But Actually, I am wondering how can I use those data to generate the output in my photo posted in the last post within one data step as I found out it was difficult to finish it within one data step....Can anyone give some insight on how to project desired output into a dataset?
Do you really need to use a data step or can you use other SAS procedure(s) to produce the report?
Are you sure you are supposed to generate a dataset or do you just need to reproduce the report?
For each column in the report you should first figure out what it represents.
What did you try? How did it not meet your needs?
Note: Don't generate the report in the same step that creates the data from the delimited file, that is just silly. That would only be a useful exercise if you already knew how to read the data and how to produce the report with a data step and wanted to learn how to streamline the process because you were working with extremely large data files where two passes through the data would take a extreme amount of time. Even then you could just code the read step as a data step view and then you could code two steps but only pass through the source once.
Since there is a requirement of 'Write a Data step', it means I can only generate this dataset by using the following data (which is same as the previous post)
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
I know it is silly, but i am curious how can this be done in the same dataset
Don't post code/data/logs into the main window. Use the icons, {i} or SAS Running Man, on the menu bar to open a pop-up window to paste/edit code.
Here are some hints for reading a delimited file. DEFINE your variables before using them in statement like INPUT, FORMAT, INFORMAT or assignment. Attach any required informats and/or formats to the variables. Then you can just use list mode on your INPUT statement(s). If you define the variables in the order that appear in the source file then you can even use positional variable lists in your INPUT statement.
If you assume you can trust the number of members field then you do not need to use RETAIN since you can read all members of the household in one data step loop.
data want ;
infile cards dsd truncover ;
length Household_identity $10 Type $8 No_of_members 8
Member_no 8 H_Indicator $8 DOB 8 Gender $1 M_status $8
Edu_level 8 E_Status $8 M_Income 8
;
informat DOB date. ;
format DOB date9.;
input Household_identity -- No_of_members ;
do _n_=1 to No_of_members ;
input Member_no -- M_income ;
output;
end;
cards;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
;
Arrrr.... Since i misunderstood the question, your solution is the best fit for my purpose !!!!!! Anyway, thanks for helping me,everyone!
data want ; infile cards dsd truncover ; length a b c $ 20; retain a b c ; input aa : $20. @; if anyalpha(aa)=1 then do; a=aa; input b : $20. c : $20.; end; else do; x1=aa; input (x2-x8) (: $20.); output; end; drop aa; cards; A123456789,A,3 1,Y,15Feb1960,M,M,3,FT,55000 2,N,3Jun1965,F,M,3,UE,0 3,N,24Jan1988,M,S,2,NA,3000 A135790234,B,1 1,Y,19Oct1944,F,D,0,NA,3000 B234523456,A,2 1,Y,30Jun1978,F,M,1,PT,4000 2,N,21May1975,M,M,2,FT,30000 ; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.