Solved: Re: How can i use these code to generate output shown in the photo?

bimohkc · Posted 10-28-2017 08:50 AM

Data household (drop=HorP_indicator) ;
Infile datalines dsd ;
retain Household_identity Type No_of_members;
input HorP_indicator :$10. @ ;
if length(HorP_indicator )>1 then do;
input @1 Household_identity :$10.
@12 Type $
No_of_members
Member_no
H_Indicator $
DOB :date9.
Gender $
M_status $
Edu_level
E_Status $
M_Income;
end;
else do;
input
@1 Member_no
H_Indicator $
DOB :date9.
Gender $
M_status $
Edu_level
E_Status $
M_Income ;
end;
format DOB date9.;

datalines;

A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
run;

Tom · Posted 10-28-2017 01:40 PM

Don't post code/data/logs into the main window. Use the icons, {i} or SAS Running Man, on the menu bar to open a pop-up window to paste/edit code.

Here are some hints for reading a delimited file. DEFINE your variables before using them in statement like INPUT, FORMAT, INFORMAT or assignment. Attach any required informats and/or formats to the variables. Then you can just use list mode on your INPUT statement(s). If you define the variables in the order that appear in the source file then you can even use positional variable lists in your INPUT statement.

If you assume you can trust the number of members field then you do not need to use RETAIN since you can read all members of the household in one data step loop.

data want ;
  infile cards dsd truncover ;
  length Household_identity $10 Type $8 No_of_members 8
         Member_no 8 H_Indicator $8 DOB 8 Gender $1 M_status $8
         Edu_level 8 E_Status $8 M_Income 8
  ;
  informat DOB date. ;
  format DOB date9.;
  input Household_identity -- No_of_members ;
  do _n_=1 to No_of_members ;
     input Member_no -- M_income ;
     output;
  end;
cards;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
;

View solution in original post

andreas_lds · Posted 10-28-2017 09:51 AM

The following step reads the datalines and creates one observation for each row in the details part of your data. Normal group-processing should be sufficient to create the required output.

data have;
   length
      Household_identity $ 10
      Type $ 1
      No_of_members 8
      
      Member_no 8
      H_Indicator $ 1
      DOB 8
      Gender $ 1
      M_status $ 1
      Edu_level 8
      E_Status $ 2
      M_Income 8;     
   ;
   
   informat dob date9.;
   format dob date9.;
   
   retain Household_identity Type No_of_members;
   
   infile datalines delimiter=",";
   input @;
   
   if countc(_infile_, ',') = 2 then do;
      input Household_identity Type No_of_members;
   end;
   else do;
      input Member_no
         H_Indicator
         DOB
         Gender 
         M_status
         Edu_level
         E_Status
         M_Income
      ;
      output;
   end;
   
datalines;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
;
run;

bimohkc · Posted 10-28-2017 10:51 AM

Thank for your editing and advice.but in fact, can you provide the code that those data can be shown similar to the photo for reference? since I didn't know how to integrate the data into one observation per header record

Jim_G · Posted 10-28-2017 11:09 AM

Data household (drop=HorP_indicator) ;
Infile datalines dsd ;
retain Household_identity Type No_of_members;
input HorP_indicator :$10. @ ;
if length(HorP_indicator )>1 then do;
input @1 Household_identity :$10.
@12 Type $
No_of_members;
*Member_no
H_Indicator $
DOB :date9.
Gender $
M_status $
Edu_level
E_Status $
M_Income;
end;
else do;
input
@1 Member_no
H_Indicator $
DOB :date9.
Gender $
M_status $
Edu_level
E_Status $
M_Income ;
output;
end;
format DOB date9.;

datalines;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
proc print; run;
/* summary */;
data want; set household; by household_identity;
if first.household_identity then age+intck('year',dob,today())-12;
if e_status in ('FT','PT') then employed+1;
income+M_income;
if gender='M' then males+1; else females+1;
if last.household_identity then do; income=income/no_of_members;
output; employed=0; income=0; males=0; females=0; age=0;
end;
keep type males females age employed income household_identity;
proc print label split='*'; id household_identity type
males females employed age income;
label males='number of* male members'
females='number of*female members' age='age of *householder'
income='average*household*income'; run;

Jim

bimohkc · Posted 10-28-2017 11:41 AM

thanks for the solution! Instead of just adding a new data step, is it possible to output my desired output in one data step? sorry for any inconvenience as I am a SAS beginner

Jim_G · Posted 10-28-2017 12:21 PM

It could be done in one data step in SAS but there are several reasons why 2 data steps are better.

Data step 1 is not just inputting a flat file, it is flattening a hierarchical file with 2 different record types. Data step 2 uses the BY statement with last. feature that tells we are on the last record of this household. The by statement is not valid after the infile statement. So to put the logic to detect last household in the first data step would complicate it unnecessarily.

In programming world there is maintenance. Either you or your peer will need to update this program and keeping the functions of flattening the hierarchical file and summarizing the results in different data steps keeps the logic straight forward and makes maintenance easier.

It could be done but would require lots more logic.

Jim

bimohkc · Posted 10-28-2017 12:59 PM

Sorry for the inconvenience caused. Actually, your solution really helps me a lot. However, As I am now doing an assignment that requires me to read the above-shown in-stream data within one data step, the alternative of one data step is more suitable for my assignment requirement. Therefore, can you provide some insight or some example/crucial data step that I need to notice?

Here are my assignment details:

Suppose the raw data file SURVEY.TXT (data set is not available) in ‘d:\Temp’ folder

contains the result of a recent household survey. The data file is hierarchical in

structure, consists of a header record for a household and followed by one detail

record for each member of the household. For example, a household with 3

members will have 3 detailed records exactly. There is no missing value in the data

file. Data values are separated by commas and they are arranged in the order of the

following variables:

Header record:

Variable Format

Household identity Exactly 10 characters

Type of housing Character ‘A’ for private, and ‘B’ for public

Number of members in the Standard numeric

household

Detail record:

Variable Format

Member number of the

household Standard numeric

Householder indicator Character ‘Y’ for yes, and ‘N’ for no

Date of birth ddmonyyyy

Gender Character ‘F’ for female, and ‘M’ for male

Marital status Character ‘M’ for married, ‘S’ for single, and ‘D’ for divorced

Achieved education level Standard numeric with 0 for none, 1 for primary, 2 for secondary, 3 for tertiary

Employment status Characters ‘FT’ for full time, ‘PT’ for part time,‘UE’ for employed, and ‘NA’ for not applicable

Monthly income Standard numeric

The first few records of the data file are displayed below:

Write a DATA step that reads the records from SURVEY.TXT and creates a SAS

data set that contains only one observation for each household regardless of the

number of members in the household. The created data set must contain only the

following variables but not necessary in the given order: household identity, type of

housing, number of male members (0 for none), number of female members (0 for

none), number of members are working (with either full time or part time job; 0 for

none), age of the householder as on 31 of December 2008 rounded to the nearest

half year, and average household income rounded to 2 decimal places.

The first few observations of the created data set are shown below for reference:

Tom · Posted 10-28-2017 01:27 PM

@bimohkc wrote:
...

consists of a header record for a household and followed by one detail record for each member of the household.

For example, a household with 3 members will have 3 detailed records exactly. There is no missing value in the data

file.

...

This is normally a simple exercise. Use RETAIN statement to keep the values you read from the HOUSEHOLD record. Use OUTPUT statement to only write an observations when you are reading a DETAIL record.

The question does not provide any information on how you are supposed to tell the difference between the HOUSEHOLD and DETAIL records. Is there another field that has that information? Or a COUNT on the HOUSEHOLD record that lets you know how many DETAIL record will follow?

bimohkc · Posted 10-28-2017 01:40 PM

What means the difference between household record and detail record?

The data has been listed in my previous post and listed as an in-stream data. Thanks for helping!

bimohkc · Posted 10-28-2017 02:03 PM

Thanks everyone for shortlisted my beginner stupid mistakes! But Actually, I am wondering how can I use those data to generate the output in my photo posted in the last post within one data step as I found out it was difficult to finish it within one data step....Can anyone give some insight on how to project desired output into a dataset?

Tom · Posted 10-28-2017 02:33 PM

Do you really need to use a data step or can you use other SAS procedure(s) to produce the report?

Are you sure you are supposed to generate a dataset or do you just need to reproduce the report?

For each column in the report you should first figure out what it represents.

How would you get that value from the dataset?
Is it just a copy of an existing variable?
is it derived somehow?
- What is the definition of the derivation?

What did you try? How did it not meet your needs?

Note: Don't generate the report in the same step that creates the data from the delimited file, that is just silly. That would only be a useful exercise if you already knew how to read the data and how to produce the report with a data step and wanted to learn how to streamline the process because you were working with extremely large data files where two passes through the data would take a extreme amount of time. Even then you could just code the read step as a data step view and then you could code two steps but only pass through the source once.

bimohkc · Posted 10-28-2017 02:46 PM

Since there is a requirement of 'Write a Data step', it means I can only generate this dataset by using the following data (which is same as the previous post)

A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000

I know it is silly, but i am curious how can this be done in the same dataset

Tom · Posted 10-28-2017 01:40 PM

Don't post code/data/logs into the main window. Use the icons, {i} or SAS Running Man, on the menu bar to open a pop-up window to paste/edit code.

Here are some hints for reading a delimited file. DEFINE your variables before using them in statement like INPUT, FORMAT, INFORMAT or assignment. Attach any required informats and/or formats to the variables. Then you can just use list mode on your INPUT statement(s). If you define the variables in the order that appear in the source file then you can even use positional variable lists in your INPUT statement.

If you assume you can trust the number of members field then you do not need to use RETAIN since you can read all members of the household in one data step loop.

data want ;
  infile cards dsd truncover ;
  length Household_identity $10 Type $8 No_of_members 8
         Member_no 8 H_Indicator $8 DOB 8 Gender $1 M_status $8
         Edu_level 8 E_Status $8 M_Income 8
  ;
  informat DOB date. ;
  format DOB date9.;
  input Household_identity -- No_of_members ;
  do _n_=1 to No_of_members ;
     input Member_no -- M_income ;
     output;
  end;
cards;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
;

bimohkc · Posted 10-28-2017 03:33 PM

Arrrr.... Since i misunderstood the question, your solution is the best fit for my purpose !!!!!! Anyway, thanks for helping me,everyone!

Ksharp · Posted 10-29-2017 07:32 AM


data want ;
  infile cards dsd truncover ;
length  a b c $ 20;
retain  a b c ;
input aa : $20. @;
if anyalpha(aa)=1 then do;
  a=aa;
  input b : $20. c : $20.;
end;
else do;
 x1=aa;
 input (x2-x8) (: $20.);
 output;
end;
drop aa;
cards;
A123456789,A,3
1,Y,15Feb1960,M,M,3,FT,55000
2,N,3Jun1965,F,M,3,UE,0
3,N,24Jan1988,M,S,2,NA,3000
A135790234,B,1
1,Y,19Oct1944,F,D,0,NA,3000
B234523456,A,2
1,Y,30Jun1978,F,M,1,PT,4000
2,N,21May1975,M,M,2,FT,30000
;
run;

Classroom Training Available!