DATA Step, Macro, Functions and more

Logic needed

Reply
N/A
Posts: 1

Logic needed

Kindly help me with the logic for this one.

If the relation is husband, then all the other observations following it until another husband occurs should be read into a single record, and i need to get corrresponding spouse and child name in the same husbands record but with spouse and child name.

Data to be processed:
NAME AGE GENDER RELATION
A 40 M HUSBAND
B 30 F sPOUSE
C 10 M CHILD
D 40 M HUSBAND
E 30 F SPOUSE
F 10 M CHILD


REPORT:

Name age gender relation spouse child
a 40 m husband b c
d 40 m husband e f
Contributor
Posts: 31

Re: Logic needed

Hey Krishna,
Try this out

data dx;
infile datalines missover;
input NAME :$10. AGE :3. GENDER :$1. RELATION :$10.;
datalines;
A 40 M HUSBAND
B 30 F sPOUSE
C 10 M CHILD
D 40 M HUSBAND
E 30 F SPOUSE
F 10 M CHILD
;
data dxx;
set dx;
retain counter 0;
if strip(relation) = 'HUSBAND' then counter=counter+1 ;
run;

proc summary nway missing data= dxx;
class counter ;
output out=sushil(drop=_: relation_2 relation_3 gender_2 gender_3 age_2 age_3 counter)
idgroup( out{3} (NAME AGE GENDER RELATION)=);
run;
proc print data=sushil;run;

if you don't like proc summary, you can do datastep manipulation after the dxx dataset creation. I kinda love summary procedure, hence used it over here.

hope that helps Smiley Happy
SAS Employee
Posts: 160

Re: Logic needed

If your husband ALWAYS has only one SPOUSE and one CHILD, you can simply use this
[pre]
data dx;
infile datalines ;
input NAME :$10. AGE :3. GENDER :$1. RELATION :$10. / S_NAME :$10. / C_NAME :$10.;
datalines;
A 40 M HUSBAND
B 30 F SPOUSE
C 10 M CHILD
D 40 M HUSBAND
E 30 F SPOUSE
F 10 M CHILD
;
[/pre]

But I guess that is not always the case - Below I have added an extra child in the datalines.

[pre]
data TEST;
length name_HUSBAND type R_Name $10;
retain name_HUSBAND count;
input @8 type :$10. @; put type=;
if type='HUSBAND' then
do;
count=0;
input @1 name_HUSBAND :$10.;
end;
else
do;
if type='CHILD' then
do;
count+1;
type=cats(type,count);
end;
input @1 R_Name :$10.;
output TEST;
end;
datalines;
A 40 M HUSBAND
B 30 F SPOUSE
C 10 M CHILD
D 40 M HUSBAND
E 30 F SPOUSE
F 10 M CHILD
G 12 F CHILD
;
run;

proc transpose data=test out=trans(drop=_name_) prefix=name_;
by name_HUSBAND;
id type;
var r_name;
run;
[/pre]
SAS Super FREQ
Posts: 8,743

Re: Logic needed

Hi:
With just a few modifications to this program, you do NOT need the PROC TRANSPOSE step. You can create one obs for every family by using ARRAYS to hold the childrens' names, ages and genders and by RETAINING all the family information until you've collected 1 family.

Support you had this data stored in c:\temp\famdata.txt:
[pre]
A 40 M HUSBAND
B 30 F SPOUSE
C 10 M CHILD
D 40 M HUSBAND
E 30 F SPOUSE
F 10 M CHILD
G 12 F CHILD
H 40 M HUSBAND
I 30 F SPOUSE
J 12 M CHILD
K 14 F CHILD
L 40 M HUSBAND
M 12 M CHILD
N 14 F CHILD
O 40 M HUSBAND
P 38 F SPOUSE
[/pre]

(Note how the last family has no children and the second to the last family is a single-parent family with a husband and no spouse.)

Now the COUNT variable shows the number of children and could be used in subsequent programs as a DO loop counter. Here is the output from the program shown below.

cynthia
The Output
[pre]
test

name_ name_
Obs HUSBAND h_age h_gen Spouse s_age s_gen cn1 cn2 cn3 cn4 cg1 cg2 cg3 cg4 a1 a2 a3 a4 count

1 A 40 M B 30 F C M 10 . . . 1
2 D 40 M E 30 F F G M F 10 12 . . 2
3 H 40 M I 30 F J K M F 12 14 . . 2
4 L 40 M NONE . M N M F 12 14 . . 2
5 O 40 M P 38 F . . . . 0

[/pre]

using this program:
The ChangedProgram
[pre]
options nodate nonumber nocenter;
data TEST;
keep name_HUSBAND h_age h_gen
name_SPOUSE s_age s_gen
cn1 cn2 cn3 cn4
cg1 cg2 cg3 cg4
a1 a2 a3 a4 count;

length name_HUSBAND $10 h_age 8 h_gen $1
name_Spouse $10 s_age 8 s_gen $1
type R_Name $10;
array cn $10 cn1-cn4;
array cg $1 cg1-cg4;
array a a1-a4;
retain name_HUSBAND h_age h_gen
name_SPOUSE s_age s_gen
cn1 cn2 cn3 cn4 cg1 cg2 cg3 cg4
a1 a2 a3 a4 count;
label name_Husband = 'Husband'
name_Spouse = 'Spouse'
count = 'Number Children';

** note: can only use END= with a FILE, not with DATALINES;
infile 'c:\temp\famdata.txt' pad end=eof;
input @8 type :$10. @;
put type=;
if type='HUSBAND' then do;
** Output "saved" info every time a new HUSBAND dataline;
** is read -- after first dataline.;
** (when on 1st dataline, have not read the whole family yet);
** But when on a HUSBAND record, the previous family has been read.;
if _n_ gt 1 then output TEST;

** input new HUSBAND record;
input @1 name_HUSBAND $ h_age h_gen $;

** clear arrays for retained children info and ;
** reset spouse info for new family;
do i = 1 to 4 by 1;
cn(i) = ' ';
cg(i) = ' ';
a(i) = .;
end;
name_SPOUSE='NONE';
s_age = .;
s_gen = ' ';
count = 0;
end;
else do;
** Read input dataline into 2 holding values and;
** then assign to variables based on type.;
input @1 R_Name :$10. r_age r_gen $;

if type='CHILD' then do;
count+1;
cn(count) = R_Name;
cg(count) = r_gen;
a(count) = R_Age;
end;
if type = 'SPOUSE' then do;
name_SPOUSE = R_Name;
s_age = R_Age;
s_gen = r_gen;
end;
end;

** on last dataline, output the last family;
if eof = 1 then output TEST;
run;

proc print data=test;
title 'test';
run;

[/pre]
Ask a Question
Discussion stats
  • 3 replies
  • 139 views
  • 0 likes
  • 4 in conversation