I may have corrected the code myself, but would definitely appreciate feedback regarding whether I did it correctly. The following code appears to accomplish what the OP had actually wanted:
data want (keep=pan1 pan2 pan3 add1 household);
if _n_ eq 1 then do;
declare hash ha(hashexp: 16);
ha.definekey('key');
ha.definedata('hhold');
ha.definedone();
end;
set test;
array _house{*} $ 40 pan1--pan3;
do i=1 to dim(_house);
key=_house{i};
call missing(hhold);
rc=ha.find();
if rc=0 then do;
found=1;
household=hhold;
leave;
end;
end;
if not found then do;
n+1;
household=n;
end;
do j=1 to dim(_house);
if not missing(_house{j}) then do;
key=_house{j};
hhold=household;
ha.replace();
end;
end;
run;
Art297.
I have already written a code like you before at this forum.
And I also remener the OP is R_Win.
But your code is not what sas_Forum want.
I ran your code based on the following data, but the result is wrong.Plz see the following.
aaa bbb ccc ddd
qqq rrr www aaa
rrr ppp mmm lll
uuu zzz ffff ppp
p l m n
jjjj eee rrr ooo
. . . .
sss www . .
. . . eee
Your output:
pan1 pan2 pan3 add1 household
aaa bbb ccc ddd 1
qqq rrr www aaa 2
rrr ppp mmm lll 2
uuu zzz ffff ppp 3
p l m n 4
jjjj eee rrr ooo 2
5
sss www 2
eee 6
Regards.
Ksharp
Ksharp,
I'm not sure what you are saying. The results you showed are NOT the results of the code I posted. I just ran the code against your data as shown below:
The results I got were:
pan1 add1 pan2 pan3 household
aaa bbb ccc ddd 1
qqq rrr www aaa 1
rrr ppp mmm lll 1
uuu zzz ffff ppp 1
p l m n 2
jjjj eee rrr ooo 1
. . . . 3
sss www . . 1
. . . eee 1
As I understood sas_Forum's original request (at the top of this thread), that was precisely the desired result.
Records 1-4,6,8 and 9 all get a 1 because they all share something in common.
Record 5 gets a 2 because all of its values are unique as compared to the previous values;
And, finally, record 7 gets a 3 because all of its values were missing.
I'm not sure what you ran, but the above is what I got using the code that I posted (and shown again below).
Art
p.s. Congratulations on attaining the fourth highest point total on the forum.
data test;
input (pan1 add1 pan2 pan3) ($);
cards;
aaa bbb ccc ddd
qqq rrr www aaa
rrr ppp mmm lll
uuu zzz ffff ppp
p l m n
jjjj eee rrr ooo
. . . .
sss www . .
. . . eee
data want (keep=pan1 pan2 pan3 add1 household);
if _n_ eq 1 then do;
declare hash ha(hashexp: 16);
ha.definekey('key');
ha.definedata('hhold');
ha.definedone();
end;
set test;
array _house{*} $ 40 pan1--pan3;
do i=1 to dim(_house);
key=_house{i};
call missing(hhold);
rc=ha.find();
if rc=0 then do;
found=1;
household=hhold;
leave;
end;
end;
if not found then do;
n+1;
household=n;
end;
do j=1 to dim(_house);
if not missing(_house{j}) then do;
key=_house{j};
hhold=household;
ha.replace();
end;
end;
run;
Art thqs it working but u did not added add1 in the array ,but that may create a probelm i hope
Art.T already add add1 into array.
Check whether the code is what your need.
ksharp i did not find the add1 added in the array ..
pan1 -- pan3
It is a range, already include add1,since in the dataset order of variable is pan1 add1 pan2 pan3.
Actually I wrote this code before ,at the R_Win post.
http://communities.sas.com/thread/30690
Ksharp
ya i checked it Thqs Ksharp and art for your Help...
Ksharp,
I didn't check it line by line but, yes, the code is either VERY similar if not exactly the same as the code you suggested to the other thread. The code you suggested in the current thread, though, assigned a "1" for household each time a match was found.
Did you ever discover why you thought that the code I presented didn't work correctly?
Because I test the data is:
aaa bbb ccc ddd
. . . .
qqq rrr www aaa
. . . .
not like:
aaa bbb ccc ddd
qqq rrr www aaa
data copied have a missing line between every two obs.
Art297.
I have a question. Consider the following situation.
pan1 pan2 pan3 add1 household
1 2 3 4 1
5 6 7 8 2
10 2 7 80 ?
How do you count the third obs?
in third obs, 2 is in obs1, however 7 is in obs2 .
Ksharp
HI Ksharp i have tested art code e with 8 lakh obs but the output is getting wrong the pan1,add1,pan2,pan3 is moving to 2-11 differnt household id ideally one pan should be in one Household id only it should not move to anothe household id..
for exx pan1 --xyz
pan2--abc
pan3 --lpo
In the above exaple xyz is having Household id like 200,678,12,34,89 like that...
Your origin data not include pan4,
And what is your logic? Your explaination is ambiguous.
Can you post some more data and what output you would like.
Give some more sample data will be more helpful.
Ksharp
sorry in place of it pan 4 is nothing but add1 i have renamed it ,ok finally in pan1,add1,pan2 and pan3 but regaring data i can not post that much i am having huge data i have posted 100 records ,but my logic is if any pan is in one household it should be in that household only it should not move to another household id's ..
Art i have tested the code but for pan1 is fine it was not going in to two houshold ids but add1,pan2 and pan3 are going to two or three household id ...ex pan2-aaa the household should be single it should not have two or more household ids it is having the Household is like --23,45,999..
please can u help me in the remaining for the add1,pan2 and pan3 it should not go to more than one householdids..
Ksharp,
I tested it with missing values in every other record and still appear to get the correct results.
The code was not designed to account for crossover household members. Doing so would require knowing the maximum number of potential crossovers.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.