08-29-2011 08:27 AM
Grouping and Giving one id
Hi i am having pan1,pan2,pan3 and add1
now i want to generate household id on that
input pan1 pan2 pan3 add1
1 2 3 4
5 6 7 8
. . . .
1 2 7 8
1 5 9 12
13 14 15 16
12 18 19 .
Here i want to create one unique number as Household id
first it should check the pan1 if any of the pan are in pan2 or pan3 or add1 it should give one
household id like versa pan2 should check the observations matching in pan1 or pan3 or add1
and pan3 and add1 to do the same like pan1 and pan2..
my final createria is to get one number unique if any of the pan1,pan2,pan3 and add1 obs matching
any where and i should generate one new number if all the observations are blank.
These is the process of creating household in Dataflux i am doing this in baase sas.
pan1 pan2 pan3 add1 Household
1 2 3 4 1
5 6 7 8 1
. . . . 2
1 2 7 8 1
1 5 9 12 1
13 14 15 16 3
12 18 19 . 1
08-29-2011 08:33 AM
You've posted this twice but haven't provided enough information for one to understand what you are trying to do. You seem to want to increase Household by 1 each time something does or doesn't happen, but I wasn't able to understand what that event might be.
Also, you will have to clearly indicate if your actual data are just going to be numbers, like in your example and, if so, how large those numbers can be.
08-29-2011 08:46 AM
I want the Household id for the if there is a observation 123 in pan1 and if this observations again accor in pan2 at any place and if in pan3 and add1 then these all should get one Household id not only that
pan1 pan2 pan3 add1
123 456 789 345
567 123 098 067
In this example if 123 is repeating on pan1 and pan2 then these two get one H_id not only that the number corresponding to
123 if occuring any where like(456 789 345 in obs1 and 567 098 067 in obs 2) like that they should
get same H_id...
08-29-2011 01:32 PM
It sounds like you want to split your ids into connected subgraphs. If you have SAS/OR licensed I have had success using PROC NETDRAW to do this. Otherwise you can code your own logic.
I would start by converting your data from the current four variables to two variables and populate it with pairs of values from your four variables.
array x pan1-pan3 add1 ;
do i=1 to dim(x)-1;
do j=i+1 to dim(x);
id1=x(i); id2=x(j); output;
id1=x(j); id2=x(i); output;
keep id1 id2 ;
proc sort data=relations nodupkey;
by id1 id2;
You can grow your own subnets by looping and adding in all of the connected ids from the current list until the current subnet stops growing.
08-30-2011 03:26 AM
Still not understand what is your mean totally.
data test; input pan1 pan2 pan3 add1 ; cards; 1 2 3 4 5 6 7 8 . . . . 1 2 7 8 1 5 9 12 13 14 15 16 12 18 19 . ; run; data want(keep=pan1 pan2 pan3 add1 household); set test; retain first; count+1; _count=0; do i=1 to _nobs; set test(rename=(pan1=_pan1 pan2=_pan2 pan3=_pan3 add1=_add1)) nobs=_nobs point=i; _count+1; if count ne _count and ((pan1=_pan1 and not missing(_pan1)) or (pan1=_pan2 and not missing(_pan2)) or (pan1=_pan3 and not missing(_pan3)) or (pan1=_add1 and not missing(_add1)) or (pan2=_pan1 and not missing(_pan1)) or (pan2=_pan2 and not missing(_pan2)) or (pan2=_pan3 and not missing(_pan3)) or (pan2=_add1 and not missing(_add1)) or (pan3=_pan1 and not missing(_pan1)) or (pan3=_pan2 and not missing(_pan2)) or (pan3=_pan3 and not missing(_pan3)) or (pan3=_add1 and not missing(_add1)) or (add1=_pan1 and not missing(_pan1)) or (add1=_pan2 and not missing(_pan2)) or (add1=_pan3 and not missing(_pan3)) or (add1=_add1 and not missing(_add1)) ) then do; _n+1; if _n eq 1 then do; n+1; first=n; household=n; end; else household=first; found=1;leave; end; end; if not found then do; n+1; household=n; end; run;
Message was edited by: xia keshan