BookmarkSubscribeRSS Feed
sas_Forum
Calcite | Level 5

HI art i have tested your code with 8 lakh obs but the output is getting wrong the pan1,add1,pan2,pan3 is moving to 2-11 differnt household id ideally one pan should be in one Household id only it should not move to anothe household id..

for exx pan1 --xyz  

           pan2--abc

           pan3 --lpo

In the above exaple xyz is having Household id like 200,678,12,34,89  like that...

used this code..

data want (keep=pan1 add1 pan2 pan3 household);

  if _n_ eq 1 then do;

    declare hash ha(hashexp: 16);

    ha.definekey('key');

    ha.definedata('hhold');

    ha.definedone();

  end;

  set test;

  array _house{*} $ 40 pan1--pan3;

  do i=1 to dim(_house);

    key=_house{i};

    call missing(hhold);

    rc=ha.find();

    if rc=0 then do;

      found=1;

      household=hhold;

      leave;

   end;

  end;

  if not found then do;

    n+1;

    household=n;

  end;

  do j=1 to dim(_house);

    if not missing(_house{j}) then do;

     key=_house{j};

      hhold=household;

      ha.replace();

    end;

  end;

run;

sas_Forum
Calcite | Level 5

Art i have tested the code but for pan1 is fine it was not going in to two houshold ids but add1,pan2 and pan3 are going to two or three household id ...ex pan2-aaa the household should be single it should not have two or more household ids it is having the Household is like --23,45,999..

please can u help me in the remaining for the add1,pan2 and pan3 it should not go to more hta one householdids..

art297
Opal | Level 21

As long as there is no possibility of crossover households, and no problems with your data, the code appears to work correctly for all four variables.  Have you ensured that your four variables are of the same type and with the same lengths and don't contain any stray characters?

Ksharp
Super User

Can you post more output you want.

Just like:

pan1 add1 pan2 pan3    household

aaa   bbb    ccc     ddd     1

qqq   rrr       www   aaa     1

rrr     ppp    mmm lll          1

uuu   zzz    ffff      ppp       1

p       l        m          n       2

jjjj    eee     rrr       ooo      1

.    .   .   .                         3

sss   www  .    .                1

.         .        .        eee      1

Ksharp

Ksharp
Super User

And

I have a question. Consider the following situation.

pan1 pan2 pan3 add1  household

1         2         3       4           1

5         6         7        8           2

10        2         7      80        ?

How do you count the third obs?

in third obs, 2 is in obs1, however 7 is in obs2 .

sas_Forum
Calcite | Level 5

for the above output

obs pan1 add1 pan2 pan3    household

1   aaa   bbb    ccc     ddd     1

2   qqq   rrr       www   aaa     1

3   rrr     ppp    mmm lll          1

4   uuu   zzz    ffff      ppp       1

5    p       l        m          n       2

6    jjjj    eee     rrr       ooo      1

7  .    .   .   .                         3

8  sss   www  .    .                1

9  .         .        .        eee      1

In this for ex pan1 having aaa should be in Household 1 only if again repeated in add1 or pan2 or pa3 ,again

check for obs2 the having qqq or rrr or www or aaa all the observation if found any where all these should

have household 1 like obs3 rrr or ppp  or mmm or lll again all these should be in Household 1 only these obs should not go to other household ids they should have household=1 only   like that to alll....

Thqs ksharp for your responce

art297
Opal | Level 21

Another factor that could be causing the problem is a discrepancy between what the code is expecting and your actual data.  The code was written expecting the variables to be in the order as shown in your example (i.e., starting with pan1 and ending with pan4).

Try the code actually spelling out the variables rather than using a variable list.  i.e., replace pan1--pan3 with pan1 add1 pan2 pan3.

sas_Forum
Calcite | Level 5

tryed and in place of pan1-- pan3 i have made like this  pan1 add1 pan2 pan3  and cheked also but also the pan and add are moving to more than one household id .

please help me in this...

art297
Opal | Level 21

Like Ksharp suggested, post your code including a datastep that reads in enough records (using a datalines statement), to show the error you think you are getting.

Each time I run the code I think I am getting the correct answer but, of course, I could be overlooking something.

Ksharp
Super User

And there is also another problem. It is invisible character such as tab character '09'x and hex blank '00'x.

Maybe you need to clean your data firstly, only keep character you need.

I think Art 's code is right. If there is not crossover problem.

and I recode it again.

Still not quit sure  whether it is what you want.

data temp;
infile datalines dsd dlm='|' truncover;
input (pan1 add1 pan2 pan3 ) (: $40.);
datalines;
M5669K|4HH5BFWBBMBY48Y7P8S||
I2175Q|P5P8YBBMBY48Y7P8S0005D||
S1495P|8BWP_Y8V3MYVWBBMBBMB||
G4909P|WN8Y4YFN48FYP884P83Y4||
S0560L|488Y8N8N8H-K56H4YP8P||
F0994B|R8YPF8Y4P-0BWP8BBMBY||
H6408L|HICPM4BBPWFP4WV4MP8YBB||
M2091M|48YJ23N3P8VWBBMBY48Y7||
S2426K|5V4PZHKB8YPNPVWPVBBMBY||
D7907L|SBMBMVYYBBM8PBY48Y7P8S0|H6408L|
G9847L|XB3845-WM43YWBBMBY48Y7||
D3871H|D3PC34Z0-38W3YBBMBY48Y||
T0753D|2P8Y38Y84W3N4NPJ2R8P8Y||
P5328F|Z06V4B8YFYFYMY8FMYVWBB||
D3938K|KSJ2VBYWBY4P8YBBMBY48Y||
A3529D|K0K4F4BY8BBXVBBMBY48Y8||
P8617A|I0S3YPV4P8BYVWMYVWBBM4||H6408L
R1129A|Z0KMV4YWM483NBBMB8Y7P8S||
R|KK0ZJ2PYJ2NW8K5SH8BV84||
S7807B|YPFBYBP4WNWN88YBMBBMBY||
N8681Q|KHH4B8MMYNW3Y8YBBMBY48||
P8632Q|H550HFMBM4YPFYFPBBMBY4||
K3437H|DZ6MYVNN3YWMBY48Y7P8S0||
G5510C|ZH@YPBY84BB$$$$$$$$$||
R2518M|HZB8NV8MWXYBBY48Y7P8S00||
M1014A|B8XY5IZH_38VWBBMBBMBY4||
M0049P|KZYJ2C8Y3BNCVY4VP8YBBM||
PD4296R|K0KVPJ23YV43PFY3P8VWBB||
L5351M|KF47PP8NPCN4VWNWBBMS00||
D7744C|56PW3WF3W8P3P8VWBBMBY4||
U1088L|#8B8WP8FPMP3YWBBMS0005||
U1088L|#8B8WP8FPMP3YWBBMBY48Y||
U0564D|#8B8WP8FPMP3YWBBMBY48Y||
B0102C|@P7P8W4@PZIYBWDSZ4M_W||
U1088L|#8B8WP8FPMPWBBMBY48Y||
U0564D|#8B8WP8FPMPYWBBMBY48Y||
S9556K|@P7P8W46CN8SZ_WGP48PBB||
S9556K|@P7P8W46CN8SZ_WGP48PBB||
B1476E|S05YCGY4N4CYHZKPYBPBBMS||
C2940N|56PW3WF3W8P3P8VWBBMBY4||
U0564D|#8B8WP8FPMP3YWBBMBY48Y||
R0090B|_XN4S2YJ2F8MBY48Y7P8S0||
N8784Q|S0K4PY4WD65YWBBMBY48Y7||
G8553G|KZBP8SFYFPNYBBMBY48Y7P||
S5904H|60KZZ2Y3NCP5J2Y3N3P8VW||
S3482C|Z05Y4YJ24VVYWN884YBBMB||
U1088L|#8B8WP8FPMPYWBBMBY48Y||
K3031C|ZH0HF8YPFY84Y83NBBMBY||
R0090B|_XN4S2YJ2F8MBY48Y7P8S0||
P4466N|IHY48CH84YBBY48Y7P8S00||
J2314G|5WXBM8Y3YWBBBMBY48Y7P8||
R0090B|_XN4SJ2YJ2F8MBY48Y7P8S0||
TI7823D|78MH-ZZ3M8GYBBMBBMBY48||
B0509R|50Z88B5V48YN4FYFPBBMB||
M0682N|B4YGWP443YPY3WMBBMBY48||
K0725D|ZH0HF8YPFY8Y4Y83NBBMBY||
P9185M|Z-8WX6SW8BPBMBY48Y7P8S||
V2636H|Z05WWMFJ2YC4P4WWMFBBMBY||
K9954E|MJ2FPFYBCBBY48Y7P8$$$$||
PS0210M|S0-SBBW88YW8BBMBY48Y7P||
PS5842M|38FW-WXBV4FWX_BNWXBBMB||
M5480A|50Z44YWVWNWMBY48Y7P8S0||
V2825L|SCF84C@WBWBYBYWP8YBBMB||
J9090J|BM83Y8P4NN6V883NBBMBY4||
G9332F|Z0SZ4W8HV8FPBBMBY48Y7P||
M2017K|K6KDNPJ28ZK4YBBMBY48Y7||
M6901D|I-NPX488V3YPBBMBY48Y7P8||
S0689N|Z0ZZP2YF367YVWBBMBBMBY|M5669K|
M4134E|DNB4FHZIHZ43Y4BBMBY48Y||
G7604R|5-Y8PRPMWPF4YBBMBY48Y7||
S3468A|HCFP48FWWV88BBMBY48Y7P||
S0796N|K0SP3KPVFYWBWP8BBMBY48||
D7781Q|H0DN48484P8883NBBMBY48|S2395K|
B4607P|HBPFWIS3N3YBMBY48Y7P8S||
C4398Q|K0HKCYBX8YMKMYVWBBMBY4||
J9787E|SNMZD4PB8PFBBMBY48Y7P8|D7781Q|
D1187E|BYP8BWBY4FPP8YBBMBBMBY||S2395K
K2110J|HKS6FYNWFYHMWBBMBY48Y7P||
M5116F|S5SIB8BP85GBBMBY48Y7P8||
M3134J|6HWJ2P78NVBBY48Y7P8S00||D7781Q
P1590P|S0Z4PYVY84PBBMBY48Y7P8|M3134J|
S9598G|KZWXB48VYBPBBMBY48Y7P8||
K9168B|-HJ2M88MJ2WMBY48Y7P8S0|H3629M|
S2395K|ZSZ48PF#NFVBW8BBMBBMBY||
M2140L|48VP33W8BNB8BW8BBMBBMB||H3629M
B4952H|YJ2PB3CN8MYBBMBY48Y7P8|S9598G|S2395K
S5018D|KZ0HVMWM4Y3BBMBY48Y7P8||
K5943E|Z6PP8VPM83N8BPMV8BBMBY||
N6677M|ZKYCSVW8BF8VWBBMBY48Y7|M5669K|
HJ7138K|MV4YK08W38BBMBY48Y7P8S||
 | | |
R9004J|K0KCNP7P85SBBMBY48Y7P||
F0109Q|Z8W8SSVY4VYNM8VBBMBBMB|R9004J|M2724Q
M2724Q|ZZZ54RFP8GYBMBBMBY48Y7||
D9109E|YVYB4W46SWYBMBBMBY48Y7P||
M9585E|KBGWY3YB2WBMBY48Y7P8S00||
 | | |
H3629M|-_P8VY-0WP8BMBY48Y7P8S0||
T2426B|50ZYC8NHI8YBCVCFYP8BBM|M9295K|
M9295K|Z5ZMPFV8M8BMBBMBY48Y7||
P3089F|K6KDNPJ28ZK4YBBMBY48Y7||
PO2499R|-K067P8KPYBMBY48Y7P8S0||
L3447Q|33PV4V5I633F4YBBMBY48Y|M5669K|
1088L|||
;
run;
data want(keep=pan1 pan2 pan3 add1 household);
if _n_ eq 1 then do;
 declare hash ha(hashexp: 16);
  ha.definekey('key');
  ha.definedata('index');
  ha.definedone();
end;
length key $ 50;
set temp;
array _house{*} $ 32 pan1 pan2 pan3 add1;
do i=1 to dim(_house);
 key=strip(_house{i});rc=ha.find();
 if rc=0 then do;
                found=1;
                    household=index;leave;
                 end;
end;
if not found then do;
                    n+1;household=n;index=n;
                      end;
do j=1 to dim(_house);
 if not missing(strip(_house{j})) and _house{j} not in ('09'x '00'x) then do; key=strip(_house{j});ha.replace();end;
end;
run;

Ksharp

消息编辑者为:xia keshan

sas_Forum
Calcite | Level 5

Ksharp i have checked with your recent code in the given i am getting output with out any wrong household ids ,but in the above example i just kept "|" as delimiter to post the example observations

i the real time i am working with the sas dataset that is alredy created ,when i have ran this code on 8 laks obs the pan1 is getting correctly the obs in pan1 are going only in to one Househodld but while coming to add1,pan2,pan3 the pan2 obs also should have ex-BBB in pan2 should have only in one household like that

i am struggling for this..tnqs

Ksharp
Super User

My code is almost like Art's code.

I doubted that your data have some invisible character such as tab character '09'x and hex blank '00'x, which will

mislead to judge.

So I recommend you to clean data firstly.

And another underlying problem is whether there is a crossover problem . Such as

pan1 pan2 pan3 add1  household

1         2         3       4           1

5         6         7        8           2

10        2         7      80        ?

How do you count the third obs?

in third obs, 2 is in obs1, however 7 is in obs2 .

Ksharp

sas_Forum
Calcite | Level 5

In the above condition the output should be like this

obs  pan1 pan2 pan3 add1  household

   1   1         2         3       4           1

   2     5         6         7        8           1

   3    10        2         7      80        1

first we will get houdhold 1 then it should check weather any of the four obs are there in the entire dataset then that obs also should get the same household id as 1 ,in this condition the obs3 aslo should get hid as 1 as pan2 having 2 is coming again so it shd get hid as 1. not only this if the rest of observations like 80.84,10,5,6 if there are

again repeating they should also go to hid 1 only...thqs

Ksharp
Super User

Oh. My God.

I understood what is your mean. How about this code:

data test;
infile datalines dsd dlm='|' truncover;
input (pan1 pan2 pan3 add1) (: $40.);
datalines;
M5669K|4HH5BFWBBMBY48Y7P8S||
I2175Q|P5P8YBBMBY48Y7P8S0005D||
S1495P|8BWP_Y8V3MYVWBBMBBMB||
G4909P|WN8Y4YFN48FYP884P83Y4||
S0560L|488Y8N8N8H-K56H4YP8P||
F0994B|R8YPF8Y4P-0BWP8BBMBY||
H6408L|HICPM4BBPWFP4WV4MP8YBB||
M2091M|48YJ23N3P8VWBBMBY48Y7||
S2426K|5V4PZHKB8YPNPVWPVBBMBY||
D7907L|SBMBMVYYBBM8PBY48Y7P8S0|H6408L|
G9847L|XB3845-WM43YWBBMBY48Y7||
D3871H|D3PC34Z0-38W3YBBMBY48Y||
T0753D|2P8Y38Y84W3N4NPJ2R8P8Y||
P5328F|Z06V4B8YFYFYMY8FMYVWBB||
D3938K|KSJ2VBYWBY4P8YBBMBY48Y||
A3529D|K0K4F4BY8BBXVBBMBY48Y8||
P8617A|I0S3YPV4P8BYVWMYVWBBM4||H6408L
R1129A|Z0KMV4YWM483NBBMB8Y7P8S||
R|KK0ZJ2PYJ2NW8K5SH8BV84||
S7807B|YPFBYBP4WNWN88YBMBBMBY||
N8681Q|KHH4B8MMYNW3Y8YBBMBY48||
P8632Q|H550HFMBM4YPFYFPBBMBY4||
K3437H|DZ6MYVNN3YWMBY48Y7P8S0||
G5510C|ZH@YPBY84BB$$$$$$$$$||
R2518M|HZB8NV8MWXYBBY48Y7P8S00||
M1014A|B8XY5IZH_38VWBBMBBMBY4||
M0049P|KZYJ2C8Y3BNCVY4VP8YBBM||
PD4296R|K0KVPJ23YV43PFY3P8VWBB||
L5351M|KF47PP8NPCN4VWNWBBMS00||
D7744C|56PW3WF3W8P3P8VWBBMBY4||
U1088L|#8B8WP8FPMP3YWBBMS0005||
U1088L|#8B8WP8FPMP3YWBBMBY48Y||
U0564D|#8B8WP8FPMP3YWBBMBY48Y||
B0102C|@P7P8W4@PZIYBWDSZ4M_W||
U1088L|#8B8WP8FPMPWBBMBY48Y||
U0564D|#8B8WP8FPMPYWBBMBY48Y||
S9556K|@P7P8W46CN8SZ_WGP48PBB||
S9556K|@P7P8W46CN8SZ_WGP48PBB||
B1476E|S05YCGY4N4CYHZKPYBPBBMS||
C2940N|56PW3WF3W8P3P8VWBBMBY4||
U0564D|#8B8WP8FPMP3YWBBMBY48Y||
R0090B|_XN4S2YJ2F8MBY48Y7P8S0||
N8784Q|S0K4PY4WD65YWBBMBY48Y7||
G8553G|KZBP8SFYFPNYBBMBY48Y7P||
S5904H|60KZZ2Y3NCP5J2Y3N3P8VW||
S3482C|Z05Y4YJ24VVYWN884YBBMB||
U1088L|#8B8WP8FPMPYWBBMBY48Y||
K3031C|ZH0HF8YPFY84Y83NBBMBY||
R0090B|_XN4S2YJ2F8MBY48Y7P8S0||
P4466N|IHY48CH84YBBY48Y7P8S00||
J2314G|5WXBM8Y3YWBBBMBY48Y7P8||
R0090B|_XN4SJ2YJ2F8MBY48Y7P8S0||
TI7823D|78MH-ZZ3M8GYBBMBBMBY48||
B0509R|50Z88B5V48YN4FYFPBBMB||
M0682N|B4YGWP443YPY3WMBBMBY48||
K0725D|ZH0HF8YPFY8Y4Y83NBBMBY||
P9185M|Z-8WX6SW8BPBMBY48Y7P8S||
V2636H|Z05WWMFJ2YC4P4WWMFBBMBY||
K9954E|MJ2FPFYBCBBY48Y7P8$$$$||
PS0210M|S0-SBBW88YW8BBMBY48Y7P||
PS5842M|38FW-WXBV4FWX_BNWXBBMB||
M5480A|50Z44YWVWNWMBY48Y7P8S0||
V2825L|SCF84C@WBWBYBYWP8YBBMB||
J9090J|BM83Y8P4NN6V883NBBMBY4||
G9332F|Z0SZ4W8HV8FPBBMBY48Y7P||
M2017K|K6KDNPJ28ZK4YBBMBY48Y7||
M6901D|I-NPX488V3YPBBMBY48Y7P8||
S0689N|Z0ZZP2YF367YVWBBMBBMBY|M5669K|
M4134E|DNB4FHZIHZ43Y4BBMBY48Y||
G7604R|5-Y8PRPMWPF4YBBMBY48Y7||
S3468A|HCFP48FWWV88BBMBY48Y7P||
S0796N|K0SP3KPVFYWBWP8BBMBY48||
D7781Q|H0DN48484P8883NBBMBY48|S2395K|
B4607P|HBPFWIS3N3YBMBY48Y7P8S||
C4398Q|K0HKCYBX8YMKMYVWBBMBY4||
J9787E|SNMZD4PB8PFBBMBY48Y7P8|D7781Q|
D1187E|BYP8BWBY4FPP8YBBMBBMBY||S2395K
K2110J|HKS6FYNWFYHMWBBMBY48Y7P||
M5116F|S5SIB8BP85GBBMBY48Y7P8||
M3134J|6HWJ2P78NVBBY48Y7P8S00||D7781Q
P1590P|S0Z4PYVY84PBBMBY48Y7P8|M3134J|
S9598G|KZWXB48VYBPBBMBY48Y7P8||
K9168B|-HJ2M88MJ2WMBY48Y7P8S0|H3629M|
S2395K|ZSZ48PF#NFVBW8BBMBBMBY||
M2140L|48VP33W8BNB8BW8BBMBBMB||H3629M
B4952H|YJ2PB3CN8MYBBMBY48Y7P8|S9598G|S2395K
S5018D|KZ0HVMWM4Y3BBMBY48Y7P8||
K5943E|Z6PP8VPM83N8BPMV8BBMBY||
N6677M|ZKYCSVW8BF8VWBBMBY48Y7|M5669K|
HJ7138K|MV4YK08W38BBMBY48Y7P8S||
  | | |
R9004J|K0KCNP7P85SBBMBY48Y7P||
F0109Q|Z8W8SSVY4VYNM8VBBMBBMB|R9004J|M2724Q
M2724Q|ZZZ54RFP8GYBMBBMBY48Y7||
D9109E|YVYB4W46SWYBMBBMBY48Y7P||
M9585E|KBGWY3YB2WBMBY48Y7P8S00||
  | | |
H3629M|-_P8VY-0WP8BMBY48Y7P8S0||
T2426B|50ZYC8NHI8YBCVCFYP8BBM|M9295K|
M9295K|Z5ZMPFV8M8BMBBMBY48Y7||
P3089F|K6KDNPJ28ZK4YBBMBY48Y7||
PO2499R|-K067P8KPYBMBY48Y7P8S0||
L3447Q|33PV4V5I633F4YBBMBY48Y|M5669K|
1088L|||
;
run;

data want(keep=pan1 pan2 pan3 add1 household);
if _n_ =1 then do;
 if 0 then set test(rename=(pan1=_pan1 pan2=_pan2 pan3=_pan3 add1=_add1));
 declare hash ha(hashexp: 16,
  dataset: 'work.test(rename=(pan1=_pan1 pan2=_pan2 pan3=_pan3 add1=_add1))',
  multidata: 'Y');
 declare hiter hi('ha');
  ha.definekey('_pan1');
  ha.definedata('_pan1','_pan2','_pan3','_add1');
  ha.definedone();

 declare hash _ha(hashexp: 16);
  _ha.definekey('key');
  _ha.definedata('household');
  _ha.definedone();
end;
   set test ;
   length key $ 50;
   key=pan1;r1=_ha.find();
   key=pan2;r2=_ha.find();
   key=pan3;r3=_ha.find();
   key=add1;r4=_ha.find();
   if r1 and r2 and r3 and r4 then do;
                     n+1; household=n;
                     key=pan1;if not missing(key) then _ha.replace();
                     key=pan2;if not missing(key) then _ha.replace();
                     key=pan3;if not missing(key) then _ha.replace();
                     key=add1;if not missing(key) then _ha.replace();
                   end;
   rc=hi.first();
   do while(rc=0);
   if 
   ((pan1=_pan1 and not missing(_pan1))  or  (pan1=_pan2 and not missing(_pan2))  or (pan1=_pan3 and not missing(_pan3)) or (pan1=_add1 and not missing(_add1)) or 
    (pan2=_pan1 and not missing(_pan1))  or  (pan2=_pan2 and not missing(_pan2))  or (pan2=_pan3 and not missing(_pan3)) or (pan2=_add1 and not missing(_add1)) or 
    (pan3=_pan1 and not missing(_pan1))  or  (pan3=_pan2 and not missing(_pan2))  or (pan3=_pan3 and not missing(_pan3)) or (pan3=_add1 and not missing(_add1)) or 
    (add1=_pan1 and not missing(_pan1))  or  (add1=_pan2 and not missing(_pan2))  or (add1=_pan3 and not missing(_pan3)) or (add1=_add1 and not missing(_add1))  )
    then do; 
           key=_pan1;if not missing(key) then _ha.replace();
           key=_pan2;if not missing(key) then _ha.replace();
           key=_pan3;if not missing(key) then _ha.replace();
           key=_add1;if not missing(key) then _ha.replace();
         end;
   rc=hi.next();
   end;
run;

Ksharp

sas_Forum
Calcite | Level 5

Ksharp i have tryed your new code with 80 thousand obs it was runiing for 40mins and not complted so i stopped it,i am having

8 crores of data ,previously you or art has given some code it has run in 20 sec for 8lakh data..can u check this..

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 124 replies
  • 3693 views
  • 4 likes
  • 7 in conversation