HI art i have tested your code with 8 lakh obs but the output is getting wrong the pan1,add1,pan2,pan3 is moving to 2-11 differnt household id ideally one pan should be in one Household id only it should not move to anothe household id..
for exx pan1 --xyz
pan2--abc
pan3 --lpo
In the above exaple xyz is having Household id like 200,678,12,34,89 like that...
used this code..
data want (keep=pan1 add1 pan2 pan3 household);
if _n_ eq 1 then do;
declare hash ha(hashexp: 16);
ha.definekey('key');
ha.definedata('hhold');
ha.definedone();
end;
set test;
array _house{*} $ 40 pan1--pan3;
do i=1 to dim(_house);
key=_house{i};
call missing(hhold);
rc=ha.find();
if rc=0 then do;
found=1;
household=hhold;
leave;
end;
end;
if not found then do;
n+1;
household=n;
end;
do j=1 to dim(_house);
if not missing(_house{j}) then do;
key=_house{j};
hhold=household;
ha.replace();
end;
end;
run;
Art i have tested the code but for pan1 is fine it was not going in to two houshold ids but add1,pan2 and pan3 are going to two or three household id ...ex pan2-aaa the household should be single it should not have two or more household ids it is having the Household is like --23,45,999..
please can u help me in the remaining for the add1,pan2 and pan3 it should not go to more hta one householdids..
As long as there is no possibility of crossover households, and no problems with your data, the code appears to work correctly for all four variables. Have you ensured that your four variables are of the same type and with the same lengths and don't contain any stray characters?
Can you post more output you want.
Just like:
pan1 add1 pan2 pan3 household
aaa bbb ccc ddd 1
qqq rrr www aaa 1
rrr ppp mmm lll 1
uuu zzz ffff ppp 1
p l m n 2
jjjj eee rrr ooo 1
. . . . 3
sss www . . 1
. . . eee 1
Ksharp
And
I have a question. Consider the following situation.
pan1 pan2 pan3 add1 household
1 2 3 4 1
5 6 7 8 2
10 2 7 80 ?
How do you count the third obs?
in third obs, 2 is in obs1, however 7 is in obs2 .
for the above output
obs pan1 add1 pan2 pan3 household
1 aaa bbb ccc ddd 1
2 qqq rrr www aaa 1
3 rrr ppp mmm lll 1
4 uuu zzz ffff ppp 1
5 p l m n 2
6 jjjj eee rrr ooo 1
7 . . . . 3
8 sss www . . 1
9 . . . eee 1
In this for ex pan1 having aaa should be in Household 1 only if again repeated in add1 or pan2 or pa3 ,again
check for obs2 the having qqq or rrr or www or aaa all the observation if found any where all these should
have household 1 like obs3 rrr or ppp or mmm or lll again all these should be in Household 1 only these obs should not go to other household ids they should have household=1 only like that to alll....
Thqs ksharp for your responce
Another factor that could be causing the problem is a discrepancy between what the code is expecting and your actual data. The code was written expecting the variables to be in the order as shown in your example (i.e., starting with pan1 and ending with pan4).
Try the code actually spelling out the variables rather than using a variable list. i.e., replace pan1--pan3 with pan1 add1 pan2 pan3.
tryed and in place of pan1-- pan3 i have made like this pan1 add1 pan2 pan3 and cheked also but also the pan and add are moving to more than one household id .
please help me in this...
Like Ksharp suggested, post your code including a datastep that reads in enough records (using a datalines statement), to show the error you think you are getting.
Each time I run the code I think I am getting the correct answer but, of course, I could be overlooking something.
And there is also another problem. It is invisible character such as tab character '09'x and hex blank '00'x.
Maybe you need to clean your data firstly, only keep character you need.
I think Art 's code is right. If there is not crossover problem.
and I recode it again.
Still not quit sure whether it is what you want.
data temp; infile datalines dsd dlm='|' truncover; input (pan1 add1 pan2 pan3 ) (: $40.); datalines; M5669K|4HH5BFWBBMBY48Y7P8S|| I2175Q|P5P8YBBMBY48Y7P8S0005D|| S1495P|8BWP_Y8V3MYVWBBMBBMB|| G4909P|WN8Y4YFN48FYP884P83Y4|| S0560L|488Y8N8N8H-K56H4YP8P|| F0994B|R8YPF8Y4P-0BWP8BBMBY|| H6408L|HICPM4BBPWFP4WV4MP8YBB|| M2091M|48YJ23N3P8VWBBMBY48Y7|| S2426K|5V4PZHKB8YPNPVWPVBBMBY|| D7907L|SBMBMVYYBBM8PBY48Y7P8S0|H6408L| G9847L|XB3845-WM43YWBBMBY48Y7|| D3871H|D3PC34Z0-38W3YBBMBY48Y|| T0753D|2P8Y38Y84W3N4NPJ2R8P8Y|| P5328F|Z06V4B8YFYFYMY8FMYVWBB|| D3938K|KSJ2VBYWBY4P8YBBMBY48Y|| A3529D|K0K4F4BY8BBXVBBMBY48Y8|| P8617A|I0S3YPV4P8BYVWMYVWBBM4||H6408L R1129A|Z0KMV4YWM483NBBMB8Y7P8S|| R|KK0ZJ2PYJ2NW8K5SH8BV84|| S7807B|YPFBYBP4WNWN88YBMBBMBY|| N8681Q|KHH4B8MMYNW3Y8YBBMBY48|| P8632Q|H550HFMBM4YPFYFPBBMBY4|| K3437H|DZ6MYVNN3YWMBY48Y7P8S0|| G5510C|ZH@YPBY84BB$$$$$$$$$|| R2518M|HZB8NV8MWXYBBY48Y7P8S00|| M1014A|B8XY5IZH_38VWBBMBBMBY4|| M0049P|KZYJ2C8Y3BNCVY4VP8YBBM|| PD4296R|K0KVPJ23YV43PFY3P8VWBB|| L5351M|KF47PP8NPCN4VWNWBBMS00|| D7744C|56PW3WF3W8P3P8VWBBMBY4|| U1088L|#8B8WP8FPMP3YWBBMS0005|| U1088L|#8B8WP8FPMP3YWBBMBY48Y|| U0564D|#8B8WP8FPMP3YWBBMBY48Y|| B0102C|@P7P8W4@PZIYBWDSZ4M_W|| U1088L|#8B8WP8FPMPWBBMBY48Y|| U0564D|#8B8WP8FPMPYWBBMBY48Y|| S9556K|@P7P8W46CN8SZ_WGP48PBB|| S9556K|@P7P8W46CN8SZ_WGP48PBB|| B1476E|S05YCGY4N4CYHZKPYBPBBMS|| C2940N|56PW3WF3W8P3P8VWBBMBY4|| U0564D|#8B8WP8FPMP3YWBBMBY48Y|| R0090B|_XN4S2YJ2F8MBY48Y7P8S0|| N8784Q|S0K4PY4WD65YWBBMBY48Y7|| G8553G|KZBP8SFYFPNYBBMBY48Y7P|| S5904H|60KZZ2Y3NCP5J2Y3N3P8VW|| S3482C|Z05Y4YJ24VVYWN884YBBMB|| U1088L|#8B8WP8FPMPYWBBMBY48Y|| K3031C|ZH0HF8YPFY84Y83NBBMBY|| R0090B|_XN4S2YJ2F8MBY48Y7P8S0|| P4466N|IHY48CH84YBBY48Y7P8S00|| J2314G|5WXBM8Y3YWBBBMBY48Y7P8|| R0090B|_XN4SJ2YJ2F8MBY48Y7P8S0|| TI7823D|78MH-ZZ3M8GYBBMBBMBY48|| B0509R|50Z88B5V48YN4FYFPBBMB|| M0682N|B4YGWP443YPY3WMBBMBY48|| K0725D|ZH0HF8YPFY8Y4Y83NBBMBY|| P9185M|Z-8WX6SW8BPBMBY48Y7P8S|| V2636H|Z05WWMFJ2YC4P4WWMFBBMBY|| K9954E|MJ2FPFYBCBBY48Y7P8$$$$|| PS0210M|S0-SBBW88YW8BBMBY48Y7P|| PS5842M|38FW-WXBV4FWX_BNWXBBMB|| M5480A|50Z44YWVWNWMBY48Y7P8S0|| V2825L|SCF84C@WBWBYBYWP8YBBMB|| J9090J|BM83Y8P4NN6V883NBBMBY4|| G9332F|Z0SZ4W8HV8FPBBMBY48Y7P|| M2017K|K6KDNPJ28ZK4YBBMBY48Y7|| M6901D|I-NPX488V3YPBBMBY48Y7P8|| S0689N|Z0ZZP2YF367YVWBBMBBMBY|M5669K| M4134E|DNB4FHZIHZ43Y4BBMBY48Y|| G7604R|5-Y8PRPMWPF4YBBMBY48Y7|| S3468A|HCFP48FWWV88BBMBY48Y7P|| S0796N|K0SP3KPVFYWBWP8BBMBY48|| D7781Q|H0DN48484P8883NBBMBY48|S2395K| B4607P|HBPFWIS3N3YBMBY48Y7P8S|| C4398Q|K0HKCYBX8YMKMYVWBBMBY4|| J9787E|SNMZD4PB8PFBBMBY48Y7P8|D7781Q| D1187E|BYP8BWBY4FPP8YBBMBBMBY||S2395K K2110J|HKS6FYNWFYHMWBBMBY48Y7P|| M5116F|S5SIB8BP85GBBMBY48Y7P8|| M3134J|6HWJ2P78NVBBY48Y7P8S00||D7781Q P1590P|S0Z4PYVY84PBBMBY48Y7P8|M3134J| S9598G|KZWXB48VYBPBBMBY48Y7P8|| K9168B|-HJ2M88MJ2WMBY48Y7P8S0|H3629M| S2395K|ZSZ48PF#NFVBW8BBMBBMBY|| M2140L|48VP33W8BNB8BW8BBMBBMB||H3629M B4952H|YJ2PB3CN8MYBBMBY48Y7P8|S9598G|S2395K S5018D|KZ0HVMWM4Y3BBMBY48Y7P8|| K5943E|Z6PP8VPM83N8BPMV8BBMBY|| N6677M|ZKYCSVW8BF8VWBBMBY48Y7|M5669K| HJ7138K|MV4YK08W38BBMBY48Y7P8S|| | | | R9004J|K0KCNP7P85SBBMBY48Y7P|| F0109Q|Z8W8SSVY4VYNM8VBBMBBMB|R9004J|M2724Q M2724Q|ZZZ54RFP8GYBMBBMBY48Y7|| D9109E|YVYB4W46SWYBMBBMBY48Y7P|| M9585E|KBGWY3YB2WBMBY48Y7P8S00|| | | | H3629M|-_P8VY-0WP8BMBY48Y7P8S0|| T2426B|50ZYC8NHI8YBCVCFYP8BBM|M9295K| M9295K|Z5ZMPFV8M8BMBBMBY48Y7|| P3089F|K6KDNPJ28ZK4YBBMBY48Y7|| PO2499R|-K067P8KPYBMBY48Y7P8S0|| L3447Q|33PV4V5I633F4YBBMBY48Y|M5669K| 1088L||| ; run; data want(keep=pan1 pan2 pan3 add1 household); if _n_ eq 1 then do; declare hash ha(hashexp: 16); ha.definekey('key'); ha.definedata('index'); ha.definedone(); end; length key $ 50; set temp; array _house{*} $ 32 pan1 pan2 pan3 add1; do i=1 to dim(_house); key=strip(_house{i});rc=ha.find(); if rc=0 then do; found=1; household=index;leave; end; end; if not found then do; n+1;household=n;index=n; end; do j=1 to dim(_house); if not missing(strip(_house{j})) and _house{j} not in ('09'x '00'x) then do; key=strip(_house{j});ha.replace();end; end; run;
Ksharp
消息编辑者为:xia keshan
Ksharp i have checked with your recent code in the given i am getting output with out any wrong household ids ,but in the above example i just kept "|" as delimiter to post the example observations
i the real time i am working with the sas dataset that is alredy created ,when i have ran this code on 8 laks obs the pan1 is getting correctly the obs in pan1 are going only in to one Househodld but while coming to add1,pan2,pan3 the pan2 obs also should have ex-BBB in pan2 should have only in one household like that
i am struggling for this..tnqs
My code is almost like Art's code.
I doubted that your data have some invisible character such as tab character '09'x and hex blank '00'x, which will
mislead to judge.
So I recommend you to clean data firstly.
And another underlying problem is whether there is a crossover problem . Such as
pan1 pan2 pan3 add1 household
1 2 3 4 1
5 6 7 8 2
10 2 7 80 ?
How do you count the third obs?
in third obs, 2 is in obs1, however 7 is in obs2 .
Ksharp
In the above condition the output should be like this
obs pan1 pan2 pan3 add1 household
1 1 2 3 4 1
2 5 6 7 8 1
3 10 2 7 80 1
first we will get houdhold 1 then it should check weather any of the four obs are there in the entire dataset then that obs also should get the same household id as 1 ,in this condition the obs3 aslo should get hid as 1 as pan2 having 2 is coming again so it shd get hid as 1. not only this if the rest of observations like 80.84,10,5,6 if there are
again repeating they should also go to hid 1 only...thqs
Oh. My God.
I understood what is your mean. How about this code:
data test; infile datalines dsd dlm='|' truncover; input (pan1 pan2 pan3 add1) (: $40.); datalines; M5669K|4HH5BFWBBMBY48Y7P8S|| I2175Q|P5P8YBBMBY48Y7P8S0005D|| S1495P|8BWP_Y8V3MYVWBBMBBMB|| G4909P|WN8Y4YFN48FYP884P83Y4|| S0560L|488Y8N8N8H-K56H4YP8P|| F0994B|R8YPF8Y4P-0BWP8BBMBY|| H6408L|HICPM4BBPWFP4WV4MP8YBB|| M2091M|48YJ23N3P8VWBBMBY48Y7|| S2426K|5V4PZHKB8YPNPVWPVBBMBY|| D7907L|SBMBMVYYBBM8PBY48Y7P8S0|H6408L| G9847L|XB3845-WM43YWBBMBY48Y7|| D3871H|D3PC34Z0-38W3YBBMBY48Y|| T0753D|2P8Y38Y84W3N4NPJ2R8P8Y|| P5328F|Z06V4B8YFYFYMY8FMYVWBB|| D3938K|KSJ2VBYWBY4P8YBBMBY48Y|| A3529D|K0K4F4BY8BBXVBBMBY48Y8|| P8617A|I0S3YPV4P8BYVWMYVWBBM4||H6408L R1129A|Z0KMV4YWM483NBBMB8Y7P8S|| R|KK0ZJ2PYJ2NW8K5SH8BV84|| S7807B|YPFBYBP4WNWN88YBMBBMBY|| N8681Q|KHH4B8MMYNW3Y8YBBMBY48|| P8632Q|H550HFMBM4YPFYFPBBMBY4|| K3437H|DZ6MYVNN3YWMBY48Y7P8S0|| G5510C|ZH@YPBY84BB$$$$$$$$$|| R2518M|HZB8NV8MWXYBBY48Y7P8S00|| M1014A|B8XY5IZH_38VWBBMBBMBY4|| M0049P|KZYJ2C8Y3BNCVY4VP8YBBM|| PD4296R|K0KVPJ23YV43PFY3P8VWBB|| L5351M|KF47PP8NPCN4VWNWBBMS00|| D7744C|56PW3WF3W8P3P8VWBBMBY4|| U1088L|#8B8WP8FPMP3YWBBMS0005|| U1088L|#8B8WP8FPMP3YWBBMBY48Y|| U0564D|#8B8WP8FPMP3YWBBMBY48Y|| B0102C|@P7P8W4@PZIYBWDSZ4M_W|| U1088L|#8B8WP8FPMPWBBMBY48Y|| U0564D|#8B8WP8FPMPYWBBMBY48Y|| S9556K|@P7P8W46CN8SZ_WGP48PBB|| S9556K|@P7P8W46CN8SZ_WGP48PBB|| B1476E|S05YCGY4N4CYHZKPYBPBBMS|| C2940N|56PW3WF3W8P3P8VWBBMBY4|| U0564D|#8B8WP8FPMP3YWBBMBY48Y|| R0090B|_XN4S2YJ2F8MBY48Y7P8S0|| N8784Q|S0K4PY4WD65YWBBMBY48Y7|| G8553G|KZBP8SFYFPNYBBMBY48Y7P|| S5904H|60KZZ2Y3NCP5J2Y3N3P8VW|| S3482C|Z05Y4YJ24VVYWN884YBBMB|| U1088L|#8B8WP8FPMPYWBBMBY48Y|| K3031C|ZH0HF8YPFY84Y83NBBMBY|| R0090B|_XN4S2YJ2F8MBY48Y7P8S0|| P4466N|IHY48CH84YBBY48Y7P8S00|| J2314G|5WXBM8Y3YWBBBMBY48Y7P8|| R0090B|_XN4SJ2YJ2F8MBY48Y7P8S0|| TI7823D|78MH-ZZ3M8GYBBMBBMBY48|| B0509R|50Z88B5V48YN4FYFPBBMB|| M0682N|B4YGWP443YPY3WMBBMBY48|| K0725D|ZH0HF8YPFY8Y4Y83NBBMBY|| P9185M|Z-8WX6SW8BPBMBY48Y7P8S|| V2636H|Z05WWMFJ2YC4P4WWMFBBMBY|| K9954E|MJ2FPFYBCBBY48Y7P8$$$$|| PS0210M|S0-SBBW88YW8BBMBY48Y7P|| PS5842M|38FW-WXBV4FWX_BNWXBBMB|| M5480A|50Z44YWVWNWMBY48Y7P8S0|| V2825L|SCF84C@WBWBYBYWP8YBBMB|| J9090J|BM83Y8P4NN6V883NBBMBY4|| G9332F|Z0SZ4W8HV8FPBBMBY48Y7P|| M2017K|K6KDNPJ28ZK4YBBMBY48Y7|| M6901D|I-NPX488V3YPBBMBY48Y7P8|| S0689N|Z0ZZP2YF367YVWBBMBBMBY|M5669K| M4134E|DNB4FHZIHZ43Y4BBMBY48Y|| G7604R|5-Y8PRPMWPF4YBBMBY48Y7|| S3468A|HCFP48FWWV88BBMBY48Y7P|| S0796N|K0SP3KPVFYWBWP8BBMBY48|| D7781Q|H0DN48484P8883NBBMBY48|S2395K| B4607P|HBPFWIS3N3YBMBY48Y7P8S|| C4398Q|K0HKCYBX8YMKMYVWBBMBY4|| J9787E|SNMZD4PB8PFBBMBY48Y7P8|D7781Q| D1187E|BYP8BWBY4FPP8YBBMBBMBY||S2395K K2110J|HKS6FYNWFYHMWBBMBY48Y7P|| M5116F|S5SIB8BP85GBBMBY48Y7P8|| M3134J|6HWJ2P78NVBBY48Y7P8S00||D7781Q P1590P|S0Z4PYVY84PBBMBY48Y7P8|M3134J| S9598G|KZWXB48VYBPBBMBY48Y7P8|| K9168B|-HJ2M88MJ2WMBY48Y7P8S0|H3629M| S2395K|ZSZ48PF#NFVBW8BBMBBMBY|| M2140L|48VP33W8BNB8BW8BBMBBMB||H3629M B4952H|YJ2PB3CN8MYBBMBY48Y7P8|S9598G|S2395K S5018D|KZ0HVMWM4Y3BBMBY48Y7P8|| K5943E|Z6PP8VPM83N8BPMV8BBMBY|| N6677M|ZKYCSVW8BF8VWBBMBY48Y7|M5669K| HJ7138K|MV4YK08W38BBMBY48Y7P8S|| | | | R9004J|K0KCNP7P85SBBMBY48Y7P|| F0109Q|Z8W8SSVY4VYNM8VBBMBBMB|R9004J|M2724Q M2724Q|ZZZ54RFP8GYBMBBMBY48Y7|| D9109E|YVYB4W46SWYBMBBMBY48Y7P|| M9585E|KBGWY3YB2WBMBY48Y7P8S00|| | | | H3629M|-_P8VY-0WP8BMBY48Y7P8S0|| T2426B|50ZYC8NHI8YBCVCFYP8BBM|M9295K| M9295K|Z5ZMPFV8M8BMBBMBY48Y7|| P3089F|K6KDNPJ28ZK4YBBMBY48Y7|| PO2499R|-K067P8KPYBMBY48Y7P8S0|| L3447Q|33PV4V5I633F4YBBMBY48Y|M5669K| 1088L||| ; run; data want(keep=pan1 pan2 pan3 add1 household); if _n_ =1 then do; if 0 then set test(rename=(pan1=_pan1 pan2=_pan2 pan3=_pan3 add1=_add1)); declare hash ha(hashexp: 16, dataset: 'work.test(rename=(pan1=_pan1 pan2=_pan2 pan3=_pan3 add1=_add1))', multidata: 'Y'); declare hiter hi('ha'); ha.definekey('_pan1'); ha.definedata('_pan1','_pan2','_pan3','_add1'); ha.definedone(); declare hash _ha(hashexp: 16); _ha.definekey('key'); _ha.definedata('household'); _ha.definedone(); end; set test ; length key $ 50; key=pan1;r1=_ha.find(); key=pan2;r2=_ha.find(); key=pan3;r3=_ha.find(); key=add1;r4=_ha.find(); if r1 and r2 and r3 and r4 then do; n+1; household=n; key=pan1;if not missing(key) then _ha.replace(); key=pan2;if not missing(key) then _ha.replace(); key=pan3;if not missing(key) then _ha.replace(); key=add1;if not missing(key) then _ha.replace(); end; rc=hi.first(); do while(rc=0); if ((pan1=_pan1 and not missing(_pan1)) or (pan1=_pan2 and not missing(_pan2)) or (pan1=_pan3 and not missing(_pan3)) or (pan1=_add1 and not missing(_add1)) or (pan2=_pan1 and not missing(_pan1)) or (pan2=_pan2 and not missing(_pan2)) or (pan2=_pan3 and not missing(_pan3)) or (pan2=_add1 and not missing(_add1)) or (pan3=_pan1 and not missing(_pan1)) or (pan3=_pan2 and not missing(_pan2)) or (pan3=_pan3 and not missing(_pan3)) or (pan3=_add1 and not missing(_add1)) or (add1=_pan1 and not missing(_pan1)) or (add1=_pan2 and not missing(_pan2)) or (add1=_pan3 and not missing(_pan3)) or (add1=_add1 and not missing(_add1)) ) then do; key=_pan1;if not missing(key) then _ha.replace(); key=_pan2;if not missing(key) then _ha.replace(); key=_pan3;if not missing(key) then _ha.replace(); key=_add1;if not missing(key) then _ha.replace(); end; rc=hi.next(); end; run;
Ksharp
Ksharp i have tryed your new code with 80 thousand obs it was runiing for 40mins and not complted so i stopped it,i am having
8 crores of data ,previously you or art has given some code it has run in 20 sec for 8lakh data..can u check this..
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.