Grouping and Giving one id
Hi i am having pan1,pan2,pan3 and add1
now i want to generate household id on that
data test;
input pan1 pan2 pan3 add1
cards;
1 2 3 4
5 6 7 8
. . . .
1 2 7 8
1 5 9 12
13 14 15 16
12 18 19 .
run;
Here i want to create one unique number as Household id
first it should check the pan1 if any of the pan are in pan2 or pan3 or add1 it should give one
household id like versa pan2 should check the observations matching in pan1 or pan3 or add1
and pan3 and add1 to do the same like pan1 and pan2..
my final createria is to get one number unique if any of the pan1,pan2,pan3 and add1 obs matching
any where and i should generate one new number if all the observations are blank.
These is the process of creating household in Dataflux i am doing this in baase sas.
output:
pan1 pan2 pan3 add1 Household
1 2 3 4 1
5 6 7 8 1
. . . . 2
1 2 7 8 1
1 5 9 12 1
13 14 15 16 3
12 18 19 . 1
"my final createria is to get one number unique if any of the pan1,pan2,pan3 and add1 obs matching"
I do not understand what your mean is totally.
If there were two or more rows are all missing value, how do you count?
And Under what situation, the count will increase one?
Ksharp
IF any of two columns are missing and if the remaining two columns are matching with the obs in that then it should get the same Household id else it should get new Household_id.The count should increse based on the previuos Household_id
output:
obs pan1 pan2 pan3 add1 Household
1) 1 2 3 4 1 (is is the starting and got Householdid as 1)
2) 5 6 7 8 1 (It also got 1 becase in obs 5 the pan2 is 5 and pan1 is 1 and in first obs pan1 is also 1 any corresponding to Hidrelinking should get 1)
3) . . . . 2 (all are null and shd get new Hid)
4) 1 2 7 8 1( pan1 is matching with obs1)
5) 1 5 9 12 1 (add1 is matching with obs of 1 with pan1)
6)13 14 15 16 3 (unique no matcihng got new Hid)
7)12 18 19 . 1(in obs 5 the pan3 is matching with pan1 in this ibs so got Hid as 1)
The only way I can think to promote performance efficiency is to use Hash Table(see the following), but it need lots and lots of memory for your large table, I do not think your computer has so big memory. So If you could, split the large table into several small table.
About "all are character variables" , that is not a issue, my code is also suited for character variable.
data test; input pan1 pan2 pan3 add1 ; cards; 1 2 3 4 5 6 7 8 . . . . 1 2 7 8 1 5 9 12 13 14 15 16 12 18 19 . ; run; data want(keep=pan1 pan2 pan3 add1 household); declare hash ha(hashexp: 16,ordered: 'A'); declare hiter hi('ha'); ha.definekey('_count'); ha.definedata('_count','_pan1','_pan2','_pan3','_add1'); ha.definedone(); do until(last); set test(rename=(pan1=_pan1 pan2=_pan2 pan3=_pan3 add1=_add1)) end=last; _count+1;ha.add(); end; do until(_last); set test end=_last; count+1; found=0; rc=hi.first(); do while(rc=0); if count ne _count and ((pan1=_pan1 and not missing(_pan1)) or (pan1=_pan2 and not missing(_pan2)) or (pan1=_pan3 and not missing(_pan3)) or (pan1=_add1 and not missing(_add1)) or (pan2=_pan1 and not missing(_pan1)) or (pan2=_pan2 and not missing(_pan2)) or (pan2=_pan3 and not missing(_pan3)) or (pan2=_add1 and not missing(_add1)) or (pan3=_pan1 and not missing(_pan1)) or (pan3=_pan2 and not missing(_pan2)) or (pan3=_pan3 and not missing(_pan3)) or (pan3=_add1 and not missing(_add1)) or (add1=_pan1 and not missing(_pan1)) or (add1=_pan2 and not missing(_pan2)) or (add1=_pan3 and not missing(_pan3)) or (add1=_add1 and not missing(_add1)) ) then do; _n+1; if _n eq 1 then do; n+1; first=n; household=n; end; else household=first; found=1;leave; end; rc=hi.next(); end; if not found then do; n+1; household=n; end; output; end; stop; run;
Ksharp
Hi. I do not understand what is your mean.
The Hash Table method is limited by your PC memory.
So you must have large memory to run code for your large table.
Speed I think will be faster .
Ksharp
Can you post sample data for these 500 obs.
And if one obs has a missing value and another obs has also a missing value, how do you count?
I tested it , not found error. the following is output I get.
data temp; infile datalines dsd dlm='|' truncover; input (pan1 pan2 pan3 add1) (: $40.); datalines; M5669K|4HH5BFWBBMBY48Y7P8S|| I2175Q|P5P8YBBMBY48Y7P8S0005D|| S1495P|8BWP_Y8V3MYVWBBMBBMB|| G4909P|WN8Y4YFN48FYP884P83Y4|| S0560L|488Y8N8N8H-K56H4YP8P|| F0994B|R8YPF8Y4P-0BWP8BBMBY|| H6408L|HICPM4BBPWFP4WV4MP8YBB|| M2091M|48YJ23N3P8VWBBMBY48Y7|| S2426K|5V4PZHKB8YPNPVWPVBBMBY|| D7907L|SBMBMVYYBBM8PBY48Y7P8S0|H6408L| G9847L|XB3845-WM43YWBBMBY48Y7|| D3871H|D3PC34Z0-38W3YBBMBY48Y|| T0753D|2P8Y38Y84W3N4NPJ2R8P8Y|| P5328F|Z06V4B8YFYFYMY8FMYVWBB|| D3938K|KSJ2VBYWBY4P8YBBMBY48Y|| A3529D|K0K4F4BY8BBXVBBMBY48Y8|| P8617A|I0S3YPV4P8BYVWMYVWBBM4||H6408L R1129A|Z0KMV4YWM483NBBMB8Y7P8S|| R|KK0ZJ2PYJ2NW8K5SH8BV84|| S7807B|YPFBYBP4WNWN88YBMBBMBY|| N8681Q|KHH4B8MMYNW3Y8YBBMBY48|| P8632Q|H550HFMBM4YPFYFPBBMBY4|| K3437H|DZ6MYVNN3YWMBY48Y7P8S0|| G5510C|ZH@YPBY84BB$$$$$$$$$|| R2518M|HZB8NV8MWXYBBY48Y7P8S00|| M1014A|B8XY5IZH_38VWBBMBBMBY4|| M0049P|KZYJ2C8Y3BNCVY4VP8YBBM|| PD4296R|K0KVPJ23YV43PFY3P8VWBB|| L5351M|KF47PP8NPCN4VWNWBBMS00|| D7744C|56PW3WF3W8P3P8VWBBMBY4|| U1088L|#8B8WP8FPMP3YWBBMS0005|| U1088L|#8B8WP8FPMP3YWBBMBY48Y|| U0564D|#8B8WP8FPMP3YWBBMBY48Y|| B0102C|@P7P8W4@PZIYBWDSZ4M_W|| U1088L|#8B8WP8FPMPWBBMBY48Y|| U0564D|#8B8WP8FPMPYWBBMBY48Y|| S9556K|@P7P8W46CN8SZ_WGP48PBB|| S9556K|@P7P8W46CN8SZ_WGP48PBB|| B1476E|S05YCGY4N4CYHZKPYBPBBMS|| C2940N|56PW3WF3W8P3P8VWBBMBY4|| U0564D|#8B8WP8FPMP3YWBBMBY48Y|| R0090B|_XN4S2YJ2F8MBY48Y7P8S0|| N8784Q|S0K4PY4WD65YWBBMBY48Y7|| G8553G|KZBP8SFYFPNYBBMBY48Y7P|| S5904H|60KZZ2Y3NCP5J2Y3N3P8VW|| S3482C|Z05Y4YJ24VVYWN884YBBMB|| U1088L|#8B8WP8FPMPYWBBMBY48Y|| K3031C|ZH0HF8YPFY84Y83NBBMBY|| R0090B|_XN4S2YJ2F8MBY48Y7P8S0|| P4466N|IHY48CH84YBBY48Y7P8S00|| J2314G|5WXBM8Y3YWBBBMBY48Y7P8|| R0090B|_XN4SJ2YJ2F8MBY48Y7P8S0|| TI7823D|78MH-ZZ3M8GYBBMBBMBY48|| B0509R|50Z88B5V48YN4FYFPBBMB|| M0682N|B4YGWP443YPY3WMBBMBY48|| K0725D|ZH0HF8YPFY8Y4Y83NBBMBY|| P9185M|Z-8WX6SW8BPBMBY48Y7P8S|| V2636H|Z05WWMFJ2YC4P4WWMFBBMBY|| K9954E|MJ2FPFYBCBBY48Y7P8$$$$|| PS0210M|S0-SBBW88YW8BBMBY48Y7P|| PS5842M|38FW-WXBV4FWX_BNWXBBMB|| M5480A|50Z44YWVWNWMBY48Y7P8S0|| V2825L|SCF84C@WBWBYBYWP8YBBMB|| J9090J|BM83Y8P4NN6V883NBBMBY4|| G9332F|Z0SZ4W8HV8FPBBMBY48Y7P|| M2017K|K6KDNPJ28ZK4YBBMBY48Y7|| M6901D|I-NPX488V3YPBBMBY48Y7P8|| S0689N|Z0ZZP2YF367YVWBBMBBMBY|M5669K| M4134E|DNB4FHZIHZ43Y4BBMBY48Y|| G7604R|5-Y8PRPMWPF4YBBMBY48Y7|| S3468A|HCFP48FWWV88BBMBY48Y7P|| S0796N|K0SP3KPVFYWBWP8BBMBY48|| D7781Q|H0DN48484P8883NBBMBY48|S2395K| B4607P|HBPFWIS3N3YBMBY48Y7P8S|| C4398Q|K0HKCYBX8YMKMYVWBBMBY4|| J9787E|SNMZD4PB8PFBBMBY48Y7P8|D7781Q| D1187E|BYP8BWBY4FPP8YBBMBBMBY||S2395K K2110J|HKS6FYNWFYHMWBBMBY48Y7P|| M5116F|S5SIB8BP85GBBMBY48Y7P8|| M3134J|6HWJ2P78NVBBY48Y7P8S00||D7781Q P1590P|S0Z4PYVY84PBBMBY48Y7P8|M3134J| S9598G|KZWXB48VYBPBBMBY48Y7P8|| K9168B|-HJ2M88MJ2WMBY48Y7P8S0|H3629M| S2395K|ZSZ48PF#NFVBW8BBMBBMBY|| M2140L|48VP33W8BNB8BW8BBMBBMB||H3629M B4952H|YJ2PB3CN8MYBBMBY48Y7P8|S9598G|S2395K S5018D|KZ0HVMWM4Y3BBMBY48Y7P8|| K5943E|Z6PP8VPM83N8BPMV8BBMBY|| N6677M|ZKYCSVW8BF8VWBBMBY48Y7|M5669K| HJ7138K|MV4YK08W38BBMBY48Y7P8S|| | | | R9004J|K0KCNP7P85SBBMBY48Y7P|| F0109Q|Z8W8SSVY4VYNM8VBBMBBMB|R9004J|M2724Q M2724Q|ZZZ54RFP8GYBMBBMBY48Y7|| D9109E|YVYB4W46SWYBMBBMBY48Y7P|| M9585E|KBGWY3YB2WBMBY48Y7P8S00|| | | | H3629M|-_P8VY-0WP8BMBY48Y7P8S0|| T2426B|50ZYC8NHI8YBCVCFYP8BBM|M9295K| M9295K|Z5ZMPFV8M8BMBBMBY48Y7|| P3089F|K6KDNPJ28ZK4YBBMBY48Y7|| PO2499R|-K067P8KPYBMBY48Y7P8S0|| L3447Q|33PV4V5I633F4YBBMBY48Y|M5669K| 1088L||| ; run; data test; set temp; pan1=compress(pan1,' '); pan2=compress(pan2,' '); pan3=compress(pan3,' '); add1=compress(add1,' '); run; data want(keep=pan1 pan2 pan3 add1 household); declare hash ha(hashexp: 16,ordered: 'A'); declare hiter hi('ha'); ha.definekey('_count'); ha.definedata('_count','_pan1','_pan2','_pan3','_add1'); ha.definedone(); do until(last); set test(rename=(pan1=_pan1 pan2=_pan2 pan3=_pan3 add1=_add1)) end=last; _count+1;ha.add(); end; do until(_last); set test end=_last; count+1; found=0; rc=hi.first(); do while(rc=0); if count ne _count and ((pan1=_pan1 and not missing(_pan1)) or (pan1=_pan2 and not missing(_pan2)) or (pan1=_pan3 and not missing(_pan3)) or (pan1=_add1 and not missing(_add1)) or (pan2=_pan1 and not missing(_pan1)) or (pan2=_pan2 and not missing(_pan2)) or (pan2=_pan3 and not missing(_pan3)) or (pan2=_add1 and not missing(_add1)) or (pan3=_pan1 and not missing(_pan1)) or (pan3=_pan2 and not missing(_pan2)) or (pan3=_pan3 and not missing(_pan3)) or (pan3=_add1 and not missing(_add1)) or (add1=_pan1 and not missing(_pan1)) or (add1=_pan2 and not missing(_pan2)) or (add1=_pan3 and not missing(_pan3)) or (add1=_add1 and not missing(_add1)) ) then do; _n+1; if _n eq 1 then do; n+1; first=n; household=n; end; else household=first; found=1;leave; end; rc=hi.next(); end; if not found then do; n+1; household=n; end; output; end; stop; run; pan1 pan2 pan3 add1 household M5669K 4HH5BFWBBMBY48Y7P8S 1 I2175Q P5P8YBBMBY48Y7P8S0005D 2 S1495P 8BWP_Y8V3MYVWBBMBBMB 3 G4909P WN8Y4YFN48FYP884P83Y4 4 S0560L 488Y8N8N8H-K56H4YP8P 5 F0994B R8YPF8Y4P-0BWP8BBMBY 6 H6408L HICPM4BBPWFP4WV4MP8YBB 1 M2091M 48YJ23N3P8VWBBMBY48Y7 7 S2426K 5V4PZHKB8YPNPVWPVBBMBY 8 D7907L SBMBMVYYBBM8PBY48Y7P8S0 H6408L 1 G9847L XB3845-WM43YWBBMBY48Y7 9 D3871H D3PC34Z0-38W3YBBMBY48Y 10 T0753D 2P8Y38Y84W3N4NPJ2R8P8Y 11 P5328F Z06V4B8YFYFYMY8FMYVWBB 12 D3938K KSJ2VBYWBY4P8YBBMBY48Y 13 A3529D K0K4F4BY8BBXVBBMBY48Y8 14 P8617A I0S3YPV4P8BYVWMYVWBBM4 H6408L 1 R1129A Z0KMV4YWM483NBBMB8Y7P8S 15 R KK0ZJ2PYJ2NW8K5SH8BV84 16 S7807B YPFBYBP4WNWN88YBMBBMBY 17 N8681Q KHH4B8MMYNW3Y8YBBMBY48 18 P8632Q H550HFMBM4YPFYFPBBMBY4 19 K3437H DZ6MYVNN3YWMBY48Y7P8S0 20 G5510C ZH@YPBY84BB$$$$$$$$$ 21 R2518M HZB8NV8MWXYBBY48Y7P8S00 22 M1014A B8XY5IZH_38VWBBMBBMBY4 23 M0049P KZYJ2C8Y3BNCVY4VP8YBBM 24 PD4296R K0KVPJ23YV43PFY3P8VWBB 25 L5351M KF47PP8NPCN4VWNWBBMS00 26 D7744C 56PW3WF3W8P3P8VWBBMBY4 1 U1088L #8B8WP8FPMP3YWBBMS0005 1 U1088L #8B8WP8FPMP3YWBBMBY48Y 1 U0564D #8B8WP8FPMP3YWBBMBY48Y 1 B0102C @P7P8W4@PZIYBWDSZ4M_W 27 U1088L #8B8WP8FPMPWBBMBY48Y 1 U0564D #8B8WP8FPMPYWBBMBY48Y 1 S9556K @P7P8W46CN8SZ_WGP48PBB 1 S9556K @P7P8W46CN8SZ_WGP48PBB 1 B1476E S05YCGY4N4CYHZKPYBPBBMS 28 C2940N 56PW3WF3W8P3P8VWBBMBY4 1 U0564D #8B8WP8FPMP3YWBBMBY48Y 1 R0090B _XN4S2YJ2F8MBY48Y7P8S0 1 N8784Q S0K4PY4WD65YWBBMBY48Y7 29 G8553G KZBP8SFYFPNYBBMBY48Y7P 30 S5904H 60KZZ2Y3NCP5J2Y3N3P8VW 31 S3482C Z05Y4YJ24VVYWN884YBBMB 32 U1088L #8B8WP8FPMPYWBBMBY48Y 1 K3031C ZH0HF8YPFY84Y83NBBMBY 33 R0090B _XN4S2YJ2F8MBY48Y7P8S0 1 P4466N IHY48CH84YBBY48Y7P8S00 34 J2314G 5WXBM8Y3YWBBBMBY48Y7P8 35 R0090B _XN4SJ2YJ2F8MBY48Y7P8S0 1 TI7823D 78MH-ZZ3M8GYBBMBBMBY48 36 B0509R 50Z88B5V48YN4FYFPBBMB 37 M0682N B4YGWP443YPY3WMBBMBY48 38 K0725D ZH0HF8YPFY8Y4Y83NBBMBY 39 P9185M Z-8WX6SW8BPBMBY48Y7P8S 40 V2636H Z05WWMFJ2YC4P4WWMFBBMBY 41 K9954E MJ2FPFYBCBBY48Y7P8$$$$ 42 PS0210M S0-SBBW88YW8BBMBY48Y7P 43 PS5842M 38FW-WXBV4FWX_BNWXBBMB 44 M5480A 50Z44YWVWNWMBY48Y7P8S0 45 V2825L SCF84C@WBWBYBYWP8YBBMB 46 J9090J BM83Y8P4NN6V883NBBMBY4 47 G9332F Z0SZ4W8HV8FPBBMBY48Y7P 48 M2017K K6KDNPJ28ZK4YBBMBY48Y7 1 M6901D I-NPX488V3YPBBMBY48Y7P8 49 S0689N Z0ZZP2YF367YVWBBMBBMBY M5669K 1 M4134E DNB4FHZIHZ43Y4BBMBY48Y 50 G7604R 5-Y8PRPMWPF4YBBMBY48Y7 51 S3468A HCFP48FWWV88BBMBY48Y7P 52 S0796N K0SP3KPVFYWBWP8BBMBY48 53 D7781Q H0DN48484P8883NBBMBY48 S2395K 1 B4607P HBPFWIS3N3YBMBY48Y7P8S 54 C4398Q K0HKCYBX8YMKMYVWBBMBY4 55 J9787E SNMZD4PB8PFBBMBY48Y7P8 D7781Q 1 D1187E BYP8BWBY4FPP8YBBMBBMBY S2395K 1 K2110J HKS6FYNWFYHMWBBMBY48Y7P 56 M5116F S5SIB8BP85GBBMBY48Y7P8 57 M3134J 6HWJ2P78NVBBY48Y7P8S00 D7781Q 1 P1590P S0Z4PYVY84PBBMBY48Y7P8 M3134J 1 S9598G KZWXB48VYBPBBMBY48Y7P8 1 K9168B -HJ2M88MJ2WMBY48Y7P8S0 H3629M 1 S2395K ZSZ48PF#NFVBW8BBMBBMBY 1 M2140L 48VP33W8BNB8BW8BBMBBMB H3629M 1 B4952H YJ2PB3CN8MYBBMBY48Y7P8 S9598G S2395K 1 S5018D KZ0HVMWM4Y3BBMBY48Y7P8 58 K5943E Z6PP8VPM83N8BPMV8BBMBY 59 N6677M ZKYCSVW8BF8VWBBMBY48Y7 M5669K 1 HJ7138K MV4YK08W38BBMBY48Y7P8S 60 61 R9004J K0KCNP7P85SBBMBY48Y7P 1 F0109Q Z8W8SSVY4VYNM8VBBMBBMB R9004J M2724Q 1 M2724Q ZZZ54RFP8GYBMBBMBY48Y7 1 D9109E YVYB4W46SWYBMBBMBY48Y7P 62 M9585E KBGWY3YB2WBMBY48Y7P8S00 63 64 H3629M -_P8VY-0WP8BMBY48Y7P8S0 1 T2426B 50ZYC8NHI8YBCVCFYP8BBM M9295K 1 M9295K Z5ZMPFV8M8BMBBMBY48Y7 1 P3089F K6KDNPJ28ZK4YBBMBY48Y7 1 PO2499R -K067P8KPYBMBY48Y7P8S0 65 L3447Q 33PV4V5I633F4YBBMBY48Y M5669K 1 1088L 66
Ksharp
Hi sharp it worked the output for household was wrong for example the pan1 R0090B is going to Hid 1 but it is not having any linkage in Hid 1 the corresponding pan2 is not having any linkage in the Hid of 1
My answer: Actually i wnat to create the Household id to customers having any linkage between them in pan1,2,3 and add1
if they have linkage they should get my Hid .if pan1 is having any linkage between them and the correspoding pan2 if present in any of the four variables should get the same Hid and if next to that if any pan is there and linking with any of the four variables it should also get linked and get the same Hid...
Ksharp Thqs for your patience and Help...
Or you can use compress function to remove the invisibility characher Such as '09'x (tab character).
May be that is reason.
data test; set temp; pan1=compress(pan1, ,'s'); pan2=compress(pan2, ,'s'); pan3=compress(pan3, ,'s'); add1=compress(add1, ,'s'); run;
Ksharp
if pan1 and add1 are not missing and if the value of pan1 and add1 matched in any of the four values it should get the same Household id else it should get new houshold id...
Oh. You have change the rules. You only want pan1 and add1 to compare?
How about this:
data test; infile datalines dsd dlm='|' truncover; input (pan1 add1 pan2 pan3) (: $40.); datalines; M5669K|4HH5BFWBBMBY48Y7P8S|| I2175Q|P5P8YBBMBY48Y7P8S0005D|| S1495P|8BWP_Y8V3MYVWBBMBBMB|| G4909P|WN8Y4YFN48FYP884P83Y4|| S0560L|488Y8N8N8H-K56H4YP8P|| F0994B|R8YPF8Y4P-0BWP8BBMBY|| H6408L|HICPM4BBPWFP4WV4MP8YBB|| M2091M|48YJ23N3P8VWBBMBY48Y7|| S2426K|5V4PZHKB8YPNPVWPVBBMBY|| D7907L|SBMBMVYYBBM8PBY48Y7P8S0|H6408L| G9847L|XB3845-WM43YWBBMBY48Y7|| D3871H|D3PC34Z0-38W3YBBMBY48Y|| T0753D|2P8Y38Y84W3N4NPJ2R8P8Y|| P5328F|Z06V4B8YFYFYMY8FMYVWBB|| D3938K|KSJ2VBYWBY4P8YBBMBY48Y|| A3529D|K0K4F4BY8BBXVBBMBY48Y8|| P8617A|I0S3YPV4P8BYVWMYVWBBM4||H6408L R1129A|Z0KMV4YWM483NBBMB8Y7P8S|| R|KK0ZJ2PYJ2NW8K5SH8BV84|| S7807B|YPFBYBP4WNWN88YBMBBMBY|| N8681Q|KHH4B8MMYNW3Y8YBBMBY48|| P8632Q|H550HFMBM4YPFYFPBBMBY4|| K3437H|DZ6MYVNN3YWMBY48Y7P8S0|| G5510C|ZH@YPBY84BB$$$$$$$$$|| R2518M|HZB8NV8MWXYBBY48Y7P8S00|| M1014A|B8XY5IZH_38VWBBMBBMBY4|| M0049P|KZYJ2C8Y3BNCVY4VP8YBBM|| PD4296R|K0KVPJ23YV43PFY3P8VWBB|| L5351M|KF47PP8NPCN4VWNWBBMS00|| D7744C|56PW3WF3W8P3P8VWBBMBY4|| U1088L|#8B8WP8FPMP3YWBBMS0005|| U1088L|#8B8WP8FPMP3YWBBMBY48Y|| U0564D|#8B8WP8FPMP3YWBBMBY48Y|| B0102C|@P7P8W4@PZIYBWDSZ4M_W|| U1088L|#8B8WP8FPMPWBBMBY48Y|| U0564D|#8B8WP8FPMPYWBBMBY48Y|| S9556K|@P7P8W46CN8SZ_WGP48PBB|| S9556K|@P7P8W46CN8SZ_WGP48PBB|| B1476E|S05YCGY4N4CYHZKPYBPBBMS|| C2940N|56PW3WF3W8P3P8VWBBMBY4|| U0564D|#8B8WP8FPMP3YWBBMBY48Y|| R0090B|_XN4S2YJ2F8MBY48Y7P8S0|| N8784Q|S0K4PY4WD65YWBBMBY48Y7|| G8553G|KZBP8SFYFPNYBBMBY48Y7P|| S5904H|60KZZ2Y3NCP5J2Y3N3P8VW|| S3482C|Z05Y4YJ24VVYWN884YBBMB|| U1088L|#8B8WP8FPMPYWBBMBY48Y|| K3031C|ZH0HF8YPFY84Y83NBBMBY|| R0090B|_XN4S2YJ2F8MBY48Y7P8S0|| P4466N|IHY48CH84YBBY48Y7P8S00|| J2314G|5WXBM8Y3YWBBBMBY48Y7P8|| R0090B|_XN4SJ2YJ2F8MBY48Y7P8S0|| TI7823D|78MH-ZZ3M8GYBBMBBMBY48|| B0509R|50Z88B5V48YN4FYFPBBMB|| M0682N|B4YGWP443YPY3WMBBMBY48|| K0725D|ZH0HF8YPFY8Y4Y83NBBMBY|| P9185M|Z-8WX6SW8BPBMBY48Y7P8S|| V2636H|Z05WWMFJ2YC4P4WWMFBBMBY|| K9954E|MJ2FPFYBCBBY48Y7P8$$$$|| PS0210M|S0-SBBW88YW8BBMBY48Y7P|| PS5842M|38FW-WXBV4FWX_BNWXBBMB|| M5480A|50Z44YWVWNWMBY48Y7P8S0|| V2825L|SCF84C@WBWBYBYWP8YBBMB|| J9090J|BM83Y8P4NN6V883NBBMBY4|| G9332F|Z0SZ4W8HV8FPBBMBY48Y7P|| M2017K|K6KDNPJ28ZK4YBBMBY48Y7|| M6901D|I-NPX488V3YPBBMBY48Y7P8|| S0689N|Z0ZZP2YF367YVWBBMBBMBY|M5669K| M4134E|DNB4FHZIHZ43Y4BBMBY48Y|| G7604R|5-Y8PRPMWPF4YBBMBY48Y7|| S3468A|HCFP48FWWV88BBMBY48Y7P|| S0796N|K0SP3KPVFYWBWP8BBMBY48|| D7781Q|H0DN48484P8883NBBMBY48|S2395K| B4607P|HBPFWIS3N3YBMBY48Y7P8S|| C4398Q|K0HKCYBX8YMKMYVWBBMBY4|| J9787E|SNMZD4PB8PFBBMBY48Y7P8|D7781Q| D1187E|BYP8BWBY4FPP8YBBMBBMBY||S2395K K2110J|HKS6FYNWFYHMWBBMBY48Y7P|| M5116F|S5SIB8BP85GBBMBY48Y7P8|| M3134J|6HWJ2P78NVBBY48Y7P8S00||D7781Q P1590P|S0Z4PYVY84PBBMBY48Y7P8|M3134J| S9598G|KZWXB48VYBPBBMBY48Y7P8|| K9168B|-HJ2M88MJ2WMBY48Y7P8S0|H3629M| S2395K|ZSZ48PF#NFVBW8BBMBBMBY|| M2140L|48VP33W8BNB8BW8BBMBBMB||H3629M B4952H|YJ2PB3CN8MYBBMBY48Y7P8|S9598G|S2395K S5018D|KZ0HVMWM4Y3BBMBY48Y7P8|| K5943E|Z6PP8VPM83N8BPMV8BBMBY|| N6677M|ZKYCSVW8BF8VWBBMBY48Y7|M5669K| HJ7138K|MV4YK08W38BBMBY48Y7P8S|| | | | R9004J|K0KCNP7P85SBBMBY48Y7P|| F0109Q|Z8W8SSVY4VYNM8VBBMBBMB|R9004J|M2724Q M2724Q|ZZZ54RFP8GYBMBBMBY48Y7|| D9109E|YVYB4W46SWYBMBBMBY48Y7P|| M9585E|KBGWY3YB2WBMBY48Y7P8S00|| | | | H3629M|-_P8VY-0WP8BMBY48Y7P8S0|| T2426B|50ZYC8NHI8YBCVCFYP8BBM|M9295K| M9295K|Z5ZMPFV8M8BMBBMBY48Y7|| P3089F|K6KDNPJ28ZK4YBBMBY48Y7|| PO2499R|-K067P8KPYBMBY48Y7P8S0|| L3447Q|33PV4V5I633F4YBBMBY48Y|M5669K| 1088L||| ; run; data want(keep=pan1 pan2 pan3 add1 household); declare hash ha(hashexp: 16,ordered: 'A'); declare hiter hi('ha'); ha.definekey('_count'); ha.definedata('_count','_pan1','_pan2','_pan3','_add1'); ha.definedone(); do until(last); set test(rename=(pan1=_pan1 pan2=_pan2 pan3=_pan3 add1=_add1)) end=last; _count+1;ha.add(); end; do until(_last); set test end=_last; count+1; found=0; rc=hi.first(); do while(rc=0); if count ne _count and ((pan1=_pan1 and not missing(_pan1)) or (pan1=_pan2 and not missing(_pan2)) or (pan1=_pan3 and not missing(_pan3)) or (pan1=_add1 and not missing(_add1)) or (add1=_pan1 and not missing(_pan1)) or (add1=_pan2 and not missing(_pan2)) or (add1=_pan3 and not missing(_pan3)) or (add1=_add1 and not missing(_add1)) ) then do; _n+1; if _n eq 1 then do; n+1; first=n; household=n; end; else household=first; found=1;leave; end; rc=hi.next(); end; if not found then do; n+1; household=n; end; output; end; stop; run;
Ksharp
Ksharp that is not only for pan1,add1 this is for all the four variables
actually my concept is to get the linkages if pan1 is again repeated in pan2 pan3 or add1 then it should get one Hid,not only that
ex obs pan1 pan2 pan3 add1 hid
1. aaa bbb ccc ddd 1
2. qqq rrr www aaa 1
3. rrr ppp mmm lll 1
4. uuu zzz ffff ppp 1
5 p l m n 2
6 jjjj eee rrr ooo 1
7 <all blanks > 3
8 sss www . . 1
9 . . . eee 1
in this example the first obs is hid is 1
in obs2 add1 aaa is matching with pan1 aaa of obs1
in obs 3 pan1 rrr is matching in pan2 rrr of obs 2
in obs 4 add1 ppp is matching with pan2 ppp of obs 3
in obs 5 it is unique with no matching
in obs 6 pan3 rrr is matcing with pan1 rrr of obs 3
in obs 7 is having no values it got Hid as 3
obs 8 is having pan2 www matching with pan2 www of obs 2 so it got Hid 1
obs 9 add1 eee is matching with obs6 pan2 ee so it got Hid 1
Like this i want the linkages
Ksharp i am tring this from 3 days i am not getting so please help me thanks for your help in advance....
OK.
You want to query the value from the previous obs not whole obs? just as Art'T said.
That would be more simple.
How about this code:
data test; input ( pan1 pan2 pan3 add1 ) ($); datalines; aaa bbb ccc ddd qqq rrr www aaa rrr ppp mmm lll uuu zzz ffff ppp p l m n jjjj eee rrr ooo . . . . sss www . . . . . eee ; run; data want(keep=pan1 pan2 pan3 add1 household); if _n_ eq 1 then do; declare hash ha(hashexp: 16); ha.definekey('key'); ha.definedone(); end; set test; array _house{*} $ 32 _character_; array _save{4} $ 32 ; do i=1 to dim(_house); key=_house{i};rc=ha.check(); if rc=0 then do; household=1;found=1; end; else _save{i}=_house{i}; end; do j=1 to 4; if not missing(_save{j}) then do; key=_save{j}; ha.replace(); end; end; call missing( of _save{*}); if not found then do; n+1;household=n; end; run;
Ksharp
Actually my code can be refined as:
data test; input ( pan1 pan2 pan3 add1 ) ($); datalines; aaa bbb ccc ddd qqq rrr www aaa rrr ppp mmm lll uuu zzz ffff ppp p l m n jjjj eee rrr ooo . . . . sss www . . . . . eee ; run; data want(keep=pan1 pan2 pan3 add1 household); if _n_ eq 1 then do; declare hash ha(hashexp: 16); ha.definekey('key'); ha.definedone(); end; set test; array _house{*} $ 32 _character_; do i=1 to dim(_house); key=_house{i};rc=ha.check(); if rc=0 then do; household=1;found=1;leave; end; end; do j=1 to dim(_house); if not missing(_house{j}) then do; key=_house{j}; ha.replace(); end; end; if not found then do; n+1;household=n; end; run;
Ksharp
Ksharp,
The following code is WRONG, but I'd like to know how to fix it. After reconsidering what I was planning to do, I decided that learning to include a hash was the most sensible way to go.
However, your code is incomplete, and I don't know how to fix it. I do know what it is missing: Your code does not assign households to the hash thus, when it does a lookup, it merely assigns a "1" if it finds a match.
I tried to correct it, below, but I am apparently missing or misunderstanding the concept.
Would definitely appreciate seeing how it SHOULD/COULD have been written correctly:
data test;
infile cards dsd dlm='|' truncover;
input (pan1 add1 pan2 pan3) (: $40.);
cards;
M5669K|4HH5BFWBBMBY48Y7P8S|P5P8YBBMBY48Y7P8S0005D|
I2175Q|P5P8YBBMBY48Y7P8S0005D||
S1495P|8BWP_Y8V3MYVWBBMBBMB||
G4909P|WN8Y4YFN48FYP884P83Y4||
|S1495P||
S0560L|488Y8N8N8H-K56H4YP8P||
F0994B|R8YPF8Y4P-0BWP8BBMBY||
H6408L|HICPM4BBPWFP4WV4MP8YBB||
M2091M|48YJ23N3P8VWBBMBY48Y7||
S2426K|5V4PZHKB8YPNPVWPVBBMBY||
D7907L|SBMBMVYYBBM8PBY48Y7P8S0|H6408L|
G9847L|XB3845-WM43YWBBMBY48Y7||
D3871H|D3PC34Z0-38W3YBBMBY48Y||
T0753D|2P8Y38Y84W3N4NPJ2R8P8Y||
P5328F|Z06V4B8YFYFYMY8FMYVWBB||
D3938K|KSJ2VBYWBY4P8YBBMBY48Y||
A3529D|K0K4F4BY8BBXVBBMBY48Y8||
P8617A|I0S3YPV4P8BYVWMYVWBBM4||H6408L
R1129A|Z0KMV4YWM483NBBMB8Y7P8S||
R|KK0ZJ2PYJ2NW8K5SH8BV84||
S7807B|YPFBYBP4WNWN88YBMBBMBY||
N8681Q|KHH4B8MMYNW3Y8YBBMBY48||
P8632Q|H550HFMBM4YPFYFPBBMBY4||
K3437H|DZ6MYVNN3YWMBY48Y7P8S0||
G5510C|ZH@YPBY84BB$$$$$$$$$||
R2518M|HZB8NV8MWXYBBY48Y7P8S00||
M1014A|B8XY5IZH_38VWBBMBBMBY4||
M0049P|KZYJ2C8Y3BNCVY4VP8YBBM||
PD4296R|K0KVPJ23YV43PFY3P8VWBB||
L5351M|KF47PP8NPCN4VWNWBBMS00||
D7744C|56PW3WF3W8P3P8VWBBMBY4||
U1088L|#8B8WP8FPMP3YWBBMS0005||
U1088L|#8B8WP8FPMP3YWBBMBY48Y||
U0564D|#8B8WP8FPMP3YWBBMBY48Y||
B0102C|@P7P8W4@PZIYBWDSZ4M_W||
U1088L|#8B8WP8FPMPWBBMBY48Y||
U0564D|#8B8WP8FPMPYWBBMBY48Y||
S9556K|@P7P8W46CN8SZ_WGP48PBB||
S9556K|@P7P8W46CN8SZ_WGP48PBB||
B1476E|S05YCGY4N4CYHZKPYBPBBMS||
C2940N|56PW3WF3W8P3P8VWBBMBY4||
U0564D|#8B8WP8FPMP3YWBBMBY48Y||
R0090B|_XN4S2YJ2F8MBY48Y7P8S0||
N8784Q|S0K4PY4WD65YWBBMBY48Y7||
G8553G|KZBP8SFYFPNYBBMBY48Y7P||
S5904H|60KZZ2Y3NCP5J2Y3N3P8VW||
S3482C|Z05Y4YJ24VVYWN884YBBMB||
U1088L|#8B8WP8FPMPYWBBMBY48Y||
K3031C|ZH0HF8YPFY84Y83NBBMBY||
R0090B|_XN4S2YJ2F8MBY48Y7P8S0||
P4466N|IHY48CH84YBBY48Y7P8S00||
J2314G|5WXBM8Y3YWBBBMBY48Y7P8||
R0090B|_XN4SJ2YJ2F8MBY48Y7P8S0||
TI7823D|78MH-ZZ3M8GYBBMBBMBY48||
B0509R|50Z88B5V48YN4FYFPBBMB||
M0682N|B4YGWP443YPY3WMBBMBY48||
K0725D|ZH0HF8YPFY8Y4Y83NBBMBY||
P9185M|Z-8WX6SW8BPBMBY48Y7P8S||
V2636H|Z05WWMFJ2YC4P4WWMFBBMBY||
K9954E|MJ2FPFYBCBBY48Y7P8$$$$||
PS0210M|S0-SBBW88YW8BBMBY48Y7P||
PS5842M|38FW-WXBV4FWX_BNWXBBMB||
M5480A|50Z44YWVWNWMBY48Y7P8S0||
V2825L|SCF84C@WBWBYBYWP8YBBMB||
J9090J|BM83Y8P4NN6V883NBBMBY4||
G9332F|Z0SZ4W8HV8FPBBMBY48Y7P||
M2017K|K6KDNPJ28ZK4YBBMBY48Y7||
M6901D|I-NPX488V3YPBBMBY48Y7P8||
S0689N|Z0ZZP2YF367YVWBBMBBMBY|M5669K|
M4134E|DNB4FHZIHZ43Y4BBMBY48Y||
G7604R|5-Y8PRPMWPF4YBBMBY48Y7||
S3468A|HCFP48FWWV88BBMBY48Y7P||
S0796N|K0SP3KPVFYWBWP8BBMBY48||
D7781Q|H0DN48484P8883NBBMBY48|S2395K|
B4607P|HBPFWIS3N3YBMBY48Y7P8S||
C4398Q|K0HKCYBX8YMKMYVWBBMBY4||
J9787E|SNMZD4PB8PFBBMBY48Y7P8|D7781Q|
D1187E|BYP8BWBY4FPP8YBBMBBMBY||S2395K
K2110J|HKS6FYNWFYHMWBBMBY48Y7P||
M5116F|S5SIB8BP85GBBMBY48Y7P8||
M3134J|6HWJ2P78NVBBY48Y7P8S00||D7781Q
P1590P|S0Z4PYVY84PBBMBY48Y7P8|M3134J|
S9598G|KZWXB48VYBPBBMBY48Y7P8||
K9168B|-HJ2M88MJ2WMBY48Y7P8S0|H3629M|
S2395K|ZSZ48PF#NFVBW8BBMBBMBY||
M2140L|48VP33W8BNB8BW8BBMBBMB||H3629M
B4952H|YJ2PB3CN8MYBBMBY48Y7P8|S9598G|S2395K
S5018D|KZ0HVMWM4Y3BBMBY48Y7P8||
K5943E|Z6PP8VPM83N8BPMV8BBMBY||
N6677M|ZKYCSVW8BF8VWBBMBY48Y7|M5669K|
HJ7138K|MV4YK08W38BBMBY48Y7P8S||
| | |
R9004J|K0KCNP7P85SBBMBY48Y7P||
F0109Q|Z8W8SSVY4VYNM8VBBMBBMB|R9004J|M2724Q
M2724Q|ZZZ54RFP8GYBMBBMBY48Y7||
D9109E|YVYB4W46SWYBMBBMBY48Y7P||
M9585E|KBGWY3YB2WBMBY48Y7P8S00||
| | |
H3629M|-_P8VY-0WP8BMBY48Y7P8S0||
T2426B|50ZYC8NHI8YBCVCFYP8BBM|M9295K|
M9295K|Z5ZMPFV8M8BMBBMBY48Y7||
P3089F|K6KDNPJ28ZK4YBBMBY48Y7||
PO2499R|-K067P8KPYBMBY48Y7P8S0||
L3447Q|33PV4V5I633F4YBBMBY48Y|M5669K|
1088L|||
;
run;
data want (keep=pan1 pan2 pan3 add1 household);
if _n_ eq 1 then do;
declare hash ha(hashexp: 16);
ha.definekey('key');
ha.definedata('hhold');
ha.definedone();
end;
set test;
array _house{*} $ 40 pan1--pan3;
do i=1 to dim(_house);
key=_house{i};
rc=ha.check();
if rc=0 then do;
found=1;
household=hhold;
leave;
end;
end;
if not found then do;
n+1;
household=n;
end;
do j=1 to dim(_house);
if not missing(_house{j}) then do;
key=_house{j};
hhold=household;
ha.replace();
end;
end;
run;
.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.