BookmarkSubscribeRSS Feed
sas_Forum
Calcite | Level 5

Grouping and Giving one id
Hi i am having pan1,pan2,pan3 and add1
now i want to generate household id on that

data test;
input pan1 pan2 pan3 add1
cards;
1 2 3 4
5 6 7 8
. . . .
1 2 7 8
1 5 9 12
13 14 15 16
12 18 19 .
run;

Here i want to create one unique number as Household id
first it should check the pan1 if any of the pan are in pan2 or pan3 or add1 it should give one
household id like versa pan2 should check the observations matching in pan1 or pan3 or add1
and pan3 and add1 to do the same like pan1 and pan2..
my final createria is to get one number unique if any of the pan1,pan2,pan3 and add1 obs matching
any where and i should generate one new number if all the observations are blank.
These is the process of creating household in Dataflux i am doing this in baase sas.

output:
pan1 pan2 pan3 add1 Household
1 2 3 4  1    
5 6 7 8  1
. . . .  2
1 2 7 8  1
1 5 9 12 1
13 14 15 16 3
12 18 19 . 1

124 REPLIES 124
Ksharp
Super User

"my final createria is to get one number unique if any of the pan1,pan2,pan3 and add1 obs matching"

I do not understand what your mean is totally.

If there were two or more rows are all missing value, how do you count?

And Under what situation, the count will increase one?

Ksharp

sas_Forum
Calcite | Level 5

IF any of two columns are missing and if the remaining two columns are matching with the obs in that then it should get the same Household id else it should get new Household_id.The count should increse based on the previuos Household_id

output:
obs  pan1 pan2 pan3 add1 Household
1) 1 2 3 4  1  (is is the starting and got Householdid as 1)
2) 5 6 7 8  1 (It also got 1 becase in obs 5 the pan2 is 5 and pan1 is 1 and in first obs pan1 is also 1 any corresponding to Hidrelinking should get 1)
3) . . . .  2  (all are null and shd get new Hid)
4) 1 2 7 8  1( pan1 is matching with obs1)
5) 1 5 9 12 1  (add1 is matching with obs of 1 with pan1)
6)13 14 15 16 3 (unique no matcihng got new Hid)
7)12 18 19 . 1(in obs 5 the pan3 is matching with pan1 in this ibs so got Hid as 1)

Ksharp
Super User

The only way I can think to promote performance efficiency is to use Hash Table(see the following), but it need lots and lots of memory for your large table, I do not think your computer has so big memory. So If you could, split the large table into several small table.

About "all are character variables" , that is not a issue, my code is also suited for character variable.

data test;
input pan1 pan2 pan3 add1 ;
cards;
1 2 3 4
5 6 7 8
. . . .
1 2 7 8
1 5 9 12
13 14 15 16
12 18 19 .
;
run;
data want(keep=pan1 pan2 pan3 add1 household);
 declare hash ha(hashexp: 16,ordered: 'A');
 declare hiter hi('ha');
  ha.definekey('_count');
  ha.definedata('_count','_pan1','_pan2','_pan3','_add1');
  ha.definedone();

  do until(last);
   set test(rename=(pan1=_pan1 pan2=_pan2 pan3=_pan3 add1=_add1))  end=last;
   _count+1;ha.add();
  end;

  do until(_last);
   set test end=_last;
   count+1; found=0;
   rc=hi.first();
   do while(rc=0);
   if count ne _count and
   ((pan1=_pan1 and not missing(_pan1))  or  (pan1=_pan2 and not missing(_pan2))  or (pan1=_pan3 and not missing(_pan3)) or (pan1=_add1 and not missing(_add1)) or 
    (pan2=_pan1 and not missing(_pan1))  or  (pan2=_pan2 and not missing(_pan2))  or (pan2=_pan3 and not missing(_pan3)) or (pan2=_add1 and not missing(_add1)) or 
    (pan3=_pan1 and not missing(_pan1))  or  (pan3=_pan2 and not missing(_pan2))  or (pan3=_pan3 and not missing(_pan3)) or (pan3=_add1 and not missing(_add1)) or 
    (add1=_pan1 and not missing(_pan1))  or  (add1=_pan2 and not missing(_pan2))  or (add1=_pan3 and not missing(_pan3)) or (add1=_add1 and not missing(_add1))  )
    then do; _n+1;
           if _n eq 1 then do;
                             n+1; first=n;
                             household=n;
                           end;
             else household=first;
           found=1;leave;
          end;
   rc=hi.next();
   end;
 if not found then do; n+1; household=n; end;
 output;
 end;
stop;
run;


Ksharp

Ksharp
Super User

Hi. I do not understand what is your mean.

The Hash Table method is limited by your PC memory.

So you must have large memory to run code for your large table.

Speed I think will be faster .

Ksharp

Ksharp
Super User

Can you post sample data for these 500 obs.

And if one obs has a missing value and another obs has also a missing value,   how do you count?

Ksharp
Super User

I tested it , not found error. the following is output I get.

data temp;
infile datalines dsd dlm='|' truncover;
input (pan1 pan2 pan3 add1) (: $40.);
datalines;
M5669K|4HH5BFWBBMBY48Y7P8S||
I2175Q|P5P8YBBMBY48Y7P8S0005D||
S1495P|8BWP_Y8V3MYVWBBMBBMB||
G4909P|WN8Y4YFN48FYP884P83Y4||
S0560L|488Y8N8N8H-K56H4YP8P||
F0994B|R8YPF8Y4P-0BWP8BBMBY||
H6408L|HICPM4BBPWFP4WV4MP8YBB||
M2091M|48YJ23N3P8VWBBMBY48Y7||
S2426K|5V4PZHKB8YPNPVWPVBBMBY||
D7907L|SBMBMVYYBBM8PBY48Y7P8S0|H6408L|
G9847L|XB3845-WM43YWBBMBY48Y7||
D3871H|D3PC34Z0-38W3YBBMBY48Y||
T0753D|2P8Y38Y84W3N4NPJ2R8P8Y||
P5328F|Z06V4B8YFYFYMY8FMYVWBB||
D3938K|KSJ2VBYWBY4P8YBBMBY48Y||
A3529D|K0K4F4BY8BBXVBBMBY48Y8||
P8617A|I0S3YPV4P8BYVWMYVWBBM4||H6408L
R1129A|Z0KMV4YWM483NBBMB8Y7P8S||
R|KK0ZJ2PYJ2NW8K5SH8BV84||
S7807B|YPFBYBP4WNWN88YBMBBMBY||
N8681Q|KHH4B8MMYNW3Y8YBBMBY48||
P8632Q|H550HFMBM4YPFYFPBBMBY4||
K3437H|DZ6MYVNN3YWMBY48Y7P8S0||
G5510C|ZH@YPBY84BB$$$$$$$$$||
R2518M|HZB8NV8MWXYBBY48Y7P8S00||
M1014A|B8XY5IZH_38VWBBMBBMBY4||
M0049P|KZYJ2C8Y3BNCVY4VP8YBBM||
PD4296R|K0KVPJ23YV43PFY3P8VWBB||
L5351M|KF47PP8NPCN4VWNWBBMS00||
D7744C|56PW3WF3W8P3P8VWBBMBY4||
U1088L|#8B8WP8FPMP3YWBBMS0005||
U1088L|#8B8WP8FPMP3YWBBMBY48Y||
U0564D|#8B8WP8FPMP3YWBBMBY48Y||
B0102C|@P7P8W4@PZIYBWDSZ4M_W||
U1088L|#8B8WP8FPMPWBBMBY48Y||
U0564D|#8B8WP8FPMPYWBBMBY48Y||
S9556K|@P7P8W46CN8SZ_WGP48PBB||
S9556K|@P7P8W46CN8SZ_WGP48PBB||
B1476E|S05YCGY4N4CYHZKPYBPBBMS||
C2940N|56PW3WF3W8P3P8VWBBMBY4||
U0564D|#8B8WP8FPMP3YWBBMBY48Y||
R0090B|_XN4S2YJ2F8MBY48Y7P8S0||
N8784Q|S0K4PY4WD65YWBBMBY48Y7||
G8553G|KZBP8SFYFPNYBBMBY48Y7P||
S5904H|60KZZ2Y3NCP5J2Y3N3P8VW||
S3482C|Z05Y4YJ24VVYWN884YBBMB||
U1088L|#8B8WP8FPMPYWBBMBY48Y||
K3031C|ZH0HF8YPFY84Y83NBBMBY||
R0090B|_XN4S2YJ2F8MBY48Y7P8S0||
P4466N|IHY48CH84YBBY48Y7P8S00||
J2314G|5WXBM8Y3YWBBBMBY48Y7P8||
R0090B|_XN4SJ2YJ2F8MBY48Y7P8S0||
TI7823D|78MH-ZZ3M8GYBBMBBMBY48||
B0509R|50Z88B5V48YN4FYFPBBMB||
M0682N|B4YGWP443YPY3WMBBMBY48||
K0725D|ZH0HF8YPFY8Y4Y83NBBMBY||
P9185M|Z-8WX6SW8BPBMBY48Y7P8S||
V2636H|Z05WWMFJ2YC4P4WWMFBBMBY||
K9954E|MJ2FPFYBCBBY48Y7P8$$$$||
PS0210M|S0-SBBW88YW8BBMBY48Y7P||
PS5842M|38FW-WXBV4FWX_BNWXBBMB||
M5480A|50Z44YWVWNWMBY48Y7P8S0||
V2825L|SCF84C@WBWBYBYWP8YBBMB||
J9090J|BM83Y8P4NN6V883NBBMBY4||
G9332F|Z0SZ4W8HV8FPBBMBY48Y7P||
M2017K|K6KDNPJ28ZK4YBBMBY48Y7||
M6901D|I-NPX488V3YPBBMBY48Y7P8||
S0689N|Z0ZZP2YF367YVWBBMBBMBY|M5669K|
M4134E|DNB4FHZIHZ43Y4BBMBY48Y||
G7604R|5-Y8PRPMWPF4YBBMBY48Y7||
S3468A|HCFP48FWWV88BBMBY48Y7P||
S0796N|K0SP3KPVFYWBWP8BBMBY48||
D7781Q|H0DN48484P8883NBBMBY48|S2395K|
B4607P|HBPFWIS3N3YBMBY48Y7P8S||
C4398Q|K0HKCYBX8YMKMYVWBBMBY4||
J9787E|SNMZD4PB8PFBBMBY48Y7P8|D7781Q|
D1187E|BYP8BWBY4FPP8YBBMBBMBY||S2395K
K2110J|HKS6FYNWFYHMWBBMBY48Y7P||
M5116F|S5SIB8BP85GBBMBY48Y7P8||
M3134J|6HWJ2P78NVBBY48Y7P8S00||D7781Q
P1590P|S0Z4PYVY84PBBMBY48Y7P8|M3134J|
S9598G|KZWXB48VYBPBBMBY48Y7P8||
K9168B|-HJ2M88MJ2WMBY48Y7P8S0|H3629M|
S2395K|ZSZ48PF#NFVBW8BBMBBMBY||
M2140L|48VP33W8BNB8BW8BBMBBMB||H3629M
B4952H|YJ2PB3CN8MYBBMBY48Y7P8|S9598G|S2395K
S5018D|KZ0HVMWM4Y3BBMBY48Y7P8||
K5943E|Z6PP8VPM83N8BPMV8BBMBY||
N6677M|ZKYCSVW8BF8VWBBMBY48Y7|M5669K|
HJ7138K|MV4YK08W38BBMBY48Y7P8S||
  | | |
R9004J|K0KCNP7P85SBBMBY48Y7P||
F0109Q|Z8W8SSVY4VYNM8VBBMBBMB|R9004J|M2724Q
M2724Q|ZZZ54RFP8GYBMBBMBY48Y7||
D9109E|YVYB4W46SWYBMBBMBY48Y7P||
M9585E|KBGWY3YB2WBMBY48Y7P8S00||
  | | |
H3629M|-_P8VY-0WP8BMBY48Y7P8S0||
T2426B|50ZYC8NHI8YBCVCFYP8BBM|M9295K|
M9295K|Z5ZMPFV8M8BMBBMBY48Y7||
P3089F|K6KDNPJ28ZK4YBBMBY48Y7||
PO2499R|-K067P8KPYBMBY48Y7P8S0||
L3447Q|33PV4V5I633F4YBBMBY48Y|M5669K|
1088L|||
;
run;
data test;
 set temp;
 pan1=compress(pan1,' ');
 pan2=compress(pan2,' ');
 pan3=compress(pan3,' ');
 add1=compress(add1,' ');
run;
data want(keep=pan1 pan2 pan3 add1 household);
 declare hash ha(hashexp: 16,ordered: 'A');
 declare hiter hi('ha');
  ha.definekey('_count');
  ha.definedata('_count','_pan1','_pan2','_pan3','_add1');
  ha.definedone();

  do until(last);
   set test(rename=(pan1=_pan1 pan2=_pan2 pan3=_pan3 add1=_add1))  end=last;
   _count+1;ha.add();
  end;

  do until(_last);
   set test end=_last;
   count+1; found=0;
   rc=hi.first();
   do while(rc=0);
   if count ne _count and
   ((pan1=_pan1 and not missing(_pan1))  or  (pan1=_pan2 and not missing(_pan2))  or (pan1=_pan3 and not missing(_pan3)) or (pan1=_add1 and not missing(_add1)) or 
    (pan2=_pan1 and not missing(_pan1))  or  (pan2=_pan2 and not missing(_pan2))  or (pan2=_pan3 and not missing(_pan3)) or (pan2=_add1 and not missing(_add1)) or 
    (pan3=_pan1 and not missing(_pan1))  or  (pan3=_pan2 and not missing(_pan2))  or (pan3=_pan3 and not missing(_pan3)) or (pan3=_add1 and not missing(_add1)) or 
    (add1=_pan1 and not missing(_pan1))  or  (add1=_pan2 and not missing(_pan2))  or (add1=_pan3 and not missing(_pan3)) or (add1=_add1 and not missing(_add1))  )
    then do; _n+1;
           if _n eq 1 then do;
                             n+1; first=n;
                             household=n;
                           end;
             else household=first;
           found=1;leave;
          end;
   rc=hi.next();
   end;
 if not found then do; n+1; household=n; end;
 output;
 end;
stop;
run;













pan1     pan2     pan3     add1     household
M5669K     4HH5BFWBBMBY48Y7P8S               1
I2175Q     P5P8YBBMBY48Y7P8S0005D               2
S1495P     8BWP_Y8V3MYVWBBMBBMB               3
G4909P     WN8Y4YFN48FYP884P83Y4               4
S0560L     488Y8N8N8H-K56H4YP8P               5
F0994B     R8YPF8Y4P-0BWP8BBMBY               6
H6408L     HICPM4BBPWFP4WV4MP8YBB               1
M2091M     48YJ23N3P8VWBBMBY48Y7               7
S2426K     5V4PZHKB8YPNPVWPVBBMBY               8
D7907L     SBMBMVYYBBM8PBY48Y7P8S0     H6408L          1
G9847L     XB3845-WM43YWBBMBY48Y7               9
D3871H     D3PC34Z0-38W3YBBMBY48Y               10
T0753D     2P8Y38Y84W3N4NPJ2R8P8Y               11
P5328F     Z06V4B8YFYFYMY8FMYVWBB               12
D3938K     KSJ2VBYWBY4P8YBBMBY48Y               13
A3529D     K0K4F4BY8BBXVBBMBY48Y8               14
P8617A     I0S3YPV4P8BYVWMYVWBBM4          H6408L     1
R1129A     Z0KMV4YWM483NBBMB8Y7P8S               15
R     KK0ZJ2PYJ2NW8K5SH8BV84               16
S7807B     YPFBYBP4WNWN88YBMBBMBY               17
N8681Q     KHH4B8MMYNW3Y8YBBMBY48               18
P8632Q     H550HFMBM4YPFYFPBBMBY4               19
K3437H     DZ6MYVNN3YWMBY48Y7P8S0               20
G5510C     ZH@YPBY84BB$$$$$$$$$               21
R2518M     HZB8NV8MWXYBBY48Y7P8S00               22
M1014A     B8XY5IZH_38VWBBMBBMBY4               23
M0049P     KZYJ2C8Y3BNCVY4VP8YBBM               24
PD4296R     K0KVPJ23YV43PFY3P8VWBB               25
L5351M     KF47PP8NPCN4VWNWBBMS00               26
D7744C     56PW3WF3W8P3P8VWBBMBY4               1
U1088L     #8B8WP8FPMP3YWBBMS0005               1
U1088L     #8B8WP8FPMP3YWBBMBY48Y               1
U0564D     #8B8WP8FPMP3YWBBMBY48Y               1
B0102C     @P7P8W4@PZIYBWDSZ4M_W               27
U1088L     #8B8WP8FPMPWBBMBY48Y               1
U0564D     #8B8WP8FPMPYWBBMBY48Y               1
S9556K     @P7P8W46CN8SZ_WGP48PBB               1
S9556K     @P7P8W46CN8SZ_WGP48PBB               1
B1476E     S05YCGY4N4CYHZKPYBPBBMS               28
C2940N     56PW3WF3W8P3P8VWBBMBY4               1
U0564D     #8B8WP8FPMP3YWBBMBY48Y               1
R0090B     _XN4S2YJ2F8MBY48Y7P8S0               1
N8784Q     S0K4PY4WD65YWBBMBY48Y7               29
G8553G     KZBP8SFYFPNYBBMBY48Y7P               30
S5904H     60KZZ2Y3NCP5J2Y3N3P8VW               31
S3482C     Z05Y4YJ24VVYWN884YBBMB               32
U1088L     #8B8WP8FPMPYWBBMBY48Y               1
K3031C     ZH0HF8YPFY84Y83NBBMBY               33
R0090B     _XN4S2YJ2F8MBY48Y7P8S0               1
P4466N     IHY48CH84YBBY48Y7P8S00               34
J2314G     5WXBM8Y3YWBBBMBY48Y7P8               35
R0090B     _XN4SJ2YJ2F8MBY48Y7P8S0               1
TI7823D     78MH-ZZ3M8GYBBMBBMBY48               36
B0509R     50Z88B5V48YN4FYFPBBMB               37
M0682N     B4YGWP443YPY3WMBBMBY48               38
K0725D     ZH0HF8YPFY8Y4Y83NBBMBY               39
P9185M     Z-8WX6SW8BPBMBY48Y7P8S               40
V2636H     Z05WWMFJ2YC4P4WWMFBBMBY               41
K9954E     MJ2FPFYBCBBY48Y7P8$$$$               42
PS0210M     S0-SBBW88YW8BBMBY48Y7P               43
PS5842M     38FW-WXBV4FWX_BNWXBBMB               44
M5480A     50Z44YWVWNWMBY48Y7P8S0               45
V2825L     SCF84C@WBWBYBYWP8YBBMB               46
J9090J     BM83Y8P4NN6V883NBBMBY4               47
G9332F     Z0SZ4W8HV8FPBBMBY48Y7P               48
M2017K     K6KDNPJ28ZK4YBBMBY48Y7               1
M6901D     I-NPX488V3YPBBMBY48Y7P8               49
S0689N     Z0ZZP2YF367YVWBBMBBMBY     M5669K          1
M4134E     DNB4FHZIHZ43Y4BBMBY48Y               50
G7604R     5-Y8PRPMWPF4YBBMBY48Y7               51
S3468A     HCFP48FWWV88BBMBY48Y7P               52
S0796N     K0SP3KPVFYWBWP8BBMBY48               53
D7781Q     H0DN48484P8883NBBMBY48     S2395K          1
B4607P     HBPFWIS3N3YBMBY48Y7P8S               54
C4398Q     K0HKCYBX8YMKMYVWBBMBY4               55
J9787E     SNMZD4PB8PFBBMBY48Y7P8     D7781Q          1
D1187E     BYP8BWBY4FPP8YBBMBBMBY          S2395K     1
K2110J     HKS6FYNWFYHMWBBMBY48Y7P               56
M5116F     S5SIB8BP85GBBMBY48Y7P8               57
M3134J     6HWJ2P78NVBBY48Y7P8S00          D7781Q     1
P1590P     S0Z4PYVY84PBBMBY48Y7P8     M3134J          1
S9598G     KZWXB48VYBPBBMBY48Y7P8               1
K9168B     -HJ2M88MJ2WMBY48Y7P8S0     H3629M          1
S2395K     ZSZ48PF#NFVBW8BBMBBMBY               1
M2140L     48VP33W8BNB8BW8BBMBBMB          H3629M     1
B4952H     YJ2PB3CN8MYBBMBY48Y7P8     S9598G     S2395K     1
S5018D     KZ0HVMWM4Y3BBMBY48Y7P8               58
K5943E     Z6PP8VPM83N8BPMV8BBMBY               59
N6677M     ZKYCSVW8BF8VWBBMBY48Y7     M5669K          1
HJ7138K     MV4YK08W38BBMBY48Y7P8S               60
                    61
R9004J     K0KCNP7P85SBBMBY48Y7P               1
F0109Q     Z8W8SSVY4VYNM8VBBMBBMB     R9004J     M2724Q     1
M2724Q     ZZZ54RFP8GYBMBBMBY48Y7               1
D9109E     YVYB4W46SWYBMBBMBY48Y7P               62
M9585E     KBGWY3YB2WBMBY48Y7P8S00               63
                    64
H3629M     -_P8VY-0WP8BMBY48Y7P8S0               1
T2426B     50ZYC8NHI8YBCVCFYP8BBM     M9295K          1
M9295K     Z5ZMPFV8M8BMBBMBY48Y7               1
P3089F     K6KDNPJ28ZK4YBBMBY48Y7               1
PO2499R     -K067P8KPYBMBY48Y7P8S0               65
L3447Q     33PV4V5I633F4YBBMBY48Y     M5669K          1
1088L                    66


Ksharp

sas_Forum
Calcite | Level 5

Hi sharp it worked the output for household was wrong for example the pan1 R0090B is going to Hid 1 but it is not having any linkage in Hid 1 the corresponding pan2 is not having any linkage in the Hid of 1

My answer: Actually i wnat to create the Household id to customers having any linkage between them in pan1,2,3 and add1

if they have linkage they should get my Hid .if pan1 is having any linkage between them and the correspoding pan2 if present in any of the four variables should get the same Hid and if next to that if any pan is there and linking with any of the four variables it should also get linked  and get the same Hid...

              Ksharp Thqs for your patience and Help...

Ksharp
Super User

Or you can use compress function to remove the invisibility characher Such as '09'x (tab character).

May be that is reason.

data test;
 set temp;
 pan1=compress(pan1, ,'s');
 pan2=compress(pan2, ,'s');
 pan3=compress(pan3, ,'s');
 add1=compress(add1, ,'s');
run;

Ksharp

Ksharp
Super User

if pan1 and add1 are not missing and if the value of pan1 and add1 matched in any of the four values it should get the same Household id else it should get new houshold id...

Oh. You have change the rules. You only want pan1 and add1 to compare?

How about this:

data test;
infile datalines dsd dlm='|' truncover;
input (pan1 add1 pan2 pan3) (: $40.);
datalines;
M5669K|4HH5BFWBBMBY48Y7P8S||
I2175Q|P5P8YBBMBY48Y7P8S0005D||
S1495P|8BWP_Y8V3MYVWBBMBBMB||
G4909P|WN8Y4YFN48FYP884P83Y4||
S0560L|488Y8N8N8H-K56H4YP8P||
F0994B|R8YPF8Y4P-0BWP8BBMBY||
H6408L|HICPM4BBPWFP4WV4MP8YBB||
M2091M|48YJ23N3P8VWBBMBY48Y7||
S2426K|5V4PZHKB8YPNPVWPVBBMBY||
D7907L|SBMBMVYYBBM8PBY48Y7P8S0|H6408L|
G9847L|XB3845-WM43YWBBMBY48Y7||
D3871H|D3PC34Z0-38W3YBBMBY48Y||
T0753D|2P8Y38Y84W3N4NPJ2R8P8Y||
P5328F|Z06V4B8YFYFYMY8FMYVWBB||
D3938K|KSJ2VBYWBY4P8YBBMBY48Y||
A3529D|K0K4F4BY8BBXVBBMBY48Y8||
P8617A|I0S3YPV4P8BYVWMYVWBBM4||H6408L
R1129A|Z0KMV4YWM483NBBMB8Y7P8S||
R|KK0ZJ2PYJ2NW8K5SH8BV84||
S7807B|YPFBYBP4WNWN88YBMBBMBY||
N8681Q|KHH4B8MMYNW3Y8YBBMBY48||
P8632Q|H550HFMBM4YPFYFPBBMBY4||
K3437H|DZ6MYVNN3YWMBY48Y7P8S0||
G5510C|ZH@YPBY84BB$$$$$$$$$||
R2518M|HZB8NV8MWXYBBY48Y7P8S00||
M1014A|B8XY5IZH_38VWBBMBBMBY4||
M0049P|KZYJ2C8Y3BNCVY4VP8YBBM||
PD4296R|K0KVPJ23YV43PFY3P8VWBB||
L5351M|KF47PP8NPCN4VWNWBBMS00||
D7744C|56PW3WF3W8P3P8VWBBMBY4||
U1088L|#8B8WP8FPMP3YWBBMS0005||
U1088L|#8B8WP8FPMP3YWBBMBY48Y||
U0564D|#8B8WP8FPMP3YWBBMBY48Y||
B0102C|@P7P8W4@PZIYBWDSZ4M_W||
U1088L|#8B8WP8FPMPWBBMBY48Y||
U0564D|#8B8WP8FPMPYWBBMBY48Y||
S9556K|@P7P8W46CN8SZ_WGP48PBB||
S9556K|@P7P8W46CN8SZ_WGP48PBB||
B1476E|S05YCGY4N4CYHZKPYBPBBMS||
C2940N|56PW3WF3W8P3P8VWBBMBY4||
U0564D|#8B8WP8FPMP3YWBBMBY48Y||
R0090B|_XN4S2YJ2F8MBY48Y7P8S0||
N8784Q|S0K4PY4WD65YWBBMBY48Y7||
G8553G|KZBP8SFYFPNYBBMBY48Y7P||
S5904H|60KZZ2Y3NCP5J2Y3N3P8VW||
S3482C|Z05Y4YJ24VVYWN884YBBMB||
U1088L|#8B8WP8FPMPYWBBMBY48Y||
K3031C|ZH0HF8YPFY84Y83NBBMBY||
R0090B|_XN4S2YJ2F8MBY48Y7P8S0||
P4466N|IHY48CH84YBBY48Y7P8S00||
J2314G|5WXBM8Y3YWBBBMBY48Y7P8||
R0090B|_XN4SJ2YJ2F8MBY48Y7P8S0||
TI7823D|78MH-ZZ3M8GYBBMBBMBY48||
B0509R|50Z88B5V48YN4FYFPBBMB||
M0682N|B4YGWP443YPY3WMBBMBY48||
K0725D|ZH0HF8YPFY8Y4Y83NBBMBY||
P9185M|Z-8WX6SW8BPBMBY48Y7P8S||
V2636H|Z05WWMFJ2YC4P4WWMFBBMBY||
K9954E|MJ2FPFYBCBBY48Y7P8$$$$||
PS0210M|S0-SBBW88YW8BBMBY48Y7P||
PS5842M|38FW-WXBV4FWX_BNWXBBMB||
M5480A|50Z44YWVWNWMBY48Y7P8S0||
V2825L|SCF84C@WBWBYBYWP8YBBMB||
J9090J|BM83Y8P4NN6V883NBBMBY4||
G9332F|Z0SZ4W8HV8FPBBMBY48Y7P||
M2017K|K6KDNPJ28ZK4YBBMBY48Y7||
M6901D|I-NPX488V3YPBBMBY48Y7P8||
S0689N|Z0ZZP2YF367YVWBBMBBMBY|M5669K|
M4134E|DNB4FHZIHZ43Y4BBMBY48Y||
G7604R|5-Y8PRPMWPF4YBBMBY48Y7||
S3468A|HCFP48FWWV88BBMBY48Y7P||
S0796N|K0SP3KPVFYWBWP8BBMBY48||
D7781Q|H0DN48484P8883NBBMBY48|S2395K|
B4607P|HBPFWIS3N3YBMBY48Y7P8S||
C4398Q|K0HKCYBX8YMKMYVWBBMBY4||
J9787E|SNMZD4PB8PFBBMBY48Y7P8|D7781Q|
D1187E|BYP8BWBY4FPP8YBBMBBMBY||S2395K
K2110J|HKS6FYNWFYHMWBBMBY48Y7P||
M5116F|S5SIB8BP85GBBMBY48Y7P8||
M3134J|6HWJ2P78NVBBY48Y7P8S00||D7781Q
P1590P|S0Z4PYVY84PBBMBY48Y7P8|M3134J|
S9598G|KZWXB48VYBPBBMBY48Y7P8||
K9168B|-HJ2M88MJ2WMBY48Y7P8S0|H3629M|
S2395K|ZSZ48PF#NFVBW8BBMBBMBY||
M2140L|48VP33W8BNB8BW8BBMBBMB||H3629M
B4952H|YJ2PB3CN8MYBBMBY48Y7P8|S9598G|S2395K
S5018D|KZ0HVMWM4Y3BBMBY48Y7P8||
K5943E|Z6PP8VPM83N8BPMV8BBMBY||
N6677M|ZKYCSVW8BF8VWBBMBY48Y7|M5669K|
HJ7138K|MV4YK08W38BBMBY48Y7P8S||
  | | |
R9004J|K0KCNP7P85SBBMBY48Y7P||
F0109Q|Z8W8SSVY4VYNM8VBBMBBMB|R9004J|M2724Q
M2724Q|ZZZ54RFP8GYBMBBMBY48Y7||
D9109E|YVYB4W46SWYBMBBMBY48Y7P||
M9585E|KBGWY3YB2WBMBY48Y7P8S00||
  | | |
H3629M|-_P8VY-0WP8BMBY48Y7P8S0||
T2426B|50ZYC8NHI8YBCVCFYP8BBM|M9295K|
M9295K|Z5ZMPFV8M8BMBBMBY48Y7||
P3089F|K6KDNPJ28ZK4YBBMBY48Y7||
PO2499R|-K067P8KPYBMBY48Y7P8S0||
L3447Q|33PV4V5I633F4YBBMBY48Y|M5669K|
1088L|||
;
run;

data want(keep=pan1 pan2 pan3 add1 household);
 declare hash ha(hashexp: 16,ordered: 'A');
 declare hiter hi('ha');
  ha.definekey('_count');
  ha.definedata('_count','_pan1','_pan2','_pan3','_add1');
  ha.definedone();

  do until(last);
   set test(rename=(pan1=_pan1 pan2=_pan2 pan3=_pan3 add1=_add1))  end=last;
   _count+1;ha.add();
  end;

  do until(_last);
   set test end=_last;
   count+1; found=0;
   rc=hi.first();
   do while(rc=0);
   if count ne _count and
   ((pan1=_pan1 and not missing(_pan1))  or  (pan1=_pan2 and not missing(_pan2))  or (pan1=_pan3 and not missing(_pan3)) or (pan1=_add1 and not missing(_add1)) or 
    (add1=_pan1 and not missing(_pan1))  or  (add1=_pan2 and not missing(_pan2))  or (add1=_pan3 and not missing(_pan3)) or (add1=_add1 and not missing(_add1))  )
    then do; _n+1;
           if _n eq 1 then do;
                             n+1; first=n;
                             household=n;
                           end;
             else household=first;
           found=1;leave;
          end;
   rc=hi.next();
   end;
 if not found then do; n+1; household=n; end;
 output;
 end;
stop;
run;



Ksharp

sas_Forum
Calcite | Level 5

Ksharp that is not only for pan1,add1 this is for all the four variables

actually my concept is to get the linkages if pan1 is again repeated in pan2 pan3 or add1 then it should get one Hid,not only that

ex  obs  pan1 pan2 pan3 add1  hid

          1.    aaa   bbb    ccc     ddd    1

           2.  qqq   rrr       www   aaa   1

           3.   rrr     ppp    mmm lll       1

           4.   uuu   zzz    ffff      ppp     1

          5     p       l        m          n      2

         6     jjjj    eee     rrr       ooo    1

         7   <all    blanks  >              3

        8    sss   www  .    .             1

        9  .         .        .        eee       1

in this example the    first obs is hid is 1

in obs2  add1 aaa is matching with pan1  aaa of obs1

in obs 3  pan1  rrr is matching in pan2 rrr of obs 2

in obs 4 add1  ppp is matching with pan2 ppp of obs 3

in obs 5 it is unique with no matching

in obs 6 pan3 rrr is matcing with pan1 rrr of obs 3

in obs 7 is having no values it got Hid as 3

obs 8 is having pan2 www matching with pan2 www of obs 2 so it got Hid 1

obs 9 add1 eee is matching with obs6 pan2 ee so it got Hid 1

Like this i want the linkages  

Ksharp i am tring this from 3 days i am not getting so please help me thanks for your help in advance....

Ksharp
Super User

OK.

You want to query the value from the previous obs not whole obs? just as Art'T said.

That would be more simple.

How about this code:

data test;
input  ( pan1 pan2 pan3 add1  ) ($);
datalines;
aaa   bbb    ccc     ddd    
qqq   rrr       www   aaa   
rrr     ppp    mmm lll       
uuu   zzz    ffff      ppp     
p       l        m          n      
jjjj    eee     rrr       ooo    
.    .   .   .             
sss   www  .    .             
 .         .        .        eee       
;
run;
data want(keep=pan1 pan2 pan3 add1 household);
if _n_ eq 1 then do;
 declare hash ha(hashexp: 16);
  ha.definekey('key');
  ha.definedone();
end;

set test;
array _house{*} $ 32 _character_;
array _save{4} $ 32 ;
do i=1 to dim(_house);
 key=_house{i};rc=ha.check();
 if rc=0 then do;
                household=1;found=1;
                 end;
     else _save{i}=_house{i};
end;
do j=1 to 4;
 if not missing(_save{j}) then do;
                                key=_save{j};
                                ha.replace();
                                      end;
end;
call missing( of _save{*});
if not found then do;
                         n+1;household=n;
                      end;
run;

Ksharp

Ksharp
Super User

Actually my code can be refined as:

data test;
input  ( pan1 pan2 pan3 add1  ) ($);
datalines;
aaa   bbb    ccc     ddd    
qqq   rrr       www   aaa   
rrr     ppp    mmm lll       
uuu   zzz    ffff      ppp     
p       l        m          n      
jjjj    eee     rrr       ooo    
.    .   .   .             
sss   www  .    .             
 .         .        .        eee       
;
run;
data want(keep=pan1 pan2 pan3 add1 household);
if _n_ eq 1 then do;
 declare hash ha(hashexp: 16);
  ha.definekey('key');
  ha.definedone();
end;

set test;
array _house{*} $ 32 _character_;
do i=1 to dim(_house);
 key=_house{i};rc=ha.check();
 if rc=0 then do;
                household=1;found=1;leave;
              end;
end;
do j=1 to dim(_house);
 if not missing(_house{j}) then do;
                                key=_house{j};
                                ha.replace();
                                end;
end;
if not found then do;
                   n+1;household=n;
                  end;
run;

Ksharp

art297
Opal | Level 21

Ksharp,

The following code is WRONG, but I'd like to know how to fix it.  After reconsidering what I was planning to do, I decided that learning to include a hash was the most sensible way to go.

However, your code is incomplete, and I don't know how to fix it.  I do know what it is missing:  Your code does not assign households to the hash thus, when it does a lookup, it merely assigns a "1" if it  finds a match.

I tried to correct it, below, but I am apparently missing or misunderstanding the concept.

Would definitely appreciate seeing how it SHOULD/COULD have been written correctly:

data test;

  infile cards dsd dlm='|' truncover;

  input (pan1 add1 pan2 pan3) (: $40.);

  cards;

M5669K|4HH5BFWBBMBY48Y7P8S|P5P8YBBMBY48Y7P8S0005D|

I2175Q|P5P8YBBMBY48Y7P8S0005D||

S1495P|8BWP_Y8V3MYVWBBMBBMB||

G4909P|WN8Y4YFN48FYP884P83Y4||

|S1495P||

S0560L|488Y8N8N8H-K56H4YP8P||

F0994B|R8YPF8Y4P-0BWP8BBMBY||

H6408L|HICPM4BBPWFP4WV4MP8YBB||

M2091M|48YJ23N3P8VWBBMBY48Y7||

S2426K|5V4PZHKB8YPNPVWPVBBMBY||

D7907L|SBMBMVYYBBM8PBY48Y7P8S0|H6408L|

G9847L|XB3845-WM43YWBBMBY48Y7||

D3871H|D3PC34Z0-38W3YBBMBY48Y||

T0753D|2P8Y38Y84W3N4NPJ2R8P8Y||

P5328F|Z06V4B8YFYFYMY8FMYVWBB||

D3938K|KSJ2VBYWBY4P8YBBMBY48Y||

A3529D|K0K4F4BY8BBXVBBMBY48Y8||

P8617A|I0S3YPV4P8BYVWMYVWBBM4||H6408L

R1129A|Z0KMV4YWM483NBBMB8Y7P8S||

R|KK0ZJ2PYJ2NW8K5SH8BV84||

S7807B|YPFBYBP4WNWN88YBMBBMBY||

N8681Q|KHH4B8MMYNW3Y8YBBMBY48||

P8632Q|H550HFMBM4YPFYFPBBMBY4||

K3437H|DZ6MYVNN3YWMBY48Y7P8S0||

G5510C|ZH@YPBY84BB$$$$$$$$$||

R2518M|HZB8NV8MWXYBBY48Y7P8S00||

M1014A|B8XY5IZH_38VWBBMBBMBY4||

M0049P|KZYJ2C8Y3BNCVY4VP8YBBM||

PD4296R|K0KVPJ23YV43PFY3P8VWBB||

L5351M|KF47PP8NPCN4VWNWBBMS00||

D7744C|56PW3WF3W8P3P8VWBBMBY4||

U1088L|#8B8WP8FPMP3YWBBMS0005||

U1088L|#8B8WP8FPMP3YWBBMBY48Y||

U0564D|#8B8WP8FPMP3YWBBMBY48Y||

B0102C|@P7P8W4@PZIYBWDSZ4M_W||

U1088L|#8B8WP8FPMPWBBMBY48Y||

U0564D|#8B8WP8FPMPYWBBMBY48Y||

S9556K|@P7P8W46CN8SZ_WGP48PBB||

S9556K|@P7P8W46CN8SZ_WGP48PBB||

B1476E|S05YCGY4N4CYHZKPYBPBBMS||

C2940N|56PW3WF3W8P3P8VWBBMBY4||

U0564D|#8B8WP8FPMP3YWBBMBY48Y||

R0090B|_XN4S2YJ2F8MBY48Y7P8S0||

N8784Q|S0K4PY4WD65YWBBMBY48Y7||

G8553G|KZBP8SFYFPNYBBMBY48Y7P||

S5904H|60KZZ2Y3NCP5J2Y3N3P8VW||

S3482C|Z05Y4YJ24VVYWN884YBBMB||

U1088L|#8B8WP8FPMPYWBBMBY48Y||

K3031C|ZH0HF8YPFY84Y83NBBMBY||

R0090B|_XN4S2YJ2F8MBY48Y7P8S0||

P4466N|IHY48CH84YBBY48Y7P8S00||

J2314G|5WXBM8Y3YWBBBMBY48Y7P8||

R0090B|_XN4SJ2YJ2F8MBY48Y7P8S0||

TI7823D|78MH-ZZ3M8GYBBMBBMBY48||

B0509R|50Z88B5V48YN4FYFPBBMB||

M0682N|B4YGWP443YPY3WMBBMBY48||

K0725D|ZH0HF8YPFY8Y4Y83NBBMBY||

P9185M|Z-8WX6SW8BPBMBY48Y7P8S||

V2636H|Z05WWMFJ2YC4P4WWMFBBMBY||

K9954E|MJ2FPFYBCBBY48Y7P8$$$$||

PS0210M|S0-SBBW88YW8BBMBY48Y7P||

PS5842M|38FW-WXBV4FWX_BNWXBBMB||

M5480A|50Z44YWVWNWMBY48Y7P8S0||

V2825L|SCF84C@WBWBYBYWP8YBBMB||

J9090J|BM83Y8P4NN6V883NBBMBY4||

G9332F|Z0SZ4W8HV8FPBBMBY48Y7P||

M2017K|K6KDNPJ28ZK4YBBMBY48Y7||

M6901D|I-NPX488V3YPBBMBY48Y7P8||

S0689N|Z0ZZP2YF367YVWBBMBBMBY|M5669K|

M4134E|DNB4FHZIHZ43Y4BBMBY48Y||

G7604R|5-Y8PRPMWPF4YBBMBY48Y7||

S3468A|HCFP48FWWV88BBMBY48Y7P||

S0796N|K0SP3KPVFYWBWP8BBMBY48||

D7781Q|H0DN48484P8883NBBMBY48|S2395K|

B4607P|HBPFWIS3N3YBMBY48Y7P8S||

C4398Q|K0HKCYBX8YMKMYVWBBMBY4||

J9787E|SNMZD4PB8PFBBMBY48Y7P8|D7781Q|

D1187E|BYP8BWBY4FPP8YBBMBBMBY||S2395K

K2110J|HKS6FYNWFYHMWBBMBY48Y7P||

M5116F|S5SIB8BP85GBBMBY48Y7P8||

M3134J|6HWJ2P78NVBBY48Y7P8S00||D7781Q

P1590P|S0Z4PYVY84PBBMBY48Y7P8|M3134J|

S9598G|KZWXB48VYBPBBMBY48Y7P8||

K9168B|-HJ2M88MJ2WMBY48Y7P8S0|H3629M|

S2395K|ZSZ48PF#NFVBW8BBMBBMBY||

M2140L|48VP33W8BNB8BW8BBMBBMB||H3629M

B4952H|YJ2PB3CN8MYBBMBY48Y7P8|S9598G|S2395K

S5018D|KZ0HVMWM4Y3BBMBY48Y7P8||

K5943E|Z6PP8VPM83N8BPMV8BBMBY||

N6677M|ZKYCSVW8BF8VWBBMBY48Y7|M5669K|

HJ7138K|MV4YK08W38BBMBY48Y7P8S||

  | | |

R9004J|K0KCNP7P85SBBMBY48Y7P||

F0109Q|Z8W8SSVY4VYNM8VBBMBBMB|R9004J|M2724Q

M2724Q|ZZZ54RFP8GYBMBBMBY48Y7||

D9109E|YVYB4W46SWYBMBBMBY48Y7P||

M9585E|KBGWY3YB2WBMBY48Y7P8S00||

  | | |

H3629M|-_P8VY-0WP8BMBY48Y7P8S0||

T2426B|50ZYC8NHI8YBCVCFYP8BBM|M9295K|

M9295K|Z5ZMPFV8M8BMBBMBY48Y7||

P3089F|K6KDNPJ28ZK4YBBMBY48Y7||

PO2499R|-K067P8KPYBMBY48Y7P8S0||

L3447Q|33PV4V5I633F4YBBMBY48Y|M5669K|

1088L|||

;

run;

data want (keep=pan1 pan2 pan3 add1 household);

  if _n_ eq 1 then do;

    declare hash ha(hashexp: 16);

    ha.definekey('key');

    ha.definedata('hhold');

    ha.definedone();

  end;

 

  set test;

  array _house{*} $ 40 pan1--pan3;

  do i=1 to dim(_house);

    key=_house{i};

    rc=ha.check();

    if rc=0 then do;

      found=1;

      household=hhold;

      leave;

    end;

  end;

  if not found then do;

    n+1;

    household=n;

  end;

  do j=1 to dim(_house);

    if not missing(_house{j}) then do;

      key=_house{j};

      hhold=household;

      ha.replace();

    end;

  end;

run;

.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 124 replies
  • 3696 views
  • 4 likes
  • 7 in conversation