Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Re: data normalisation

Accepted Solution Solved
Reply
Contributor
Posts: 57
Accepted Solution

Re: data normalisation

Dear All,

Here i have small problem in the arranging the data because my data is family and health surveys in individual level it looks like

case_ID member1 age1 sex1 dob1  member2 age2 sex2 dob2  member3 age3 sex3 dob3    member4 age4 sex4 dob4

but i need in the belowed format

case_ID    member    age  sex  dob

101                    1      54    m    4443

102                    2      32      f      5443

103                    3      43    m      3234

104                    4      21    m      2123

so please give me suggestion how to go with it.

here i attached data sample for your information.

Thanks in advance....

Anil

Attachment

Accepted Solutions
Solution
‎05-19-2014 09:02 AM
Super User
Posts: 9,671

Re: data normalisation

Your data is a little dirty .

    data WORK.HAVE                                    ;
    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
    infile 'c:\temp\sample data.txt' delimiter='09'x MISSOVER DSD lrecl=32767 firstobs=2 ;
       informat CASEID comma32. ;
       informat country_code $3. ;
       informat cluster best32. ;
       informat Household_no best32. ;
       informat Individual_no best32. ;
       informat BIDX_01 best32. ;
       informat BORD_01 best32. ;
       informat B0_01 best32. ;
       informat B1_01 best32. ;
       informat B2_01 best32. ;
       informat B3_01 best32. ;
       informat B4_01 best32. ;
       informat B5_01 best32. ;
       informat B6_01 best32. ;
       informat B7_01 best32. ;
       informat B8_01 best32. ;
       informat B9_01 best32. ;
       informat B10_01 best32. ;
       informat B11_01 best32. ;
       informat B12_01 best32. ;
       informat B13_01 best32. ;
       informat BIDX_02 best32. ;
       informat BORD_02 best32. ;
       informat B0_02 best32. ;
       informat B1_02 best32. ;
       informat B2_02 best32. ;
       informat B3_02 best32. ;
       informat B4_02 best32. ;
       informat B5_02 best32. ;
       informat B6_02 best32. ;
       informat B7_02 best32. ;
       informat B8_02 best32. ;
       informat B9_02 best32. ;
       informat B10_02 best32. ;
       informat B11_02 best32. ;
       informat B12_02 best32. ;
       informat B13_02 best32. ;
       informat BIDX_03 best32. ;
       informat BORD_03 best32. ;
       informat B0_03 best32. ;
       informat B1_03 best32. ;
       informat B2_03 best32. ;
       informat B3_03 best32. ;
       informat B4_03 best32. ;
       informat B5_03 best32. ;
       informat B6_03 best32. ;
       informat B7_03 best32. ;
       informat B8_03 best32. ;
       informat B9_03 best32. ;
       informat B10_03 best32. ;
       informat B11_03 best32. ;
       informat B12_03 best32. ;
       informat B13_03 best32. ;
       informat BIDX_04 best32. ;
       informat BORD_04 best32. ;
       informat B0_04 best32. ;
       informat B1_04 best32. ;
       informat B2_04 best32. ;
       informat B3_04 best32. ;
       informat B4_04 best32. ;
       informat B5_04 best32. ;
       informat B6_04 best32. ;
       informat B7_04 best32. ;
       informat B8_04 best32. ;
       informat B9_04 best32. ;
       informat B10_04 best32. ;
       informat B11_04 best32. ;
       informat B12_04 best32. ;
       informat B13_04 best32. ;
       format CASEID comma12. ;
       format country_code $3. ;
       format cluster best12. ;
       format Household_no best12. ;
       format Individual_no best12. ;
       format BIDX_01 best12. ;
       format BORD_01 best12. ;
       format B0_01 best12. ;
       format B1_01 best12. ;
       format B2_01 best12. ;
       format B3_01 best12. ;
       format B4_01 best12. ;
       format B5_01 best12. ;
       format B6_01 best12. ;
       format B7_01 best12. ;
       format B8_01 best12. ;
       format B9_01 best12. ;
       format B10_01 best12. ;
       format B11_01 best12. ;
       format B12_01 $1. ;
       format B13_01 best12. ;
       format BIDX_02 best12. ;
       format BORD_02 best12. ;
       format B0_02 best12. ;
       format B1_02 best12. ;
       format B2_02 best12. ;
       format B3_02 best12. ;
       format B4_02 best12. ;
       format B5_02 best12. ;
       format B6_02 best12. ;
       format B7_02 best12. ;
       format B8_02 best12. ;
       format B9_02 best12. ;
       format B10_02 best12. ;
       format B11_02 best12. ;
       format B12_02 best12. ;
       format B13_02 best12. ;
       format BIDX_03 best12. ;
       format BORD_03 best12. ;
       format B0_03 best12. ;
       format B1_03 best12. ;
       format B2_03 best12. ;
       format B3_03 best12. ;
       format B4_03 best12. ;
       format B5_03 best12. ;
       format B6_03 best12. ;
       format B7_03 best12. ;
       format B8_03 best12. ;
       format B9_03 best12. ;
       format B10_03 best12. ;
       format B11_03 best12. ;
       format B12_03 best12. ;
       format B13_03 best12. ;
       format BIDX_04 best12. ;
       format BORD_04 best12. ;
       format B0_04 best12. ;
       format B1_04 best12. ;
       format B2_04 best12. ;
       format B3_04 best12. ;
       format B4_04 best12. ;
       format B5_04 best12. ;
       format B6_04 best12. ;
       format B7_04 best12. ;
       format B8_04 best12. ;
       format B9_04 best12. ;
       format B10_04 best12. ;
       format B11_04 best12. ;
       format B12_04 best12. ;
       format B13_04 best12. ;
    input
                CASEID
                country_code $
                cluster
                Household_no
                Individual_no
                BIDX_01
                BORD_01
                B0_01
                B1_01
                B2_01
                B3_01
                B4_01
                B5_01
                B6_01
                B7_01
                B8_01
                B9_01
                B10_01
                B11_01
                B12_01 
                B13_01
                BIDX_02
                BORD_02
                B0_02
                B1_02
                B2_02
                B3_02
                B4_02
                B5_02
                B6_02
                B7_02
                B8_02
                B9_02
                B10_02
                B11_02
                B12_02
                B13_02
                BIDX_03
                BORD_03
                B0_03
                B1_03
                B2_03
                B3_03
                B4_03
                B5_03
                B6_03
                B7_03
                B8_03
                B9_03
                B10_03
                B11_03
                B12_03
                B13_03
                BIDX_04
                BORD_04
                B0_04
                B1_04
                B2_04
                B3_04
                B4_04
                B5_04
                B6_04
                B7_04
                B8_04
                B9_04
                B10_04
                B11_04
                B12_04
                B13_04
    ;
    if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
    run;


data temp;
 set have;
 array _b{*} b: ;
 do i=1 to dim(_b);
  _name=vname(_b{i});
  index=scan(_name,-1,'_');
  name=scan(_name,1,'_');
  value=_b{i};
  output;
 end;
drop i b: _name ;
run;
proc transpose data=temp out=want(drop=_name_);
by CASEID  country_code  cluster Household_no Individual_no index notsorted;
var value;
id name;
run;

Xia Keshan

View solution in original post


All Replies
Super User
Posts: 5,255

Re: data normalisation

Use the data step with arrays, do-loop and explicit output.

Search support.sas.com for "transposing data set" - there are plenty of examples and doc around this topic.

Data never sleeps
Super User
Posts: 6,928

Re: data normalisation

How about this:

data want (keep=case_ID member age sex dob);

set have;

array members{*} member1-member4;

array ages{*} age1-age4;

array sexs{*} sex1-sex4;

array dobs{*} dob1-dob4;

do i = 1 to 4;

  if members{i} ne .

  then do;

    member = members{i};

    age = ages{i};

    sex = sexs{i};

    dob = dobs{i};

    output;

  end;

end;

run;

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 57

Re: data normalisation

Deal All ,

My question is raw data looks like this

Case IDHousehold head No. of memners in HHMem1mem2mem3sex1sex2sex3age1age2age3

but i need data arrange like in member variables i need all members names and under sex variable i need to arrange all members sex  under one by one.

like

Case IDHousehold head No. of memners in HHMemberssexAge
101AB3mem1sex1age1
102BC2mem2sex2age2
103CD1mem3sex3age3
104AC5
105AD2
106BD4

so please give some idea. its huge data set please help me in this. because i am dealing with health survey data.i need to match mother and child data with household information .

Thanks and Regards,

Anil

Super User
Posts: 6,928

Re: data normalisation

Since the data in your .zip look different from what you describe here, I suggest you post sample "have" and "want" data here.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 57

Re: data normalisation

CASEIDcountry_codeclusterHousehold_noIndividual_noBIDX_01BORD_01B0_01B1_01B2_01B3_01B4_01B5_01B6_01B7_01B8_01B9_01B10_01B11_01B12_01B13_01BIDX_02BORD_02B0_02B1_02B2_02B3_02B4_02B5_02B6_02B7_02B8_02B9_02B10_02B11_02B12_02B13_02BIDX_03BORD_03B0_03B1_03B2_03B3_03B4_03B5_03B6_03B7_03B8_03B9_03B10_03B11_03B12_03B13_03BIDX_04BORD_04B0_04B1_04B2_04B3_04B4_04B5_04B6_04B7_04B8_04B9_04B10_04B11_04B12_04B13_04
2  1 9  1IA2200491120888106421 30137 21078510271030336 1 370
2  1 17  2IA22004172140292110611 00137 230189106911 3014837 320185102121 7019148 41067793011 1401 91
2  1 33  1IA2200433113027792611 150144 22067388221 18317744 31016780521 2501 77
2  1 33  2IA22004332
2  1 33  3IA22004333110189106911 301
2  1 41  2IA22004412130691109811 00155 2201186104311 5015255 31078299121 901 52
2  1 49  2IA2200449213086477621 273131 22016274511 30013731 310125870821 3331 37
2  1 57  2IA220045721501288106821 30137 2401185103111 6011537 3308841016201070 1231504209829932030448 124230
2  1 65  2IA22004652130891110011 00166 220286103411 6011066 3104851024101050 5 100
2  1 73  2IA2200473214011911103202055 124 02301189107921 2014224 32058610372030224 1184204101184101921 701 18
2  1 89  2IA2200489216097186111 200116 25057084511 22013216 34096781311 24312532 43086578811 26311525

the above data is my raw data which i have.

But i want  format like this

there are variables in the above data is BIDX_01 BIDX_02  BIDX_03 BIDX_04

and B0_01---B10_1  and B0_02---- B10_02

so i need to arrange data like

Case_ID     BIDX     B1     B2     B3     B4

101              1          1       1       1        1

102              3          2       4        2       1

i mean the variables instead of side by side i need to bring one below the another.

Thanks and Regards,

Anil

Super User
Posts: 6,928

Re: data normalisation

Is the target Case_ID just counting up from 101, or is there a rule for generating it?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 7,392

Re: data normalisation

Well, one way to do it would be to do the following - note that if you have lots of combinations, then pop the sql in a macro and call it each time, or generate the code using call execute.:

proc sql;

     create table WANT as

     select     CASEID,

                    BIDX_01 as BIDx,

                    B0_1 as B1,

                    B0_2 as B2,

                    ...

     from        HAVE

     union all

     select     CASEID,

                    BIDX_02 as BIDX,

                    B0_2 as B1,

                    ...

     from       HAVE

     union all

     ...

quit;

Solution
‎05-19-2014 09:02 AM
Super User
Posts: 9,671

Re: data normalisation

Your data is a little dirty .

    data WORK.HAVE                                    ;
    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
    infile 'c:\temp\sample data.txt' delimiter='09'x MISSOVER DSD lrecl=32767 firstobs=2 ;
       informat CASEID comma32. ;
       informat country_code $3. ;
       informat cluster best32. ;
       informat Household_no best32. ;
       informat Individual_no best32. ;
       informat BIDX_01 best32. ;
       informat BORD_01 best32. ;
       informat B0_01 best32. ;
       informat B1_01 best32. ;
       informat B2_01 best32. ;
       informat B3_01 best32. ;
       informat B4_01 best32. ;
       informat B5_01 best32. ;
       informat B6_01 best32. ;
       informat B7_01 best32. ;
       informat B8_01 best32. ;
       informat B9_01 best32. ;
       informat B10_01 best32. ;
       informat B11_01 best32. ;
       informat B12_01 best32. ;
       informat B13_01 best32. ;
       informat BIDX_02 best32. ;
       informat BORD_02 best32. ;
       informat B0_02 best32. ;
       informat B1_02 best32. ;
       informat B2_02 best32. ;
       informat B3_02 best32. ;
       informat B4_02 best32. ;
       informat B5_02 best32. ;
       informat B6_02 best32. ;
       informat B7_02 best32. ;
       informat B8_02 best32. ;
       informat B9_02 best32. ;
       informat B10_02 best32. ;
       informat B11_02 best32. ;
       informat B12_02 best32. ;
       informat B13_02 best32. ;
       informat BIDX_03 best32. ;
       informat BORD_03 best32. ;
       informat B0_03 best32. ;
       informat B1_03 best32. ;
       informat B2_03 best32. ;
       informat B3_03 best32. ;
       informat B4_03 best32. ;
       informat B5_03 best32. ;
       informat B6_03 best32. ;
       informat B7_03 best32. ;
       informat B8_03 best32. ;
       informat B9_03 best32. ;
       informat B10_03 best32. ;
       informat B11_03 best32. ;
       informat B12_03 best32. ;
       informat B13_03 best32. ;
       informat BIDX_04 best32. ;
       informat BORD_04 best32. ;
       informat B0_04 best32. ;
       informat B1_04 best32. ;
       informat B2_04 best32. ;
       informat B3_04 best32. ;
       informat B4_04 best32. ;
       informat B5_04 best32. ;
       informat B6_04 best32. ;
       informat B7_04 best32. ;
       informat B8_04 best32. ;
       informat B9_04 best32. ;
       informat B10_04 best32. ;
       informat B11_04 best32. ;
       informat B12_04 best32. ;
       informat B13_04 best32. ;
       format CASEID comma12. ;
       format country_code $3. ;
       format cluster best12. ;
       format Household_no best12. ;
       format Individual_no best12. ;
       format BIDX_01 best12. ;
       format BORD_01 best12. ;
       format B0_01 best12. ;
       format B1_01 best12. ;
       format B2_01 best12. ;
       format B3_01 best12. ;
       format B4_01 best12. ;
       format B5_01 best12. ;
       format B6_01 best12. ;
       format B7_01 best12. ;
       format B8_01 best12. ;
       format B9_01 best12. ;
       format B10_01 best12. ;
       format B11_01 best12. ;
       format B12_01 $1. ;
       format B13_01 best12. ;
       format BIDX_02 best12. ;
       format BORD_02 best12. ;
       format B0_02 best12. ;
       format B1_02 best12. ;
       format B2_02 best12. ;
       format B3_02 best12. ;
       format B4_02 best12. ;
       format B5_02 best12. ;
       format B6_02 best12. ;
       format B7_02 best12. ;
       format B8_02 best12. ;
       format B9_02 best12. ;
       format B10_02 best12. ;
       format B11_02 best12. ;
       format B12_02 best12. ;
       format B13_02 best12. ;
       format BIDX_03 best12. ;
       format BORD_03 best12. ;
       format B0_03 best12. ;
       format B1_03 best12. ;
       format B2_03 best12. ;
       format B3_03 best12. ;
       format B4_03 best12. ;
       format B5_03 best12. ;
       format B6_03 best12. ;
       format B7_03 best12. ;
       format B8_03 best12. ;
       format B9_03 best12. ;
       format B10_03 best12. ;
       format B11_03 best12. ;
       format B12_03 best12. ;
       format B13_03 best12. ;
       format BIDX_04 best12. ;
       format BORD_04 best12. ;
       format B0_04 best12. ;
       format B1_04 best12. ;
       format B2_04 best12. ;
       format B3_04 best12. ;
       format B4_04 best12. ;
       format B5_04 best12. ;
       format B6_04 best12. ;
       format B7_04 best12. ;
       format B8_04 best12. ;
       format B9_04 best12. ;
       format B10_04 best12. ;
       format B11_04 best12. ;
       format B12_04 best12. ;
       format B13_04 best12. ;
    input
                CASEID
                country_code $
                cluster
                Household_no
                Individual_no
                BIDX_01
                BORD_01
                B0_01
                B1_01
                B2_01
                B3_01
                B4_01
                B5_01
                B6_01
                B7_01
                B8_01
                B9_01
                B10_01
                B11_01
                B12_01 
                B13_01
                BIDX_02
                BORD_02
                B0_02
                B1_02
                B2_02
                B3_02
                B4_02
                B5_02
                B6_02
                B7_02
                B8_02
                B9_02
                B10_02
                B11_02
                B12_02
                B13_02
                BIDX_03
                BORD_03
                B0_03
                B1_03
                B2_03
                B3_03
                B4_03
                B5_03
                B6_03
                B7_03
                B8_03
                B9_03
                B10_03
                B11_03
                B12_03
                B13_03
                BIDX_04
                BORD_04
                B0_04
                B1_04
                B2_04
                B3_04
                B4_04
                B5_04
                B6_04
                B7_04
                B8_04
                B9_04
                B10_04
                B11_04
                B12_04
                B13_04
    ;
    if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
    run;


data temp;
 set have;
 array _b{*} b: ;
 do i=1 to dim(_b);
  _name=vname(_b{i});
  index=scan(_name,-1,'_');
  name=scan(_name,1,'_');
  value=_b{i};
  output;
 end;
drop i b: _name ;
run;
proc transpose data=temp out=want(drop=_name_);
by CASEID  country_code  cluster Household_no Individual_no index notsorted;
var value;
id name;
run;

Xia Keshan

Contributor
Posts: 57

Re: data normalisation

Dear Ksharp,

It's working now thanks a lot for your idea. but my raw data is in SAS 6 version files so its wont work in SAS 9.3 so i need to migrate and transpose. the datasets memory is in GB's i don't know how to process this data.

Thanks a lot once again....

Regards,

Anil

Super User
Posts: 6,928

Re: data normalisation

SAS can automatically deal with older dataset formats. Just read the inputs and write the new data to another library that contains 9.3 data.

Of course you WILL need enough disk space anyway.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 420 views
  • 1 like
  • 5 in conversation