Desktop productivity for business analysts and programmers

Split dataset in multiple datasets

Reply
Occasional Contributor
Posts: 14

Split dataset in multiple datasets

I'm trying to split a dataset in mutliple datasets based on macrovariables.

How should it work:  I have an Excel-file RV.xls with the following description

Dataset RV:

TypeDecision_dateDecision_numberAmount_in_EUROBBANIBANBICAccount_in_the_name_ofBeneficiaryNotificationPayable_day
2201303223031872425063007-2567914-12 xxxJos20090417
2201303223031882441577001-0987654-32 yyyPol20090417
22013032230318933000 GB44CITI26408067654932DRESDEFFzzzFrans20091204
22013032230319066800001-0077540-69 xyzAlain20091204
22013032230319122200 NL65DEUT1234567890FTSBNL2RzyxFritz20091204
2201303223031923000 NL32FTSB1387056270ABNANL2AxzxEmma20091204
220130322303193700001-0987654-32 yxyLucy20091205

and a second Excel-file Bank.xls like this.

Dataset  BankList:

BankNaamRek1Rek2Taal

B1

Test1GB44CITI26408067654932NL32FTSB1387056270EN
B2Test2007-2567914-12001-0077540-69FR
B3Test3301-1234567-89NL
B4Test4NL65DEUT1234567890EN

I can read both file and make macrovariables from the Bank-file.

data _null_;

  set RvIn.BankList;

    if not missing(Naam) then call symput(strip(Bank) || '_Naam', Naam);

    if not missing(Rek1) then call symput(strip(Bank) || '_Rek1', Rek1);

    if not missing(Rek2) then call symput(strip(Bank) || '_Rek2', Rek2);

    if not missing(Taal) then call symput(strip(Bank) || '_Taal', Taal);

run;

Then I'm counting the records in my BankList-file

proc sql noprint;

  select count(Bank)

    into :n

  from RvIn.BankList;

  /* %put aantal: &n; */

quit;

After this I would like to split the dataset RV in multiple set based on the data from BankList.

So I would like to make for every Bank a single set IF it occurs offcourse. In this example I

would make 3 new datasets Test1 with 2 records / Test2 with 1 record and Test4 also with 1.

%macro Split_DataSets;

  %do i=1 %to &n;

    proc sql;

      create table &&B&i._Naam as

        select *

        from RvIn.RV as RV

        where RV.IBAN IN ("&&B&i._Rek1","coalesce(&&B&i._Rek2,0)")

             or RV.BBAN IN ("&&B&i._Rek1","coalesce(&&B&i._Rek2,0)");

    quit;

  %end;

%mend;

%Split_DataSets;

And that doesn't work. I need to find some routine to make these sets but the problem is

that I don't know wether they're using IBAN / BBAN  or they have 2 numbers.

Can somebody help me with this?

Regular Contributor
Posts: 214

Re: Split dataset in multiple datasets

Hi Liej,

Try to approach this in a different way.

Use Data Set Hash Object to load the BankList and use it as a Lookup table. You'll need two instances, with different Key variable in each (one with Rek1 as key variable,  second with Rek2 as key variable). Read in your Execel file RV.xls once and try to check your lookup variables (BBAN,IBAN) against the two Hash Object.

There are several examples of how to use the Hash Object,

- Go to www.lexjansen.com and Search for Hash Object

- Go to support.sas.com and Search for Hash Object

Good Luck

Occasional Contributor
Posts: 14

Re: Split dataset in multiple datasets

Hi AhmedAl-Attar,

I ones used the Hash fuction. But then I splitted the file based on 1 number in that file.

Now I have to check if the bank has 2 numbers (BBAN - BBAN / IBAN - IBAN / BBAN - IBAN) or not.

Then search all these records and write them as one new dataset with the bank name.

The problem is that not all records has 2 number and that the combinations aren't fixed.

Respected Advisor
Posts: 3,156

Re: Split dataset in multiple datasets

Your description is somehow not lining up with your data. Your data suggests that you are using TWO Bank variable ('rek2' and 'taal') to match ONE RV variable ('bban'). Maybe you also need to match 'iban', but your data did not show it. So the following Hash of Hash (HOH) approach is based on  your data as is, and it will bring in 3 outcome tables, with Test2 having 2 obs, while it was 1 obs in your expectation. With some tweak, you can also use it to check more than one variables, eg. 'iban',  in RV.

data rv;

input (Type Decision_date Decision_number Amount_in_EURO BBAN IBAN) (:$25.);

cards;

2 20130322 303187 2425063 007-2567914-12 xxx

2 20130322 303188 2441577 001-0987654-32 yyy

2 20130322 303189 33000 GB44CITI26408067654932 DRESDEFF

2 20130322 303190 66800 001-0077540-69 xyz

2 20130322 303191 22200 NL65DEUT1234567890 FTSBNL2R

2 20130322 303192 3000 NL32FTSB1387056270 ABNANL2A

2 20130322 303193 700 001-0987654-32 yxy

;

data bank;

input (Naam Rek1 Rek2 Taal) (:$25.);

cards;

B1 Test1 GB44CITI26408067654932 NL32FTSB1387056270 EN

B2 Test2 007-2567914-12 001-0077540-69 FR

B3 Test3 301-1234567-89 NL

B4 Test4 NL65DEUT1234567890 EN

;

data _null_;

  if 0 then set bank rv;

declare hash bk(dataset:'rv', multidata:'y');

bk.definekey('bban');

bk.definedata(all:'y');

bk.definedone();

declare hash hoh(ordered:'a');

declare hiter hih('hoh');

hoh.definekey('rek1');

hoh.definedata('rek1' ,'hh');

hoh.definedone();

declare hash hh();

do until (last);

  set bank end=last;

  do rc=bk.find(key:rek2) by 0 while (rc=0);

  i+1;

  if hoh.find() ne 0 then do;

  hh=_new_ hash(ordered: 'a');

hh.definekey ('i');

hh.definedata('rek1', 'Type' ,'Decision_date', 'Decision_number', 'Amount_in_EURO', 'BBAN', 'IBAN');

hh.definedone();

hoh.replace();

  end;

  hh.replace();

  rc=bk.find_next(key:rek2);

  end;

  do rc=bk.find(key:taal) by 0 while (rc=0);

  i+1;

  if hoh.find() ne 0 then do;

  hh=_new_ hash(ordered: 'a');

hh.definekey ('i');

hh.definedata('rek1', 'Type' ,'Decision_date', 'Decision_number', 'Amount_in_EURO', 'BBAN', 'IBAN', 'i');

hh.definedone();

hoh.replace();

  end;

  hh.replace();

  rc=bk.find_next(key:taal);

  end;

end;

do rc=hih.next() by 0 while (rc=0);

  hh.output (dataset: rek1);

  rc=hih.next();

end;

stop;

run;

Haikuo

Occasional Contributor
Posts: 14

Re: Split dataset in multiple datasets

 

ERROR: Argument length greater than length of key variable BBAN at line 29 column 9.

ERROR: Keys and data must be specified in the same order and have the same types as given in DefineKey/DefineData at line 29 column 9.

ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.

->   29                     do rc=bk.find(key:rek2) by 0 while (rc=0);

I guess this is because not every bank has a Rek2.

If every bank had only Rek1 or both Rek1 and Rek2 then I had no problem. But it's because some banks don't have a Rek2 I'm dealing

with this problem.

Occasional Contributor
Posts: 14

Re: Split dataset in multiple datasets

I changed my bank-file in a way I'm now able to spit the RV-file.

Dataset  BankList:

BankNaamRek1Taal
B1AA1GB44CITI26408067654932EN
B2AA2NL32FTSB1387056271EN
B3BB1007-2567914-12FR
B4BB2001-0077540-70FR
B5CC301-1234567-89NL
B6DDNL65DEUT1234567890EN

Instead of using Rek2 I added a line with the second Rek. I had to change the Naam and added a number.

After my program I get 6 datasets AA1 - AA2 - BB1 - BB2 - CC - DD ( in this example)

Then I'm wrote a proc datasets to  append each datasets2 to the 1.

Proc datasets;

     append base=Lib.AA1 data =Lib.AA2 force;

run;

Can I write something to do this automatically?

So I would like to add a program step in my project that appends the 2-datasets to the 1's.

But AA1 append AA2 should become AA-dataset.

I would like to do this in a program because sometimes a bank gets a second Rek and then

I don't need to change my project.

Can this be done in a program

Who v

Ask a Question
Discussion stats
  • 5 replies
  • 898 views
  • 3 likes
  • 3 in conversation