BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
vickys
Obsidian | Level 7

Hi guru's Could some one help me to code one of our requirement. I tried my not iterative and have some challanne as I am new.

I have this data in CSV file below are the cell numbers i.e data in one column and three rows. There are some thousands of rows.

I have to remove that delimiter <tt> and load the data to corresponding fileds. Attaced is the input file but in notepad

 

1ST CELL A1:A1

Name: abc<tt>
Product: loan<tt>
accno: 123456<tt>
amount: 389.56<tt>
bonus: 3000<tt>
Product: CCard_plat<tt>
accno: 5689<tt>
amount: 478.96<tt>
Product: gold_insurance<tt>
accno: 1010102<tt>
amount: 103<tt>
bankcde: ICI<tt>
payment: 3094

 

2nd CELL A1:B2

Name: vbsd<tt>
Product: CCard_plat<tt>
accno: 74123<tt>
amount: 56.21<tt>
Product: gold_insurance<tt>
accno: 2020201<tt>
amount: 189<tt>
bankcde: hdfc<tt>
payment: 605


3rd CELL A1:B3

Name: klhj<tt>
Product: gold_insurance<tt>
accno: 3030312<tt>
amount: 563<tt>
bankcde: kk<tt>
payment: 489


I am expecting the output of the data as below: Where there is no data for any of the product then it should load with blank and fill the remaining fields.

 

Name Product Accno Amount Bonus Card_type CC_Accno CC_Amount_no Insurance_Type Insurance_No Insurance_Amt Bank_Cde Payment
abc      loan    123456     389.56      3000     ccard_plat     5689      478.96       gold_insurance       1010102     103      ICI      3094
vbsd                                                              ccard_plat     74123      56.21       gold_insurance        2020201     189     hdfc     605
klhj                                                                                                                      gold_insurance        3030312     563     kk        489

 

Really appreciate your inputs. 

Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

The code is just a proof of concept, and incomplete at that.

This now deals correctly with all data:

data want;
retain
  Name pr Product Accno Amount Bonus Card_type CC_Accno CC_Amount_no
  Insurance_Type Insurance_No Insurance_Amt Bank_Cde Payment
;
length
  Name $10
  pr Product Card_type Insurance_Type $20
  Accno CC_Accno Insurance_No $6
  Bank_Cde $10
;
infile datalines eof=done;
input;
select (scan(_infile_,1,':'));
  when ('Name') do;
    if name ne ''
    then do;
      output;
      call missing(
        Name,pr,Product,Accno,Amount,Bonus,Card_type,CC_Accno,CC_Amount_no,
        Insurance_Type,Insurance_No,Insurance_Amt,Bank_Cde,Payment
      );
    end;
    name = strip(scan(scan(_infile_,2,':'),1,'<'));
  end;
  when ('Product') do;
    pr = strip(scan(scan(_infile_,2,':'),1,'<'));
    select (pr);
      when ('loan') product = pr;
      when ('CCard_plat') card_type = pr;
      when ('gold_insurance') insurance_type = pr;
    end;
  end;
  when ('accno') select (pr);
    when ('loan') accno = strip(scan(scan(_infile_,2,':'),1,'<'));
    when ('CCard_plat') CC_Accno = strip(scan(scan(_infile_,2,':'),1,'<'));
    when ('gold_insurance') Insurance_No = strip(scan(scan(_infile_,2,':'),1,'<'));
  end;
  when ('amount') select (pr);
    when ('loan') amount = input(scan(scan(_infile_,2,':'),1,'<'),best.);
    when ('CCard_plat') CC_Amount_no = input(scan(scan(_infile_,2,':'),1,'<'),best.);
    when ('gold_insurance') Insurance_Amt = input(scan(scan(_infile_,2,':'),1,'<'),best.);
  end;
  when ('bonus') bonus = input(scan(scan(_infile_,2,':'),1,'<'),best.);
  when ('bankcde') Bank_Cde = strip(scan(scan(_infile_,2,':'),1,'<'));
  when ('payment') Payment = input(scan(_infile_,2,':'),best.);
  otherwise;
end;
goto dsend;
done: output;
dsend:
drop pr;
datalines;
1ST CELL A1:A1

Name: abc<tt>
Product: loan<tt>
accno: 123456<tt>
amount: 389.56<tt>
bonus: 3000<tt>
Product: CCard_plat<tt>
accno: 5689<tt>
amount: 478.96<tt>
Product: gold_insurance<tt>
accno: 1010102<tt>
amount: 103<tt>
bankcde: ICI<tt>
payment: 3094

 

2nd CELL A1:B2

Name: vbsd<tt>
Product: CCard_plat<tt>
accno: 74123<tt>
amount: 56.21<tt>
Product: gold_insurance<tt>
accno: 2020201<tt>
amount: 189<tt>
bankcde: hdfc<tt>
payment: 605


3rd CELL A1:B3

Name: klhj<tt>
Product: gold_insurance<tt>
accno: 3030312<tt>
amount: 563<tt>
bankcde: kk<tt>
payment: 489
;

View solution in original post

5 REPLIES 5
Kurt_Bremser
Super User

See:

data want;
length
  Name $10
  Product $20
  Accno $6
;
retain
  Name Product Accno Amount Bonus Card_type CC_Accno CC_Amount_no
  Insurance_Type Insurance_No Insurance_Amt Bank_Cde Payment
;
infile datalines eof=done;
input;
select (scan(_infile_,1,':'));
  when ('Name') do;
    if name ne ''
    then do;
      output;
      call missing(
        Name,Product,Accno,Amount,Bonus,Card_type,CC_Accno,CC_Amount_no,
        Insurance_Type,Insurance_No,Insurance_Amt,Bank_Cde,Payment
      );
    end;
    name = scan(scan(_infile_,2,':'),1,'<');
  end;
  when ('Product') product = scan(scan(_infile_,2,':'),1,'<');
  when ('accno') accno = scan(scan(_infile_,2,':'),1,'<');
  when ('amount') amount = input(scan(scan(_infile_,2,':'),1,'<'),best.);
  otherwise;
end;
goto dsend;
done: output;
dsend:
datalines;
1ST CELL A1:A1

Name: abc<tt>
Product: loan<tt>
accno: 123456<tt>
amount: 389.56<tt>
bonus: 3000<tt>
Product: CCard_plat<tt>
accno: 5689<tt>
amount: 478.96<tt>
Product: gold_insurance<tt>
accno: 1010102<tt>
amount: 103<tt>
bankcde: ICI<tt>
payment: 3094

 

2nd CELL A1:B2

Name: vbsd<tt>
Product: CCard_plat<tt>
accno: 74123<tt>
amount: 56.21<tt>
Product: gold_insurance<tt>
accno: 2020201<tt>
amount: 189<tt>
bankcde: hdfc<tt>
payment: 605


3rd CELL A1:B3

Name: klhj<tt>
Product: gold_insurance<tt>
accno: 3030312<tt>
amount: 563<tt>
bankcde: kk<tt>
payment: 489
;
vickys
Obsidian | Level 7

Thank you, Kurt.

Really appreciate your input.

The data has been loaded for product, accno and amount, while  leaving other columns blank.

The prroduct gold_insurance was populated in all the three rows.

 

Thanks in advance

Kurt_Bremser
Super User

The code is just a proof of concept, and incomplete at that.

This now deals correctly with all data:

data want;
retain
  Name pr Product Accno Amount Bonus Card_type CC_Accno CC_Amount_no
  Insurance_Type Insurance_No Insurance_Amt Bank_Cde Payment
;
length
  Name $10
  pr Product Card_type Insurance_Type $20
  Accno CC_Accno Insurance_No $6
  Bank_Cde $10
;
infile datalines eof=done;
input;
select (scan(_infile_,1,':'));
  when ('Name') do;
    if name ne ''
    then do;
      output;
      call missing(
        Name,pr,Product,Accno,Amount,Bonus,Card_type,CC_Accno,CC_Amount_no,
        Insurance_Type,Insurance_No,Insurance_Amt,Bank_Cde,Payment
      );
    end;
    name = strip(scan(scan(_infile_,2,':'),1,'<'));
  end;
  when ('Product') do;
    pr = strip(scan(scan(_infile_,2,':'),1,'<'));
    select (pr);
      when ('loan') product = pr;
      when ('CCard_plat') card_type = pr;
      when ('gold_insurance') insurance_type = pr;
    end;
  end;
  when ('accno') select (pr);
    when ('loan') accno = strip(scan(scan(_infile_,2,':'),1,'<'));
    when ('CCard_plat') CC_Accno = strip(scan(scan(_infile_,2,':'),1,'<'));
    when ('gold_insurance') Insurance_No = strip(scan(scan(_infile_,2,':'),1,'<'));
  end;
  when ('amount') select (pr);
    when ('loan') amount = input(scan(scan(_infile_,2,':'),1,'<'),best.);
    when ('CCard_plat') CC_Amount_no = input(scan(scan(_infile_,2,':'),1,'<'),best.);
    when ('gold_insurance') Insurance_Amt = input(scan(scan(_infile_,2,':'),1,'<'),best.);
  end;
  when ('bonus') bonus = input(scan(scan(_infile_,2,':'),1,'<'),best.);
  when ('bankcde') Bank_Cde = strip(scan(scan(_infile_,2,':'),1,'<'));
  when ('payment') Payment = input(scan(_infile_,2,':'),best.);
  otherwise;
end;
goto dsend;
done: output;
dsend:
drop pr;
datalines;
1ST CELL A1:A1

Name: abc<tt>
Product: loan<tt>
accno: 123456<tt>
amount: 389.56<tt>
bonus: 3000<tt>
Product: CCard_plat<tt>
accno: 5689<tt>
amount: 478.96<tt>
Product: gold_insurance<tt>
accno: 1010102<tt>
amount: 103<tt>
bankcde: ICI<tt>
payment: 3094

 

2nd CELL A1:B2

Name: vbsd<tt>
Product: CCard_plat<tt>
accno: 74123<tt>
amount: 56.21<tt>
Product: gold_insurance<tt>
accno: 2020201<tt>
amount: 189<tt>
bankcde: hdfc<tt>
payment: 605


3rd CELL A1:B3

Name: klhj<tt>
Product: gold_insurance<tt>
accno: 3030312<tt>
amount: 563<tt>
bankcde: kk<tt>
payment: 489
;
vickys
Obsidian | Level 7

Many Thanks, Kurt.

Though I posted my requirement  slight bit different, but it was 95% same.

Really appreciate your inputs

Tom
Super User Tom
Super User

I think your attempt to describe the CSV file is adding confusion to your request.  Can't you just paste a couple of lines from the CSV into your question?  Use the Insert Code button (looks like {i} on the menu) to get a pop-up window to paste the text.  Make sure to open the CSV file with a text editor and not with Excel so that you can see the actual content and not have Excel mangle it into a spreadsheet.

 

So you have three columns in the CSV file?  And each one have these name-value pairs?  Is there any significance to the pair being in the 2nd column instead of the 1st or 3rd column?

 

It is probably going to be easiest to read it into a tall structure.  Perhaps something like:

ROW|COL|ORD|NAME|VALUE
1|1|1|Name|abc
1|1|2|Product|loan
...
1|2|1|Name|vbsd
1|2|2|Product|CCard_plat
...

So assuming each row has only three columns your code might look something like:

data want;
  length row col ord 8 name value $200 ;
  infile 'myfile.csv' dsd truncover ;
  length cell pair $32767 ;
  row+1;
  do cell=1 to 3 ;
    input cell @;
    cell = tranwrd(cell,'<tt>','FF'x);
    do ord=1 to countw(cell,'FF'x);
      pair=scan(cell,ord,'FF'x);
      name=scan(pair,1,':');
      value =left(scan(pair,2,':'));
      output;
    end;
  end;
  drop cell pair;
run;

Now you could use PROC TRANSPOSE with to create a wide dataset.  To use the value of NAME as the name of the variable to create use the ID statement in the PROC TRANSPOSE step.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 533 views
  • 0 likes
  • 3 in conversation