Hi guru's Could some one help me to code one of our requirement. I tried my not iterative and have some challanne as I am new.
I have this data in CSV file below are the cell numbers i.e data in one column and three rows. There are some thousands of rows.
I have to remove that delimiter <tt> and load the data to corresponding fileds. Attaced is the input file but in notepad
1ST CELL A1:A1
Name: abc<tt>
Product: loan<tt>
accno: 123456<tt>
amount: 389.56<tt>
bonus: 3000<tt>
Product: CCard_plat<tt>
accno: 5689<tt>
amount: 478.96<tt>
Product: gold_insurance<tt>
accno: 1010102<tt>
amount: 103<tt>
bankcde: ICI<tt>
payment: 3094
2nd CELL A1:B2
Name: vbsd<tt>
Product: CCard_plat<tt>
accno: 74123<tt>
amount: 56.21<tt>
Product: gold_insurance<tt>
accno: 2020201<tt>
amount: 189<tt>
bankcde: hdfc<tt>
payment: 605
3rd CELL A1:B3
Name: klhj<tt>
Product: gold_insurance<tt>
accno: 3030312<tt>
amount: 563<tt>
bankcde: kk<tt>
payment: 489
I am expecting the output of the data as below: Where there is no data for any of the product then it should load with blank and fill the remaining fields.
Name Product Accno Amount Bonus Card_type CC_Accno CC_Amount_no Insurance_Type Insurance_No Insurance_Amt Bank_Cde Payment
abc loan 123456 389.56 3000 ccard_plat 5689 478.96 gold_insurance 1010102 103 ICI 3094
vbsd ccard_plat 74123 56.21 gold_insurance 2020201 189 hdfc 605
klhj gold_insurance 3030312 563 kk 489
Really appreciate your inputs.
Thanks in advance.
The code is just a proof of concept, and incomplete at that.
This now deals correctly with all data:
data want;
retain
Name pr Product Accno Amount Bonus Card_type CC_Accno CC_Amount_no
Insurance_Type Insurance_No Insurance_Amt Bank_Cde Payment
;
length
Name $10
pr Product Card_type Insurance_Type $20
Accno CC_Accno Insurance_No $6
Bank_Cde $10
;
infile datalines eof=done;
input;
select (scan(_infile_,1,':'));
when ('Name') do;
if name ne ''
then do;
output;
call missing(
Name,pr,Product,Accno,Amount,Bonus,Card_type,CC_Accno,CC_Amount_no,
Insurance_Type,Insurance_No,Insurance_Amt,Bank_Cde,Payment
);
end;
name = strip(scan(scan(_infile_,2,':'),1,'<'));
end;
when ('Product') do;
pr = strip(scan(scan(_infile_,2,':'),1,'<'));
select (pr);
when ('loan') product = pr;
when ('CCard_plat') card_type = pr;
when ('gold_insurance') insurance_type = pr;
end;
end;
when ('accno') select (pr);
when ('loan') accno = strip(scan(scan(_infile_,2,':'),1,'<'));
when ('CCard_plat') CC_Accno = strip(scan(scan(_infile_,2,':'),1,'<'));
when ('gold_insurance') Insurance_No = strip(scan(scan(_infile_,2,':'),1,'<'));
end;
when ('amount') select (pr);
when ('loan') amount = input(scan(scan(_infile_,2,':'),1,'<'),best.);
when ('CCard_plat') CC_Amount_no = input(scan(scan(_infile_,2,':'),1,'<'),best.);
when ('gold_insurance') Insurance_Amt = input(scan(scan(_infile_,2,':'),1,'<'),best.);
end;
when ('bonus') bonus = input(scan(scan(_infile_,2,':'),1,'<'),best.);
when ('bankcde') Bank_Cde = strip(scan(scan(_infile_,2,':'),1,'<'));
when ('payment') Payment = input(scan(_infile_,2,':'),best.);
otherwise;
end;
goto dsend;
done: output;
dsend:
drop pr;
datalines;
1ST CELL A1:A1
Name: abc<tt>
Product: loan<tt>
accno: 123456<tt>
amount: 389.56<tt>
bonus: 3000<tt>
Product: CCard_plat<tt>
accno: 5689<tt>
amount: 478.96<tt>
Product: gold_insurance<tt>
accno: 1010102<tt>
amount: 103<tt>
bankcde: ICI<tt>
payment: 3094
2nd CELL A1:B2
Name: vbsd<tt>
Product: CCard_plat<tt>
accno: 74123<tt>
amount: 56.21<tt>
Product: gold_insurance<tt>
accno: 2020201<tt>
amount: 189<tt>
bankcde: hdfc<tt>
payment: 605
3rd CELL A1:B3
Name: klhj<tt>
Product: gold_insurance<tt>
accno: 3030312<tt>
amount: 563<tt>
bankcde: kk<tt>
payment: 489
;
See:
data want;
length
Name $10
Product $20
Accno $6
;
retain
Name Product Accno Amount Bonus Card_type CC_Accno CC_Amount_no
Insurance_Type Insurance_No Insurance_Amt Bank_Cde Payment
;
infile datalines eof=done;
input;
select (scan(_infile_,1,':'));
when ('Name') do;
if name ne ''
then do;
output;
call missing(
Name,Product,Accno,Amount,Bonus,Card_type,CC_Accno,CC_Amount_no,
Insurance_Type,Insurance_No,Insurance_Amt,Bank_Cde,Payment
);
end;
name = scan(scan(_infile_,2,':'),1,'<');
end;
when ('Product') product = scan(scan(_infile_,2,':'),1,'<');
when ('accno') accno = scan(scan(_infile_,2,':'),1,'<');
when ('amount') amount = input(scan(scan(_infile_,2,':'),1,'<'),best.);
otherwise;
end;
goto dsend;
done: output;
dsend:
datalines;
1ST CELL A1:A1
Name: abc<tt>
Product: loan<tt>
accno: 123456<tt>
amount: 389.56<tt>
bonus: 3000<tt>
Product: CCard_plat<tt>
accno: 5689<tt>
amount: 478.96<tt>
Product: gold_insurance<tt>
accno: 1010102<tt>
amount: 103<tt>
bankcde: ICI<tt>
payment: 3094
2nd CELL A1:B2
Name: vbsd<tt>
Product: CCard_plat<tt>
accno: 74123<tt>
amount: 56.21<tt>
Product: gold_insurance<tt>
accno: 2020201<tt>
amount: 189<tt>
bankcde: hdfc<tt>
payment: 605
3rd CELL A1:B3
Name: klhj<tt>
Product: gold_insurance<tt>
accno: 3030312<tt>
amount: 563<tt>
bankcde: kk<tt>
payment: 489
;
Thank you, Kurt.
Really appreciate your input.
The data has been loaded for product, accno and amount, while leaving other columns blank.
The prroduct gold_insurance was populated in all the three rows.
Thanks in advance
The code is just a proof of concept, and incomplete at that.
This now deals correctly with all data:
data want;
retain
Name pr Product Accno Amount Bonus Card_type CC_Accno CC_Amount_no
Insurance_Type Insurance_No Insurance_Amt Bank_Cde Payment
;
length
Name $10
pr Product Card_type Insurance_Type $20
Accno CC_Accno Insurance_No $6
Bank_Cde $10
;
infile datalines eof=done;
input;
select (scan(_infile_,1,':'));
when ('Name') do;
if name ne ''
then do;
output;
call missing(
Name,pr,Product,Accno,Amount,Bonus,Card_type,CC_Accno,CC_Amount_no,
Insurance_Type,Insurance_No,Insurance_Amt,Bank_Cde,Payment
);
end;
name = strip(scan(scan(_infile_,2,':'),1,'<'));
end;
when ('Product') do;
pr = strip(scan(scan(_infile_,2,':'),1,'<'));
select (pr);
when ('loan') product = pr;
when ('CCard_plat') card_type = pr;
when ('gold_insurance') insurance_type = pr;
end;
end;
when ('accno') select (pr);
when ('loan') accno = strip(scan(scan(_infile_,2,':'),1,'<'));
when ('CCard_plat') CC_Accno = strip(scan(scan(_infile_,2,':'),1,'<'));
when ('gold_insurance') Insurance_No = strip(scan(scan(_infile_,2,':'),1,'<'));
end;
when ('amount') select (pr);
when ('loan') amount = input(scan(scan(_infile_,2,':'),1,'<'),best.);
when ('CCard_plat') CC_Amount_no = input(scan(scan(_infile_,2,':'),1,'<'),best.);
when ('gold_insurance') Insurance_Amt = input(scan(scan(_infile_,2,':'),1,'<'),best.);
end;
when ('bonus') bonus = input(scan(scan(_infile_,2,':'),1,'<'),best.);
when ('bankcde') Bank_Cde = strip(scan(scan(_infile_,2,':'),1,'<'));
when ('payment') Payment = input(scan(_infile_,2,':'),best.);
otherwise;
end;
goto dsend;
done: output;
dsend:
drop pr;
datalines;
1ST CELL A1:A1
Name: abc<tt>
Product: loan<tt>
accno: 123456<tt>
amount: 389.56<tt>
bonus: 3000<tt>
Product: CCard_plat<tt>
accno: 5689<tt>
amount: 478.96<tt>
Product: gold_insurance<tt>
accno: 1010102<tt>
amount: 103<tt>
bankcde: ICI<tt>
payment: 3094
2nd CELL A1:B2
Name: vbsd<tt>
Product: CCard_plat<tt>
accno: 74123<tt>
amount: 56.21<tt>
Product: gold_insurance<tt>
accno: 2020201<tt>
amount: 189<tt>
bankcde: hdfc<tt>
payment: 605
3rd CELL A1:B3
Name: klhj<tt>
Product: gold_insurance<tt>
accno: 3030312<tt>
amount: 563<tt>
bankcde: kk<tt>
payment: 489
;
Many Thanks, Kurt.
Though I posted my requirement slight bit different, but it was 95% same.
Really appreciate your inputs
I think your attempt to describe the CSV file is adding confusion to your request. Can't you just paste a couple of lines from the CSV into your question? Use the Insert Code button (looks like {i} on the menu) to get a pop-up window to paste the text. Make sure to open the CSV file with a text editor and not with Excel so that you can see the actual content and not have Excel mangle it into a spreadsheet.
So you have three columns in the CSV file? And each one have these name-value pairs? Is there any significance to the pair being in the 2nd column instead of the 1st or 3rd column?
It is probably going to be easiest to read it into a tall structure. Perhaps something like:
ROW|COL|ORD|NAME|VALUE 1|1|1|Name|abc 1|1|2|Product|loan ... 1|2|1|Name|vbsd 1|2|2|Product|CCard_plat ...
So assuming each row has only three columns your code might look something like:
data want;
length row col ord 8 name value $200 ;
infile 'myfile.csv' dsd truncover ;
length cell pair $32767 ;
row+1;
do cell=1 to 3 ;
input cell @;
cell = tranwrd(cell,'<tt>','FF'x);
do ord=1 to countw(cell,'FF'x);
pair=scan(cell,ord,'FF'x);
name=scan(pair,1,':');
value =left(scan(pair,2,':'));
output;
end;
end;
drop cell pair;
run;
Now you could use PROC TRANSPOSE with to create a wide dataset. To use the value of NAME as the name of the variable to create use the ID statement in the PROC TRANSPOSE step.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.