Hi Experts,
I want to read following records using column pointer and create dataset. Please help me.
file:
student_id,mark1,mark2,mark3
1A,58,52,
85,2B,78,
43,,1C,58,98,65,4A,
87,99,95,5D,45,,43,6S,
23,
45,,7F,32,56,45,
1D,
32,45,67,2C,45,67,33,6G,43
37,87
Required Output:
student_id | mark1 | mark2 | mark3 |
1A | 58 | 52 | 85 |
2B | 78 | 43 | 0 |
1C | 58 | 98 | 65 |
4A | 87 | 99 | 95 |
5D | 45 | 0 | 43 |
6S | 23 | 45 | 0 |
and so on.... for missing marks put 0.
please help to solve
Thanks in Advance
1. This look like badly imported data. Import it correctly rather than try to fix the mess.
2. Your example makes no sense as you have 4 variables and some lines have more than 4 values.
Hi,
Very ugly data buy try this:
data have;
infile cards dsd dlm="0ad0"x firstobs=2;
input text : $ 1024.;
text = tranwrd(text,",,",",0,");
i=1;
var = scan(text,i,",");
do while(var ne "");
output;
i+1;
var = scan(text,i,",");
end;
keep var;
cards;
student_id,mark1,mark2,mark3
1A,58,52,
85,2B,78,
43,,1C,58,98,65,4A,
87,99,95,5D,45,,43,6S,
23,
45,,7F,32,56,45,
1D,
32,45,67,2C,45,67,33,6G,43
37,87
;
run;
data want;
length student_id $ 10;
array M[2:4] mark1 mark2 mark3;
do _N_=1 to 4;
set have;
if _N_ = 1 then student_id = var;
else
do;
M[_N_] = input(var, ??best32.);
end;
end;
drop var;
run;
proc print;
run;
All the best
Bart
It looks like uneasy.
the last two obs, Do you miss a comma ',' ?
32,45,67,2C,45,67,33,6G,43
37,87
-->
32,45,67,2C,45,67,33,6G,43,
37,87
filename have temp;
data _null_;
input;
file have;
put _infile_;
cards;
student_id,mark1,mark2,mark3
1A,58,52,
85,2B,78,
43,,1C,58,98,65,4A,
87,99,95,5D,45,,43,6S,
23,
45,,7F,32,56,45,
1D,
32,45,67,2C,45,67,33,6G,43,
37,87
;
filename have2 temp;
data _null_;
infile have;
input;
file have2;
if _n_>1 then put _infile_;
run;
data want;
infile have2 dsd recfm=n ;
input (student_id mark1-mark3) ($);
run;
I saved this text:
1A,58,52, 85,2B,78, 43,,1C,58,98,65,4A, 87,99,95,5D,45,,43,6S, 23, 45,,7F,32,56,45, 1D, 32,45,67,2C,45,67,33,6G,43, 37,87
(note the comma at the end of the next-to-last line, this is missing from your text and would comlicate things quite a lot)
to a text file and ran this code:
data want;
infile '/folders/myfolders/messy.txt' end=done;
length line $100 student $2;
retain line;
do while (countw(line,',','m') <= 4 and not done);
input;
line = cats(line,_infile_);
end;
do while (countw(line,',','m') >= 4);
student = scan(line,1,',','m');
score1 = input(scan(line,2,',','m'),best.);
score2 = input(scan(line,3,',','m'),best.);
score3 = input(scan(line,4,',','m'),best.);
call scan(line,4,pos,len,',','m');
if pos + len le lengthc(line)
then line = substr(line,pos+len+1);
else line = "";
output;
end;
if done then stop;
keep student score:;
run;
resulting in this dataset:
1 1A 58 52 85 2 2B 78 43 . 3 1C 58 98 65 4 4A 87 99 95 5 5D 45 . 43 6 6S 23 45 . 7 7F 32 56 45 8 1D 32 45 67 9 2C 45 67 33 10 6G 43 37 87
Since this data looks extremely familiar, have you determined where the extra line feeds came from yet?
Column (and row) pointer inputs is used when you know ahead of time which column and/or row a value will appear.
This does not appear to be the case with this data.
If this were my project I would trace the history of the file back to the idiot that managed to insert a whole bunch of linefeed/carriage control (which ever) characters and beat them severely with wet noodles. Then try to find a version without the irregular line behavior.
Note: train people very carefully when doing data entry not to randomly use the Enter, Tab or other keys.
Looking back, I see that you posted the exact same question in https://communities.sas.com/t5/SAS-Programming/Read-nonstandard-csv-files/m-p/645923 and accepted a solution there.
What did not work out with that solution, so that you posted this again, or were you just a little bored?
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.