BookmarkSubscribeRSS Feed
Navnath_Gajare
Calcite | Level 5

Hi Experts,

I want to read following records using column pointer and create dataset. Please help me.

file:

student_id,mark1,mark2,mark3
1A,58,52,
85,2B,78,
43,,1C,58,98,65,4A,
87,99,95,5D,45,,43,6S,
23,
45,,7F,32,56,45,
1D,
32,45,67,2C,45,67,33,6G,43
37,87

 

 

Required Output:

student_idmark1mark2mark3
1A585285
2B78430
1C589865
4A879995
5D45043
6S23450

and so on.... for missing marks put 0.

please help to solve

Thanks in Advance

8 REPLIES 8
ChrisNZ
Tourmaline | Level 20

1. This look like badly imported data. Import it correctly rather than try to fix the mess.

2. Your example makes no sense as you have 4 variables and some lines have more than 4 values.

 

yabwon
Amethyst | Level 16

Hi,

Very ugly data buy try this:

data have;
  infile cards dsd dlm="0ad0"x firstobs=2;
  input text : $ 1024.;
  text = tranwrd(text,",,",",0,");
    i=1;
    var = scan(text,i,",");
    do while(var ne "");
      output;
      i+1;
      var = scan(text,i,",");
    end;
  keep var;
cards;
student_id,mark1,mark2,mark3
1A,58,52,
85,2B,78,
43,,1C,58,98,65,4A,
87,99,95,5D,45,,43,6S,
23,
45,,7F,32,56,45,
1D,
32,45,67,2C,45,67,33,6G,43
37,87
;
run;

data want;
  length student_id $ 10;
  array M[2:4] mark1 mark2 mark3;
  do _N_=1 to 4;
    set have;
    if _N_ = 1 then student_id = var;
               else
                do;
                  M[_N_] = input(var, ??best32.);
                end;
  end;
  drop var;
run;
proc print;
run;

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Astounding
PROC Star
On the INFILE statement, be sure to use the DSD option. Then:

input student_id $ mark1 mark2 mark3 @@ ;

If the file actually contains the variable names on the first line also add the option FIRSTOBS=2 to the INFILE statement.
yabwon
Amethyst | Level 16
I think that DSD isn't any help here. With DSD you will handle double comma (`,,`) but at the same time that comma which sometime is at the end of lines will mess the data.

Bart
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Ksharp
Super User

It looks like uneasy.

the last two obs, Do you miss a comma ',' ?

32,45,67,2C,45,67,33,6G,43
37,87

-->

32,45,67,2C,45,67,33,6G,43,
37,87

 

 

filename have temp;
data _null_;
input;
file have;
put _infile_;
cards;
student_id,mark1,mark2,mark3
1A,58,52,
85,2B,78,
43,,1C,58,98,65,4A,
87,99,95,5D,45,,43,6S,
23,
45,,7F,32,56,45,
1D,
32,45,67,2C,45,67,33,6G,43,
37,87
;






filename have2 temp;
data _null_;
 infile have;
 input;
 file have2;
 if _n_>1 then put _infile_;
run;
data want;
infile have2 dsd recfm=n ;
input (student_id mark1-mark3) ($);
run;

Kurt_Bremser
Super User

I saved this text:

1A,58,52,
85,2B,78,
43,,1C,58,98,65,4A,
87,99,95,5D,45,,43,6S,
23,
45,,7F,32,56,45,
1D,
32,45,67,2C,45,67,33,6G,43,
37,87

(note the comma at the end of the next-to-last line, this is missing from your text and would comlicate things quite a lot)

to a text file and ran this code:

data want;
infile '/folders/myfolders/messy.txt' end=done;
length line $100 student $2;
retain line;
do while (countw(line,',','m') <= 4 and not done);
  input;
  line = cats(line,_infile_);
end;
do while (countw(line,',','m') >= 4);
  student = scan(line,1,',','m');
  score1 = input(scan(line,2,',','m'),best.);
  score2 = input(scan(line,3,',','m'),best.);
  score3 = input(scan(line,4,',','m'),best.);
  call scan(line,4,pos,len,',','m');
  if pos + len le lengthc(line)
  then line = substr(line,pos+len+1);
  else line = "";
  output;
end;
if done then stop;
keep student score:;
run;

resulting in this dataset:

1	1A	58	52	85	
2	2B	78	43	.	
3	1C	58	98	65	
4	4A	87	99	95	
5	5D	45	.	43	
6	6S	23	45	.	
7	7F	32	56	45	
8	1D	32	45	67	
9	2C	45	67	33	
10	6G	43	37	87	
ballardw
Super User

Since this data looks extremely familiar, have you determined where the extra line feeds came from yet?

 

Column (and row) pointer inputs is used when you know ahead of time which column and/or row a value will appear.

This does not appear to be the case with this data.

 

If this were my project I would trace the history of the file back to the idiot that managed to insert a whole bunch of linefeed/carriage control (which ever) characters and beat them severely with wet noodles. Then try to find a version without the irregular line behavior.

 

Note: train people very carefully when doing data entry not to randomly use the Enter, Tab or other keys.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2389 views
  • 1 like
  • 7 in conversation