I have delimited data with multiple obs per record. Plus, the number of records may vary from 3 to 8. The unique identifier only appears once per record. I want the output to read as such:
T00000000 Tom Bryson Freshman ENG 101 MAT 110
Here's a sample of the csv file
T00000000, Tom, Bryson, Freshman,
ENG 101,
MAT 110,
T00000001, Gregg, Smith, Senior,
ACC 101,
ENG 101,
MAT 110,
MAT 120,
T00000002, Roger Sparks, Junior,
ART 101
That is easy, you just need to read enough to figure out whether the line has the student information or the course name.
Note that your sample was poorly formed. I added a comma between first and last names for Roger Sparks. If you real data is that inconsistent then your program will need to be more complex.
data want ;
infile cards dsd truncover ;
length id $20 fname lname $25 class $10 course $10 dummy $10;
retain id fname lname class ;
input course dummy @1 @ ;
drop dummy ;
if dummy ne ' ' then input id fname lname class ;
else output;
cards;
T00000000, Tom, Bryson, Freshman,
ENG 101,
MAT 110,
T00000001, Gregg, Smith, Senior,
ACC 101,
ENG 101,
MAT 110,
MAT 120,
T00000002, Roger, Sparks, Junior,
ART 101
run;
proc print; run;
That will generate one observation per course, which is probably what you really want in order to be able to do anything with it. But if you really want to create multiple course variables so that you can get down to one observation per student then you could just add a PROC TRANSPOSE step to do that for you.
Does the first line actually vary in the number of elements as shown by your third record with Roger Sparks or was that just a typo in the formum and should read: Roger, Sparks ?
What do you want your output to look like?
I recommend giving a read to Bruno's response in
https://communities.sas.com/message/179168#179168
There is a lot of information as to how to handle reading multiline inputs with some dummy variables and whatnot.
I would opt for a totally different approach of reading the text as a giant stream and reformating it using PRX to match T\d{8} as a delimiter when parsing (basically reading a text file and outputing a new text file with appropriate format to read) and then read it as a normal csv, by line with enough dummy variables for the classes each student follows.
If you plan to do further analyses on the data, I'd think that you would be better off with a different resulting output file, namely multiple records for each id. The following approach first produces that kind of file and then transposes it into the kind of output file you indicated that you wanted:
filename have 'c:\art\have.csv';
data need;
infile have;
retain id first last grade;
input;
if substr(_infile_,1,1) eq 'T' then do;
id=scan(_infile_,1,',');
first=scan(_infile_,2,',');
last=scan(_infile_,3,',');
grade=scan(_infile_,4,',');
end;
else do;
course=compress(strip(_infile_),',');
output;
end;
run;
proc transpose data=need out=want (drop=_:) prefix=course;
var course;
by id first last grade notsorted;
run;
That is easy, you just need to read enough to figure out whether the line has the student information or the course name.
Note that your sample was poorly formed. I added a comma between first and last names for Roger Sparks. If you real data is that inconsistent then your program will need to be more complex.
data want ;
infile cards dsd truncover ;
length id $20 fname lname $25 class $10 course $10 dummy $10;
retain id fname lname class ;
input course dummy @1 @ ;
drop dummy ;
if dummy ne ' ' then input id fname lname class ;
else output;
cards;
T00000000, Tom, Bryson, Freshman,
ENG 101,
MAT 110,
T00000001, Gregg, Smith, Senior,
ACC 101,
ENG 101,
MAT 110,
MAT 120,
T00000002, Roger, Sparks, Junior,
ART 101
run;
proc print; run;
That will generate one observation per course, which is probably what you really want in order to be able to do anything with it. But if you really want to create multiple course variables so that you can get down to one observation per student then you could just add a PROC TRANSPOSE step to do that for you.
This one worked. Thank you. I haven't tried the other replies yet.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.