Help using Base SAS procedures

reading multiple records per obs from a csv file

Accepted Solution Solved
Reply
Super Contributor
Posts: 268
Accepted Solution

reading multiple records per obs from a csv file

I have delimited data with multiple obs per record.  Plus, the number of records may vary from 3 to 8.  The unique identifier only appears once per record.  I want the output to read as such:

T00000000 Tom Bryson  Freshman ENG 101 MAT 110

Here's a sample of the csv file

T00000000, Tom, Bryson, Freshman,

                    ENG 101,

                  MAT 110,

T00000001, Gregg, Smith, Senior,

                  ACC 101,

                  ENG  101,

                  MAT 110,

                  MAT 120,

T00000002, Roger Sparks, Junior,

                  ART  101


Accepted Solutions
Solution
‎09-06-2013 01:31 PM
Super User
Super User
Posts: 6,500

Re: reading multiple records per obs from a csv file

That is easy, you just need to read enough to figure out whether the line has the student information or the course name.

Note that your sample was poorly formed.  I added a comma between first and last names for Roger Sparks.  If you real data is that inconsistent then your program will need to be more complex.

data want ;

  infile cards dsd truncover ;

  length id $20 fname lname $25 class $10 course $10 dummy $10;

  retain id fname lname class ;

  input course dummy @1 @ ;

  drop dummy ;

  if dummy ne ' ' then input id fname lname class ;

  else output;

cards;

T00000000, Tom, Bryson, Freshman,

                    ENG 101,

                  MAT 110,

T00000001, Gregg, Smith, Senior,

                  ACC 101,

                  ENG  101,

                  MAT 110,

                  MAT 120,

T00000002, Roger, Sparks, Junior,

                  ART  101

run;

proc print; run;

That will generate one observation per course, which is probably what you really want in order to be able to do anything with it.  But if you really want to create multiple course variables so that you can get down to one observation per student then you could just add a PROC TRANSPOSE step to do that for you.

View solution in original post


All Replies
Super User
Posts: 10,500

Re: reading multiple records per obs from a csv file

Does the first line actually vary in the number of elements as shown by your third record with Roger Sparks or was that just a typo in the formum and should read: Roger, Sparks ?

What do you want your output to look like?


Super Contributor
Posts: 339

Re: reading multiple records per obs from a csv file

I recommend giving a read to Bruno's response in

https://communities.sas.com/message/179168#179168

There is a lot of information as to how to handle reading multiline inputs with some dummy variables and whatnot.

I would opt for a totally different approach of reading the text as a giant stream and reformating it using PRX to match T\d{8} as a delimiter when parsing (basically reading a text file and outputing a new text file with appropriate format to read) and then read it as a normal csv, by line with enough dummy variables for the classes each student follows.

PROC Star
Posts: 7,363

Re: reading multiple records per obs from a csv file

If you plan to do further analyses on the data, I'd think that you would be better off with a different resulting output file, namely multiple records for each id. The following approach first produces that kind of file and then transposes it into the kind of output file you indicated that you wanted:

filename have 'c:\art\have.csv';

data need;

  infile have;

  retain id first last grade;

  input;

  if substr(_infile_,1,1) eq 'T' then do;

    id=scan(_infile_,1,',');

    first=scan(_infile_,2,',');

    last=scan(_infile_,3,',');

    grade=scan(_infile_,4,',');

  end;

  else do;

    course=compress(strip(_infile_),',');

    output;

  end;

run;

proc transpose data=need out=want (drop=_Smiley Happy prefix=course;

  var course;

  by id first last grade notsorted;

run;

Solution
‎09-06-2013 01:31 PM
Super User
Super User
Posts: 6,500

Re: reading multiple records per obs from a csv file

That is easy, you just need to read enough to figure out whether the line has the student information or the course name.

Note that your sample was poorly formed.  I added a comma between first and last names for Roger Sparks.  If you real data is that inconsistent then your program will need to be more complex.

data want ;

  infile cards dsd truncover ;

  length id $20 fname lname $25 class $10 course $10 dummy $10;

  retain id fname lname class ;

  input course dummy @1 @ ;

  drop dummy ;

  if dummy ne ' ' then input id fname lname class ;

  else output;

cards;

T00000000, Tom, Bryson, Freshman,

                    ENG 101,

                  MAT 110,

T00000001, Gregg, Smith, Senior,

                  ACC 101,

                  ENG  101,

                  MAT 110,

                  MAT 120,

T00000002, Roger, Sparks, Junior,

                  ART  101

run;

proc print; run;

That will generate one observation per course, which is probably what you really want in order to be able to do anything with it.  But if you really want to create multiple course variables so that you can get down to one observation per student then you could just add a PROC TRANSPOSE step to do that for you.

Super Contributor
Posts: 268

Re: reading multiple records per obs from a csv file

This one worked.  Thank you.  I haven't tried the other replies yet.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 425 views
  • 2 likes
  • 5 in conversation