BookmarkSubscribeRSS Feed
David_Billa
Rhodochrosite | Level 12

I want to Import the CSV file via data step. As there are two Header lines, I want to don't want to consider the first header line and also I don't want to read the trailer line and the first field(HE). Following is the structure of CSV file.

 

Filename looks like ITR_GTS_MSTT_1_1_20210825T072410.csv. While reading the file, I don't want to mention the date time stamp in the file name. I just want to read like ITR_GTS_MSTT_1_1_*.csv. Is this possible?

 

Fields RTKKRS and VALUE_ORG_ID should be character.

IN	ITR	GTS	MSTT	1	1	20210825T072410	FULL	Job_ID
HE	RTKKRS	VALUE_ORG_ID
PO	11	5050
PO	12	5030
PO	13	5070
PO	69	5020
PO	163	5035
PO	169	6800
TR    9

Excepted Results:

 

RTKKRS	VALUE_ORG_ID
11	5050
12	5030
13	5070
69	5020
163	5035
169	6800

 

3 REPLIES 3
andreas_lds
Jade | Level 19

The file you have posted does not look like a csv-file. Please open the file with a text editor, not with excel.

The firstobs option in the infile statement can be used to skip header lines.

If you don't want a variable in the dataset, just drop it. There is no easy feasible way to skip reading the variable.

When a wildcard is used in the infile statement all files matching the name are processed.

ChrisNZ
Tourmaline | Level 20

Like this?

data WANT;
  infile "&wdir/t.txt" firstobs=3 end=LASTREC dlm='09'x;
  input @4 RTKKRS $3. VALUE_ORG_ID $4.;
  if ^LASTREC;
run;

 

Tom
Super User Tom
Super User

So that is not a CSV file. First I don't see any commas at all in there.  Second the structure is not rectangular.

First step is to understand your file structure.  I saved your posted lines a file and ran this data step so I could see what it contains.

data _null_;
  infile 'c:\downloads\sample.txt';
  input;
  list;
run;

Results:

23    data _null_;
24      infile 'c:\downloads\sample.txt';
25      input;
26      list;
27    run;

NOTE: The infile 'c:\downloads\sample.txt' is:
      Filename=c:\downloads\sample.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=154,
      Last Modified=29Aug2021:14:54:42,
      Create Time=29Aug2021:14:54:42

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0

1   CHAR  IN.ITR.GTS.MSTT.1.1.20210825T072410.FULL.Job_ID 47
    ZONE  44045504550455503030333333335333333045440466544
    NUMR  9E994297439D34491919202108254072410965CC9AF2F94

2   CHAR  HE.RTKKRS.VALUE_ORG_ID 22
    ZONE  4405544550544545454544
    NUMR  85924BB23961C55FF27F94

3   CHAR  PO.11.5050 10
    ZONE  5403303333
    NUMR  0F91195050

4   CHAR  PO.12.5030 10
    ZONE  5403303333
    NUMR  0F91295030

5   CHAR  PO.13.5070 10
    ZONE  5403303333
    NUMR  0F91395070

6   CHAR  PO.69.5020 10
    ZONE  5403303333
    NUMR  0F96995020

7   CHAR  PO.163.5035 11
    ZONE  54033303333
    NUMR  0F916395035

8   CHAR  PO.169.6800 11
    ZONE  54033303333
    NUMR  0F916996800
9         TR    9 7
NOTE: 9 records were read from the infile 'c:\downloads\sample.txt'.
      The minimum record length was 7.
      The maximum record length was 47.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.00 seconds

So most of the lines (all but the last) are using TAB between fields.

The first column seems to indicate the records type. "IN", "HE", "PO" or "TR    9"

Not sure what the IN record means but the HE record seems to have the field names (he=Header?).

The PO records seem to have the actual data.

 

If you want to read only this exact type of file (with RTKKRS and VALUE_ORG_ID variables) then you might use something like:

data want;
  infile 'myfile.txt' dsd dlm='09'x truncover;
  input type :$9. @;
  if type='PO' then input RTKKRS VALUE_ORG_ID;
  else delete;
run;

You can include a single wildcard, * , in the physical filename used in the INFILE statement.  

So if there is only one file than matches TR_GTS_MSTT_1_1_*.csv in your source directory then

infile '<mydirectory>TR_GTS_MSTT_1_1_*.csv' dsd dlm='09'x truncover;

should work.  If there are multiple files that match that pattern then you might need to make your program smart enough to either remember the filename or remember the values from the "IN" record.

 

If the files can have different fields (different HE records) then the program will need to be smarter.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 701 views
  • 0 likes
  • 4 in conversation