I want to Import the CSV file via data step. As there are two Header lines, I want to don't want to consider the first header line and also I don't want to read the trailer line and the first field(HE). Following is the structure of CSV file.
Filename looks like ITR_GTS_MSTT_1_1_20210825T072410.csv. While reading the file, I don't want to mention the date time stamp in the file name. I just want to read like ITR_GTS_MSTT_1_1_*.csv. Is this possible?
Fields RTKKRS and VALUE_ORG_ID should be character.
IN ITR GTS MSTT 1 1 20210825T072410 FULL Job_ID HE RTKKRS VALUE_ORG_ID PO 11 5050 PO 12 5030 PO 13 5070 PO 69 5020 PO 163 5035 PO 169 6800 TR 9
Excepted Results:
RTKKRS VALUE_ORG_ID 11 5050 12 5030 13 5070 69 5020 163 5035 169 6800
The file you have posted does not look like a csv-file. Please open the file with a text editor, not with excel.
The firstobs option in the infile statement can be used to skip header lines.
If you don't want a variable in the dataset, just drop it. There is no easy feasible way to skip reading the variable.
When a wildcard is used in the infile statement all files matching the name are processed.
Like this?
data WANT;
infile "&wdir/t.txt" firstobs=3 end=LASTREC dlm='09'x;
input @4 RTKKRS $3. VALUE_ORG_ID $4.;
if ^LASTREC;
run;
So that is not a CSV file. First I don't see any commas at all in there. Second the structure is not rectangular.
First step is to understand your file structure. I saved your posted lines a file and ran this data step so I could see what it contains.
data _null_;
infile 'c:\downloads\sample.txt';
input;
list;
run;
Results:
23 data _null_; 24 infile 'c:\downloads\sample.txt'; 25 input; 26 list; 27 run; NOTE: The infile 'c:\downloads\sample.txt' is: Filename=c:\downloads\sample.txt, RECFM=V,LRECL=32767,File Size (bytes)=154, Last Modified=29Aug2021:14:54:42, Create Time=29Aug2021:14:54:42 RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 1 CHAR IN.ITR.GTS.MSTT.1.1.20210825T072410.FULL.Job_ID 47 ZONE 44045504550455503030333333335333333045440466544 NUMR 9E994297439D34491919202108254072410965CC9AF2F94 2 CHAR HE.RTKKRS.VALUE_ORG_ID 22 ZONE 4405544550544545454544 NUMR 85924BB23961C55FF27F94 3 CHAR PO.11.5050 10 ZONE 5403303333 NUMR 0F91195050 4 CHAR PO.12.5030 10 ZONE 5403303333 NUMR 0F91295030 5 CHAR PO.13.5070 10 ZONE 5403303333 NUMR 0F91395070 6 CHAR PO.69.5020 10 ZONE 5403303333 NUMR 0F96995020 7 CHAR PO.163.5035 11 ZONE 54033303333 NUMR 0F916395035 8 CHAR PO.169.6800 11 ZONE 54033303333 NUMR 0F916996800 9 TR 9 7 NOTE: 9 records were read from the infile 'c:\downloads\sample.txt'. The minimum record length was 7. The maximum record length was 47. NOTE: DATA statement used (Total process time): real time 0.04 seconds cpu time 0.00 seconds
So most of the lines (all but the last) are using TAB between fields.
The first column seems to indicate the records type. "IN", "HE", "PO" or "TR 9"
Not sure what the IN record means but the HE record seems to have the field names (he=Header?).
The PO records seem to have the actual data.
If you want to read only this exact type of file (with RTKKRS and VALUE_ORG_ID variables) then you might use something like:
data want;
infile 'myfile.txt' dsd dlm='09'x truncover;
input type :$9. @;
if type='PO' then input RTKKRS VALUE_ORG_ID;
else delete;
run;
You can include a single wildcard, * , in the physical filename used in the INFILE statement.
So if there is only one file than matches TR_GTS_MSTT_1_1_*.csv in your source directory then
infile '<mydirectory>TR_GTS_MSTT_1_1_*.csv' dsd dlm='09'x truncover;
should work. If there are multiple files that match that pattern then you might need to make your program smart enough to either remember the filename or remember the values from the "IN" record.
If the files can have different fields (different HE records) then the program will need to be smarter.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.