Solved: Re: Error in pulling over column names using proc import multiple csv ...

Chelsea8 · Posted 03-20-2024 12:46 PM

I am pulling 11 csv files over from my local c: and the first row is not becoming the sas column names. The first field name is 'sep=' and the following field names are 'var1, var2, var3....etc'. I've tried using 'getnames' and the column names are still sep=, VAR1, VAR2...ect. Also the log shows informat as sep=n, VAR1, VAR2...etc. I've attached a test file as an example of my data. I'm not sure how to get SAS to grab the first row and make them column names.

Part of the log:

informat "sep="N $12. ;
678 informat VAR2 $94. ;
679 informat VAR3 $16. ;
680 informat VAR4 $11. ;
681 informat VAR5 $11. ;
682 informat VAR6 $11. ;
683 informat VAR7 $11. ;

Import code:

%macro drive(dir,ext);

%local cnt filrf rc did memcnt name;

%let cnt=0;

%let filrf=mydir;

%let rc=%sysfunc(filename(filrf,&dir));

%let did=%sysfunc(dopen(&filrf));

%if &did ne 0 %then %do;

%let memcnt=%sysfunc(dnum(&did));

%do i=1 %to &memcnt;

%let name=%qscan(%qsysfunc(dread(&did,&i)),-1,.);

%if %qupcase(%qsysfunc(dread(&did,&i))) ne %qupcase(&name) %then %do;

%if %superq(ext) = %superq(name) %then %do;

%let cnt=%eval(&cnt+1);

%put %qsysfunc(dread(&did,&i));

proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=dsn&cnt

dbms=csv replace;

getnames=yes;

run;

%end;

%else %put &dir cannot be opened.;

%let rc=%sysfunc(dclose(&did));

%mend drive;

%drive(C:\Users\Chelsea.H\Desktop\QSS_2022,csv);

ballardw · Posted 03-20-2024 02:53 PM

Perhaps this gives you an alternate:

data example;
   fname='C:\Users\Chelsea.H\Desktop\QRR_2022\2022_*.csv';
   length QHP $ 4.;
   qhp = scan(fname,-1,'\');
run;

Scan doesn't mind negatives as long you don't exceed the number of substrings separated by the provided delimiter (at which point you get missing values). -1 asks for the last "word" where words in this string are separated by the \ character only.

View solution in original post

Tom · Posted 03-20-2024 01:03 PM

The file you showed does not have a variable named SEP=.

Did you try to read in your SAS program as if it was a CSV file? That might have SEP= in it.

1    data _null_;
2      infile 'c:\downloads\testsas.csv' obs=5;
3      input;
4      list;
5    run;

NOTE: A byte-order mark in the file "c:\downloads\testsas.csv" (for fileref "#LN00102") indicates that the data is encoded in
      "utf-8".  This encoding will be used to process the file.
NOTE: The infile 'c:\downloads\testsas.csv' is:
      (system-specific pathname),
      (system-specific file attributes)

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1         Identifier,HierarchyComponent,ComponentScore,CutPoint1,CutPoint2,CutPoint3,CutPoint4,Rating,MeasureS
     101  core,RawValue,Denominator,EligiblePopulation,Percentile5th,Percentile10th,Percentile25th,Percentile5
     201  0th,Percentile75th,Percentile90th,Percentile95th,Mean,StdDeviation,Min,Max,MeasureBenchmark 291
2         GLOBAL,Global,56.87,60,70,80,90,2,,,,,,,,,,,,,,,, 49
3         S1,SI: Clinical Quality Management,14.987,60,70,80,90,3,,,,,,,,,,,,,,,, 71
4         S1M49,M: Asthma Medication Ratio,,,,,,,64.333,0.99,354,654,0.688,0.72,0.768,0.845,0.892,0.915,0.945,
     101  0.827,0.084,0.57,1,0.925 124
5         S1M49a,MI: Asthma Medication Ratio (5-11),,,,,,,,0.75,8,8,,,,,,,,,,,, 69
NOTE: 5 records were read from the infile (system-specific pathname).
      The minimum record length was 49.
      The maximum record length was 291.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Note that if you are reading many CSV files that all have the same variables in the same order then just read them all in one data step. You can use the FILENAME= option on the INFILE statement to create a variable with the name of the current file being read.

So if all of the files look like the one you shared the program might look like this:

data fromcsv;
  infile 'C:\Users\Chelsea.H\Desktop\QSS_*.csv' dlm=',' dsd truncover filename=fname;
  length Identifier $6 HierarchyComponent $35 ComponentScore 8 CutPoint1 8
    CutPoint2 8 CutPoint3 8 CutPoint4 8 Rating 8 MeasureScore 8 RawValue 8
    Denominator 8 EligiblePopulation 8 Percentile5th 8 Percentile10th 8
    Percentile25th 8 Percentile50th 8 Percentile75th 8 Percentile90th 8
    Percentile95th 8 Mean 8 StdDeviation 8 Min 8 Max 8 MeasureBenchmark 8
  ;
  input @;
  if fname ne lag(fname) then delete;
  input Identifier -- MeasureBenchmark ;
run;

Chelsea8 · Posted 03-20-2024 02:46 PM

This works great. Thank you.

I am unable to grab the file name to create a column however. I need the 4 characters to the left of underscrore here: \2022_ (where the * is). When I input positive number it brings back error: (substr function call has too many arguments). When it is negative I either get blank or 'User' with smaller negative numbers.

data fromcsv;
infile 'C:\Users\Chelsea.H\Desktop\QRR_2022\2022_*.csv' dlm=',' dsd truncover filename=fname;
length Identifier $6 HierarchyComponent $35 ComponentScore 8 CutPoint1 8
CutPoint2 8 CutPoint3 8 CutPoint4 8 Rating 8 MeasureScore 8 RawValue 8
Denominator 8 EligiblePopulation 8 Percentile5th 8 Percentile10th 8
Percentile25th 8 Percentile50th 8 Percentile75th 8 Percentile90th 8
Percentile95th 8 Mean 8 StdDeviation 8 Min 8 Max 8 MeasureBenchmark 8
;
input @;
if fname ne lag(fname) then delete;
input Identifier -- MeasureBenchmark ;
QHP= substr(fname,length(fname)38,4);
run;

ballardw · Posted 03-20-2024 02:53 PM

Perhaps this gives you an alternate:

data example;
   fname='C:\Users\Chelsea.H\Desktop\QRR_2022\2022_*.csv';
   length QHP $ 4.;
   qhp = scan(fname,-1,'\');
run;

Scan doesn't mind negatives as long you don't exceed the number of substrings separated by the provided delimiter (at which point you get missing values). -1 asks for the last "word" where words in this string are separated by the \ character only.

Chelsea8 · Posted 03-25-2024 03:30 PM

Thank you! This got me almost there. Someone from another thread suggested adding VALIDVARNAME=ANY which got the name to come over.

Tom · Posted 03-20-2024 03:02 PM

Using SCAN() is much easier. Since you seem to want the first four characters of the actual name of the file (ignoring the path) you could just set the length of variable to $4 and then SAS will truncate it to 4 bytes since there is no place to store the other characters.

data fromcsv;
infile 'C:\Users\Chelsea.H\Desktop\QRR_2022\2022_*.csv' dlm=',' dsd truncover filename=fname;
length QHP $4;
length Identifier $6 HierarchyComponent $35 ComponentScore 8 CutPoint1 8
CutPoint2 8 CutPoint3 8 CutPoint4 8 Rating 8 MeasureScore 8 RawValue 8
Denominator 8 EligiblePopulation 8 Percentile5th 8 Percentile10th 8
Percentile25th 8 Percentile50th 8 Percentile75th 8 Percentile90th 8
Percentile95th 8 Mean 8 StdDeviation 8 Min 8 Max 8 MeasureBenchmark 8
;
input @;
if fname ne lag(fname) then delete;
input Identifier -- MeasureBenchmark ;
QHP= scan(fname,-1,'/\') ;
run;

Chelsea8 · Posted 03-20-2024 07:27 PM

I've tried the follow and their results: -1 (user), 1 (c:), 2 (user), 3 (error improper order), 4 (blank), 5 (blank), -5 (blank)

I know that it should grab right to left when negative, but I don't see that happening, as -1 or -5 doesn't work. Totally stumped.

data fromcsv;
infile 'C:\Users\Chelsea.Ha-Co\Desktop\QRR_2022\2022_*.csv' dlm=',' dsd truncover filename=fname;
length QHP $4;
length Identifier $6 HierarchyComponent $35 ComponentScore 8 CutPoint1 8
CutPoint2 8 CutPoint3 8 CutPoint4 8 Rating 8 MeasureScore 8 RawValue 8
Denominator 8 EligiblePopulation 8 Percentile5th 8 Percentile10th 8
Percentile25th 8 Percentile50th 8 Percentile75th 8 Percentile90th 8
Percentile95th 8 Mean 8 StdDeviation 8 Min 8 Max 8 MeasureBenchmark 8;
input @;
if fname ne lag(fname) then delete;
input Identifier -- MeasureBenchmark ;
QHP= scan(fname,3,'/\');
run;

Patrick · Posted 03-20-2024 09:13 PM

@Chelsea8 Define the length for fname explicitly to avoid any risk of truncation. Then use scan() with -1

data fromcsv;
length fname $1000 QHP $100;
infile 'C:\Users\Chelsea.Ha-Co\Desktop\QRR_2022\2022_*.csv' dlm=',' dsd truncover filename=fname;
....
QHP= scan(fname,-1,'\');
run;

Error in pulling over column names using proc import multiple csv files

Re: Error in pulling over column names using proc import multiple csv files

Re: Error in pulling over column names using proc import multiple csv files

Re: Error in pulling over column names using proc import multiple csv files

Re: Error in pulling over column names using proc import multiple csv files

Re: Error in pulling over column names using proc import multiple csv files

Re: Error in pulling over column names using proc import multiple csv files

Re: Error in pulling over column names using proc import multiple csv files

Re: Error in pulling over column names using proc import multiple csv files

Register Today!

SAS Training: Just a Click Away