BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
weg
Obsidian | Level 7 weg
Obsidian | Level 7

I have a large csv file with thousands of columns, and another dataset containing desired metadata (corresponding to dictionary.columns rows) for that column. Values are different from proc import would guess for the main csv.

 

Is there a nice  way to import the metadata from this file so I dont have to set individual column lengths/types/formats?  my naive solution would be to read the metadata file in a macro statement and use the row values to populate the length statements when reading the main file, but that seems horribly inelegant, and I'm hoping there is a nice proc statement I can use to set the metadata for the import from the metadata file directly

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Read the metadata into an actual dataset.  There is not really any need to then transfer it to macro variables as the desired result is to generate CODE , not macro variables.

 

So if your metadata has VARNUM (order of the columns in the CSV file), NAME, TYPE, LENGTH and when necessary INFORMAT and when desired FORMAT and LABEL values then using it to generate ATTRIB statements to define the variables and an INPUT statement to read the variables.

filename code temp;
data _null_;
   set metadata end=eof;
   by varnum;
   file code ;
   length first_var nliteral $60;
   nliteral=nliteral(name);
   if _n_=1 then first_var = nliteral ;
   retain first_var;
   put 'ATTRIB ' name 'LENGTH=' @;
   if type='char' then put '$' @ ;
   put length @;
   if not missing(informat) then put informat= @;
   if not missing(format) then put format= @;
   if not missing(label) and label ne name then put label= :$quote. @ ;
  put ';' ;
  if eof then put 'INPUT ' first_var '-- ' nliteral ';' ;
run;

Then when you have those statements you just need include them into a data step that reads the actual CSV file.

data want;
  infile 'myfile.csv' dsd truncover firstobs=2 lrecl=1000000 ;
%Include code / source2;
run;

For example if you had the metadata from DICTIONARY.COLUMNS (also available at SASHELP.VCOLUMN) for SASHELP.CLASS.

proc sql noprint;
create table metadata as 
select varnum,name,type,length,informat,format,label
from dictionary.columns
where libname='SASHELP' and memname='CLASS'
order by varnum
;
quit;

And used that to generate the code to read a CSV file generated from SASHELP.CLASS the LOG would look like this:

2442  data want;
2443    infile csv dsd truncover firstobs=2 lrecl=1000000 ;
2444  %Include code / source2;
NOTE: %INCLUDE (level 1) file CODE is (system-specific pathname).
2445 +ATTRIB Name LENGTH=$8 ;
2446 +ATTRIB Sex LENGTH=$1 ;
2447 +ATTRIB Age LENGTH=8 ;
2448 +ATTRIB Height LENGTH=8 ;
2449 +ATTRIB Weight LENGTH=8 ;
2450 +INPUT Name -- Weight ;
NOTE: %INCLUDE (level 1) ending.
2451  run;

NOTE: The infile CSV is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 19 records were read from the infile (system-specific pathname).
      The minimum record length was 17.
      The maximum record length was 21.
NOTE: The data set WORK.WANT has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

View solution in original post

3 REPLIES 3
weg
Obsidian | Level 7 weg
Obsidian | Level 7
Thanks, the %include was the glue I was missing Your answer and toms example gave me enough to go on. I have columns for sas_name, type, length, pos, format, and label so this will apply them all.
Tom
Super User Tom
Super User

Read the metadata into an actual dataset.  There is not really any need to then transfer it to macro variables as the desired result is to generate CODE , not macro variables.

 

So if your metadata has VARNUM (order of the columns in the CSV file), NAME, TYPE, LENGTH and when necessary INFORMAT and when desired FORMAT and LABEL values then using it to generate ATTRIB statements to define the variables and an INPUT statement to read the variables.

filename code temp;
data _null_;
   set metadata end=eof;
   by varnum;
   file code ;
   length first_var nliteral $60;
   nliteral=nliteral(name);
   if _n_=1 then first_var = nliteral ;
   retain first_var;
   put 'ATTRIB ' name 'LENGTH=' @;
   if type='char' then put '$' @ ;
   put length @;
   if not missing(informat) then put informat= @;
   if not missing(format) then put format= @;
   if not missing(label) and label ne name then put label= :$quote. @ ;
  put ';' ;
  if eof then put 'INPUT ' first_var '-- ' nliteral ';' ;
run;

Then when you have those statements you just need include them into a data step that reads the actual CSV file.

data want;
  infile 'myfile.csv' dsd truncover firstobs=2 lrecl=1000000 ;
%Include code / source2;
run;

For example if you had the metadata from DICTIONARY.COLUMNS (also available at SASHELP.VCOLUMN) for SASHELP.CLASS.

proc sql noprint;
create table metadata as 
select varnum,name,type,length,informat,format,label
from dictionary.columns
where libname='SASHELP' and memname='CLASS'
order by varnum
;
quit;

And used that to generate the code to read a CSV file generated from SASHELP.CLASS the LOG would look like this:

2442  data want;
2443    infile csv dsd truncover firstobs=2 lrecl=1000000 ;
2444  %Include code / source2;
NOTE: %INCLUDE (level 1) file CODE is (system-specific pathname).
2445 +ATTRIB Name LENGTH=$8 ;
2446 +ATTRIB Sex LENGTH=$1 ;
2447 +ATTRIB Age LENGTH=8 ;
2448 +ATTRIB Height LENGTH=8 ;
2449 +ATTRIB Weight LENGTH=8 ;
2450 +INPUT Name -- Weight ;
NOTE: %INCLUDE (level 1) ending.
2451  run;

NOTE: The infile CSV is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 19 records were read from the infile (system-specific pathname).
      The minimum record length was 17.
      The maximum record length was 21.
NOTE: The data set WORK.WANT has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1483 views
  • 2 likes
  • 3 in conversation