Assigning length to all Character Variables in a Dataset

Accepted Solution Solved
Reply
Contributor
Posts: 24
Accepted Solution

Assigning length to all Character Variables in a Dataset

[ Edited ]
 

This is the sas code I have written to carry out the process.

 I have multiple csv datasets with multiple character and numeric fields. Assigning a length to all character fields with length statement would be extensive. The csv files have header starting from row 3. That's why I have to do the First import/export and then import part. What do you think I am doing wrong. The macro Variable 'Charnames' doesn't get resolved in the first iteration and uses the old values (1st itr values) in the second itr, thus creating new variables and not changing length of certain old ones.

 

All the csv files have differnt fields too. There are only few similar fields.

 

%macro get_filenames(location);
filename _dir_ "%bquote(&location.)";
data filenames(keep=memname);
  handle=dopen( '_dir_' );
  if handle > 0 then do;
    count=dnum(handle);
    do i=1 to count;
      memname=dread(handle,i);
      output filenames;
    end;
  end;
  rc=dclose(handle);
run;
filename _dir_ clear;
%mend;



%macro ReadInputFiles(filePath, Fformat, sheetName,outDS, itr);
	
	%let Prefix= Fw_;
	%let extension= .csv;
	

	proc import  DATAFILE="&filePath" out=&outDS
		dbms=&Fformat replace;
		getnames= no;
		*namerow= 3;
		datarow= 3;
		GUESSINGROWS=10000;
		%if %length(%cmpres(&sheetName)) > 0 %then %do;
			Sheet=&sheetName;
		%end;
	run;

	
	proc export data = &outDS outfile= "%trim(&inXLDir/&Prefix&itr&extension)"
	  dbms=csv REPLACE;
	  PUTNAMES= NO;
	run;

	
	proc import  DATAFILE= "%trim(&inXLDir/&Prefix&itr&extension)" out= &Prefix&outDS
		dbms=&Fformat replace;
		getnames= yes;
		*namerow= 3;
		datarow= 2;
		GUESSINGROWS=10000;
		%if %length(%cmpres(&sheetName)) > 0 %then %do;
			Sheet=&sheetName;
		%end;
	run;

	%let Dataset= %upcase(&Prefix&outDS);
	%put /////&dataset;

	proc sql noprint;
	  select name into :names separated by ' '
	  from dictionary.columns where libname='WORK' and memname= "&dataset" and type='char'
	  ;
	quit;

	data inlib.&Prefix&outDS;
	  length &names $300 ;
	  set &Prefix&outDS;
	  if compress(cats(of _all_),,'kad') = '' then delete;
	run;




%mend;


%macro ReadAllRateCardFiles;

	%let CurrDt = %sysfunc(today(),worddate.);
	%let Month  = %sysfunc(today(),monname.);
	%let LastMonth = %sysfunc(putn(%sysfunc(INTNX(Month,%sysfunc(today()),-1)),monname.));
	%let LastlastMonth= %sysfunc(putn(%sysfunc(INTNX(Month,%sysfunc(today()),-2)),monname.));

	%let Path_Month= &inXLDir/&LastlastMonth;
	;

	%get_filenames("&inXLDir");  

	proc sql;
		select memname ,count(memname) into :Files_Dir separated by ',', :count_files
			from filenames
	;
	quit; 


	

	%do i= 1 %to &count_files;

	
		%let FileName = %cmpres(%scan(%bquote(&Files_Dir),&i,%str(,)));
		%put &Filename;

		data ReadFile_Status1;

			length path $5000;
		
			%let FilePath = "%trim(&inXLDir/&FileName)";
			path = &FilePath;
			
			FileExists = fileexist(path);
			%let DSName = %scan(%scan(%bquote(&Files_Dir),&i,%str(,)),1,'.');

			if FileExists > 0 then do;
				call execute('%ReadInputFiles('||path||',csv,,'||"&DSName"|| ',' ||&i|| ')');
			end;

		
		run;

%end;
		

%mend;

%ReadAllRateCardFiles;

Accepted Solutions
Solution
‎08-26-2016 11:47 AM
Super User
Super User
Posts: 6,499

Re: Assigning length to all Character Variables in a Dataset

[ Edited ]

Why don't we re-engineer this process so the SAS code is a little shorter and clearer?

Let's work from the top down.  So you want to pass in a directory and have it read the files into datasets.  So your call at the end of the program will look like this:

%ReadAllRateCardFiles(&inXLDir);

Now lets look at definition of the macro that takes the directory path as input and generates calls to read each file.

%macro ReadAllRateCardFiles(path);
* Get list of filenames ;
  %get_filenames("&path");

* Calculate member name from filename and generate call to read it ;
  filename code temp;
  data filename_status ;
    set filenames ;
    length dsname $32 fileexists 8 filePath $500 ;
    DSName = 'FW_'||upcase(scan(memname,1,'.'));
    filePath = "&path/" || memname ;
    FileExists = fileexist(filePath);
    if FileExists then put '%ReadInputFiles(' filePath ',' dsname ');' ;
  run;

* Run macro calls to read the files ;
  %include code / source2 ;

%mend ReadAllRateCardFiles ;

Now let's look at the program to read the files.  Let's make it only handle CSV files to make it easier.

%macro ReadInputFiles(filePath,outDS,outlib=inlib);
%local names ;

filename tempcsv temp;

* Copy file and remove first two lines and any all blank lines ;
data _null_;
  infile "&filePath" firstobs=3 ;
  file tempcsv ;
  input;
  if compress(_infile_,',') = ' ' then delete;
  put _infile_;
run;

* Use PROC IMPORT to create work dataset ;
proc import  datafile=tempcsv out=work.&outDS replace dbms=dlm ;
  delimiter=',';
  guessingrows=10000;
run;

* Get list of character variables ;
proc sql noprint;
  select name into :names separated by ' '
    from dictionary.columns
    where libname='WORK'
      and memname= %upcase("&outDS")
      and type='char'
  ;
quit;
%if (&sqlobs) %then %do;
  %let names=length &names $300 ;
%end ;

* Write permanent dataset ;
* If any character variables then set length to $300 ;
data &outlib..&outDS;
  &names;
  set &outDS;
run;

%mend ReadInputFiles;

Your macro to read the file names can stay the same.

 

View solution in original post


All Replies
Super User
Posts: 5,081

Re: Assigning length to all Character Variables in a Dataset

Here are one very suspicious issue that could contribute to such a problem.

 

  • Why is the name of the data set FW_HOUSE_IMPS_11 hard-coded as part of the macro?  Shouldn't the incoming data set vary each time?

Also, it might help to see how you are looping through many data sets.  This macro processes only one data set.

Contributor
Posts: 24

Re: Assigning length to all Character Variables in a Dataset

Hey Sorry about that hard coded memname. I was trying to do a test run. It should be "&dataset".

 

Also i have edited the question with the rest of code.

 

Thanks

Super User
Posts: 5,081

Re: Assigning length to all Character Variables in a Dataset

OK, a new chief culprit arises:  CALL EXECUTE.  CALL EXECUTE executes its statements as quickly as it can.  For generated DATA and PROC steps, that means waiting until after the current DATA step is complete.  But for macro language statements, that means right away before the current DATA step ends.  Picture all your %LET statements executing multiple times before the first PROC IMPORT begins. 

 

If I recall properly, %NRSTR can handle that behavior, forcing the macro language statements to wait until the current DATA step is complete.  Try it this way and see:

 

call execute(%nrstr('%ReadInputFiles('||path||',csv,,'||"&DSName"|| ',' ||&i|| ')'));

 

If that doesn't work, you may have to just use FILE/PUT to write the macro calls to a separate file, then %include that file.

Contributor
Posts: 24

Re: Assigning length to all Character Variables in a Dataset

Thanks a lot for the reply.  I tried %nrstr but it doesn't seem to be working.

Super User
Posts: 5,081

Re: Assigning length to all Character Variables in a Dataset

OK.  Are you familiar with the alternative approach?  Replacing CALL EXECUTE with FILE/PUT, write the macro calls to a separate file, then %include the file?

Contributor
Posts: 24

Re: Assigning length to all Character Variables in a Dataset

Actually no..Cat Embarassed

I would have to raed about it.

Super User
Posts: 5,081

Re: Assigning length to all Character Variables in a Dataset

OK, it will be helpful to learn about here ... both to overcome difficulties with CALL EXECUTE and to verify that the code you are generating is the right code.  Here are the basics of FILE/PUT.  You can easily learn about %INCLUDE if you don't know it already.

 

Within the DATA step that contains CALL EXECUTE ...

 

Select a file that will contain calls to your macro.  This will be a text file ... capable of holding a SAS program.  The FILE statement designates that PUT messages will be written to that file.  So early in your DATA step, add:

 

file 'path to my selected file' noprint;

 

Then replace CALL EXECUTE with a PUT statement that will write out a call to the macro:

 

put '%ReadInputFiles(' path ",csv,, &DsName, &i)";

 

The single quotes will suppress any attempt to execute %ReadInputFiles.  The double quotes will allow &Dsname and &i to resolve before being written out.

 

See if that code generates an accurate set of macro calls.  (If not, it won't be difficult to tweak the code.)

Contributor
Posts: 24

Re: Assigning length to all Character Variables in a Dataset

Its generating the right call to macro. It shows the last iterator call.

Now should I use this text file in someway?

Super User
Posts: 5,081

Re: Assigning length to all Character Variables in a Dataset

It sounds like each iteration of the %DO loop is replacing the contents of this new file.  That's OK, but affects how you use the file.  To use it, add one line to the definition of %ReadAllRateCardFiles:

 

%include 'path to this file';

 

Add it INSIDE the %DO loop, just before the %END statement (2 lines before the %MEND statement).

Contributor
Posts: 24

Re: Assigning length to all Character Variables in a Dataset

Heyy,,

 

This worked.Smiley Very Happy Thank you so much. Though I don't fully understand the logic.

Are we trying to delay the macro call with the file/put? 

 

Thanks Again....

Super User
Posts: 5,081

Re: Assigning length to all Character Variables in a Dataset

That's exactly it.  The macro call is being delayed, because CALL EXECUTE will execute all its macro statements immediately and execute the DATA and PROC steps later.

Contributor
Posts: 24

Re: Assigning length to all Character Variables in a Dataset

I have a question. I tried applying the same logic somewhere else too. There I am just focussing on Importing the files and not using this proc sql or setting length of character variables.

 

proc sql;
select name into :Charname separated by ' '
from dictionary.columns where upcase(libname)='WORK' and upcase(memname)= "&DATASET" and upcase(type)='CHAR'
;
quit;

 

Can proc sql not be used inside a macro like this? Is reading a dataset formed inside a macro not possible?

Super User
Posts: 5,081

Re: Assigning length to all Character Variables in a Dataset

The capabilities you are asking about are all possible.  One thing to watch for:  is &CHARNAME already defined as global?  If not, SQL will create it in the local symbol table.  It might have been erased by the time you need to use it.

Super User
Posts: 10,497

Re: Assigning length to all Character Variables in a Dataset


SASEnthusiast wrote:

 

This is the sas code I have written to carry out the process.

 I have multiple csv datasets with multiple character and numeric fields. Assigning a length to all character fields with length statement would be extensive. The csv files have header starting from row 3. That's why I have to do the First import/export and then import part. What do you think I am doing wrong. The macro Variable 'Charnames' doesn't get resolved in the first iteration and uses the old values (1st itr values) in the second itr, thus creating new variables and not changing length of certain old ones.

 


Some questions:

Are these CSV files supposed to all have the same structure?

How many are " multiple character and numeric fields"?

Do you have a document somewhere that describes these files?

Are you going to be combining these datasets?

 

I ask as if you have a Document with the description, especially if it is a table you may be able to generate the code much easier than this approach.

If they have the same structure and you are going to combine them then you may be able to read them using:

Filename DataIn "C:\path\*.CSV";

instead of looping through the individual sets.

If Multiple variables is less then a couple of hundred and the files have the same structure it may well be worth the time to import ONE file, copy the code generated by Proc Import from the log and modify the code with search and replace and set lengths for the character variables a bit larger than the generated code "just in case".

The INFORMAT statements in the generated datastep will set the length as well so you only have to change a digit or two in each.

 

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 20 replies
  • 666 views
  • 3 likes
  • 5 in conversation