BookmarkSubscribeRSS Feed
maxjiang6999
Calcite | Level 5

Hi guys, I know that when we use "proc append", the base always prevails. What if I want to suppress it? How can I do it?
What I'm trying to do is:

1. read in hundreds of 'tsv' files 

2. rename some columns 

3. append those files to create a master file.

I succeeded to read in all files but I ran into "type mismatch" error, specifically only for the numerical variables, when I tried to append those files. 

I tried 1. FORCE option, 2. change the 'FORMAT',INFORMAT' to 'best 32.' in the 'DATA INFILE' step (my thinking for this is formatting the variables the same before appending them). The second helped some variables but I still have this error for other variables.
Can anyone help?

 

I also have a super simple question. When we read in data to SAS, how does SAS decide the datatype for the variable if we use proc import? Does it depend on the first few observations(SAS will make a guess)? I know we can specify the data type using the data step. Another related question would be if the first few observations are numeric, what would happen if there's a character in the column? Would SAS still read it in as what it is or as '.' or sth else?

 

%macro import(dt);
data work.test (where=(CUSIP ne '')     )   ;
%let _EFIERR_ = 0; 
infile "C:\Users\abc_&dt..tsv"  delimiter='09'x
MISSOVER DSD lrecl=32767 firstobs=2 ;
informat DXL_Identifier best32. ;
informat ISIN $12. ;
informat SEDOL $7. ;
informat CUSIP $9. ;
informat Quick best32. ;
informat Stock_Description $62. ;
informat Market_Area__Country_level_ $14. ;
informat Dvd_Req best32. ;
informat Record_Type best32. ;
informat Total_Demand_Value best32. ;
informat Total_Demand_Quantity best32. ;
	 format DXL_Identifier best12. ;
         format ISIN $12. ;
         format SEDOL $7. ;
         format CUSIP $9. ;
         format Quick best12. ;
         format Stock_Description $62. ;
         format Market_Area__Country_level_ $14. ;
         format Dvd_Req best12. ;
         format Record_Type best12. ;
         format Total_Demand_Value best12. ;
         format Total_Demand_Quantity best12. ;
      input
                  DXL_Identifier
                  ISIN  $
                  SEDOL  $
                  CUSIP  $
                  Quick
                  Stock_Description  $
                  Market_Area__Country_level_  $
                  Dvd_Req
                  Record_Type
                  Total_Demand_Value
                  Total_Demand_Quantity
                  
;
if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
run;
proc contents data=test;
run;

data test2(
	rename=(ISIN = ISIN));
	attrib CUSIP format=$8. Date format=YYMMDDn8.;
	set test(where=(Record_Type eq 1));
	CUSIP=substr(CUSIP,1,8);
	Year=substr("&dt",1,4);
	Month=substr("&dt",5,2);
	Day=substr("&dt",7,2);
	Year2=input(Year,4.);
	Month2=input(Month,2.);
	Day2=input(Day,2.);
	Date=mdy(Month2, Day2, Year2);
	drop Year Month Day Year2 Month2 Day2 Record_Type;
run;

proc contents data=test2;
run;

proc append data=test2 base=test3 force; run;


%mend;
3 REPLIES 3
Tom
Super User Tom
Super User

What is a TSV file? The data step you posted is reading a simple tab delimited text file. 

 

To avoid having SAS guess how to define the variables do not use PROC IMPORT.  Instead just write the data step to read the file based on your understanding of what columns are in the file and how they should be defined.  Note that your program does not need to be nearly as complex or ugly as the code that PROC IMPORT generates.  There is no need to attach formats or informats for most variables, SAS already knows how to both read and write character strings and numbers. The exception are things like date, time and datetime values.  There is also no need to include $ in the input statement if you have already defined the variable as character before the INPUT statement (either directly with LENGTH or ATTRIB statement or indirectly by including the variable in a FORMAT or INFORMAT statement).  Also if you have already set the order of teh variables you can just use a first -- last variable list in the INPUT statement.

 

data work.test ;
  infile "C:\Users\abc_&dt..tsv"  dsd dlm='09'x truncover firstobs=2;
  length 
     DXL_Identifier 8
     ISIN $12
     SEDOL $7
     CUSIP $9
     Quick 8
     Stock_Description $62
     Market_Area__Country_level_ $14
     Dvd_Req 8
     Record_Type 8
     Total_Demand_Value 8
     Total_Demand_Quantity 8
  ;
  input DXL_Identifier -- Total_Demand_Quantity ;
run;

 

mkeintz
PROC Star
I presume TSV is CSV with a tab instead of a comma. I admit I've never seen that designation before.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User

When you read all your files with the same data step, you can't have a mismatch.

If you have a base dataset that does not come from reading an external file, adapt your data step to the structure of that base dataset; run proc contents on it to show you the necessary attributes.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1761 views
  • 1 like
  • 4 in conversation