I am absolutely new to SAS and SAS Viya, so this might seem like a stupid question, but I really need some insight and help.
I am trying to load a CSV file from disk on a Linux machine into CAS, and I am trying both PROC IMPORT and the LOAD statement of PROC CASUTIL.
It would be very helpful if someone could explain the fundamental difference in how these two procedures work, because they give me different results on the same dataset.
My dataset has 3192 columns and 30 rows. When I use PROC IMPORT, it gives me the message "Number of names found is less than number of variables found.", which I take to mean that fewer header names were read than data columns were detected. But when I use PROC CASUTIL, the file imports without any messages or warnings.
I also noticed a trailing comma at the end of the first (header) line, so I removed it and retried the import. With PROC IMPORT I still get the same message, but PROC CASUTIL now fails with the error "At least one row of input data is invalid: too many columns".
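For reference, here is roughly what I am running (the file path and caslib names are placeholders for my actual setup):

/* Approach 1: PROC IMPORT writing to a CAS engine libref */
libname mycas cas caslib=casuser;

proc import datafile="/data/mydata.csv"  /* placeholder path */
            out=mycas.mydata
            dbms=csv
            replace;
    getnames=yes;
run;

/* Approach 2: PROC CASUTIL loading the same file */
proc casutil;
    load file="/data/mydata.csv" casout="mydata"
         outcaslib="casuser" replace;
run;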
Kindly help
Thanks,
Rahul
3K columns might simply be too much for PROC IMPORT. SAS needs to read the first line to get the names, and if that line is too long to fit into a single character variable (32K bytes), some of the names are lost. Then, when IMPORT inspects the data lines to determine types and attributes, those lines are usually shorter than the 32K limit, so more columns are found than names.
I have no idea how CASUTIL does the import, but it is the more modern tool, so some improvements are to be expected.
If you have a somewhat consistent naming scheme for all those columns, you might be better off writing the data step yourself, using a variable list like
input (var1-var3192) (:best32.);
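A minimal sketch of that approach, assuming all 3192 columns are numeric and the file is comma-delimited (path, delimiter, and informats are assumptions to adjust):

data want;
    /* raise lrecl: 3192 columns can easily exceed the 32767-byte default line length */
    infile "/data/mydata.csv" dsd dlm=',' firstobs=2 lrecl=1000000 truncover;
    input (var1-var3192) (:best32.);
run;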
Mind that this many columns is usually a design problem; tables should have few columns and many rows. Consider transposing your dataset.
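If you go the transposing route, a starting point might look like this; row_id is a hypothetical key you would need to add to identify each original row:

proc transpose data=want out=want_long(rename=(col1=value)) name=attribute;
    by row_id;          /* hypothetical row identifier */
    var var1-var3192;
run;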
Thanks for the inputs, Kurt. Regarding your observation about the large number of columns and small number of rows: that was test data, hence the few rows. My actual data will have 500K+ rows and 3192 columns, because it will be fed to SAS DM for decisioning, which needs a lot of attributes per row to make good decisions. So I will stick with CASUTIL to import and load the data.
I see. I once had a similar dataset prepared for regression analysis, with lots of dummy variables in groups, generated by a good deal of macro code.
Mind that you can also write the data step quite easily if you have the list of variables and their attributes stored somewhere. CALL EXECUTE lets you build statements that exceed any character variable limit, since generated statements themselves have no size limit.
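As a rough illustration, assuming the variable names sit one per row in a character variable name in a dataset varlist (both hypothetical):

data _null_;
    set varlist end=done;
    /* emit the start of the generated step and the INPUT statement once */
    if _n_ = 1 then call execute(
        'data want; infile "/data/mydata.csv" dsd dlm="," firstobs=2 truncover; input');
    /* append one variable name per observation; the generated statement can grow without limit */
    call execute(' ' || strip(name) || ' :best32.');
    /* close the INPUT statement and the step */
    if done then call execute('; run;');
run;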