<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Big data analysis in SAS in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936656#M46738</link>
    <description>&lt;P&gt;Know your data.&amp;nbsp; Look at your CSV file.&amp;nbsp; Figure out which variables are numeric.&amp;nbsp; For character variables figure out how long each one needs to be to store all of the data (or all of the data you actually need). Write the data step to read it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To read a CSV file the data step is as simple as this.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  infile 'myfile.csv' dsd firstobs=2 truncover lrecl=1000000;
  length firstvar 8 secondvar $20 ..... lastvar 8 ;
  informat datevar mmddyy.;
  format datevar yymmdd10.;
  input firstvar -- lastvar;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;So for your file just update the LENGTH statement that is setting the variable types and storage length. Define the variables in the same order they appear on the lines of the file and then the INPUT statement can be as simple as the one I show that uses a simple position based variable list.&amp;nbsp; You only need to attach formats or informats to variables that NEED them.&amp;nbsp; Most variables will NOT need them.&amp;nbsp; Usually only things like DATE, TIME and DATETIME variables will need either informats or formats attached to them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you don't know how to LOOK at your CSV file you can use a simple data step like this to read in the first 5 lines and dump them to the SAS log where you can look at them.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  infile 'myfile.csv' obs=5;
  input;
  list;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;You might want to include the option LRECL= like I used above in case the lines in the file are longer than the default 32K bytes.&lt;/P&gt;</description>
    <pubDate>Mon, 22 Jul 2024 19:39:30 GMT</pubDate>
    <dc:creator>Tom</dc:creator>
    <dc:date>2024-07-22T19:39:30Z</dc:date>
    <item>
      <title>Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936487#M46711</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am working with a large dataset containing 500,000 rows and 4,000 columns, which requires 10 GB of memory. I imported the data into SAS, but only a portion of the data was successfully loaded, and many variables were not detected during analysis. My computer has limited memory, with only 16 GB available. Consequently, SAS is unable to open the entire dataset. How can I open the full dataset and ensure all variables are available for analysis in SAS?&lt;/P&gt;</description>
      <pubDate>Sat, 20 Jul 2024 13:36:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936487#M46711</guid>
      <dc:creator>Manije72</dc:creator>
      <dc:date>2024-07-20T13:36:33Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936490#M46712</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/439159"&gt;@Manije72&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A typical SAS session comes with some default settings, which you may need to customize to work with this "large" data set of yours.&lt;/P&gt;
&lt;P&gt;To see the default settings, run this code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc options group=performance; run;
/* To see the memory settings - which is a subset of the performance */
proc options group=memory; run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If you want to change the default settings, here are two ways to do it before starting the SAS session:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Windows shortcut &lt;STRONG&gt;Target&lt;/STRONG&gt; properties --&amp;gt; Add &lt;STRONG&gt;-memsize 12G&lt;/STRONG&gt; anywhere after the sas.exe&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;Create sasv9.cfg file under your windows user home directory (C:\Users\&amp;lt;YourName&amp;gt;\), and within the sasv9.cfg have the following lines, as an example&amp;nbsp;&lt;/LI&gt;
&lt;/OL&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;-memsize 12G
-sortsize 4G
-cpucount 8
-threads&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Note: You can always rerun the PROC OPTIONS steps above to check that your customizations took effect.&lt;/P&gt;
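&lt;P&gt;As a minimal sketch of that check (assuming you only care about the four options from the config example above), you can ask PROC OPTIONS for specific options, or write a single value to the log with GETOPTION:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Show the current values of just these options in the log */
proc options option=(memsize sortsize cpucount threads) value; run;

/* Or write one value to the log */
%put MEMSIZE is %sysfunc(getoption(memsize));&lt;/CODE&gt;&lt;/PRE&gt;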
&lt;P&gt;Hope this can help&lt;/P&gt;</description>
      <pubDate>Sat, 20 Jul 2024 15:15:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936490#M46712</guid>
      <dc:creator>AhmedAl_Attar</dc:creator>
      <dc:date>2024-07-20T15:15:41Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936492#M46713</link>
      <description>&lt;P&gt;SAS does not normally load the whole dataset into memory, so the fact that it takes 10 GB of disk space to store the data should not prevent you from using it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How did you "import" the data into SAS?&amp;nbsp; What format is the data in now?&amp;nbsp; If you have it in a TEXT file, such as a CSV file then SAS can easily READ such a file.&amp;nbsp; Show the data step you used to read the file and explain how it failed to read all of the data.&lt;/P&gt;</description>
      <pubDate>Sat, 20 Jul 2024 16:22:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936492#M46713</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-07-20T16:22:45Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936493#M46714</link>
      <description>&lt;P&gt;Can you post the code you are running, and the log you get that shows the error message about running out of memory?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;4,000 columns is a lot of columns, but 500,000 rows is not much data.&amp;nbsp; One of the beautiful designs of the SAS DATA step is that it does NOT need to open the entire dataset it reads.&amp;nbsp; It reads data one row at a time.&amp;nbsp; The DATA step was designed to use very little memory, because there wasn't much memory around when SAS was created.&amp;nbsp; So the decision was to minimize memory usage by increasing disk I/O.&amp;nbsp; Generally in a DATA step, it's unusual to hit a limitation in memory.&amp;nbsp; If you run some PROCs, like PROC SORT, they can be more memory intensive.&amp;nbsp; In a DATA step if you create a hash table, that can be memory intensive.&amp;nbsp; So depending on what step you are running when you have the problem, there may be different solutions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Ideally, the best way to explain the problem would be to post some code that causes the problem, and then post the log you get from running that code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, the code below will create a dataset with 500,000 rows and 4001 variables.&amp;nbsp; I would think it would run fine on your PC, does it?&amp;nbsp; Then maybe try running your PROC on some data like this, and see if you can get the memory error to occur.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have ;
  array x {4000} (1000*(3 6 9 12)) ;
  do id=1 to 500000 ;
    output ;
  end ;
run ;&lt;/CODE&gt;&lt;/PRE&gt;
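&lt;P&gt;As a rough sketch of that second step (PROC MEANS here is only an assumption about the kind of PROC you might be running), turn on FULLSTIMER so the log reports the memory used by each step, then run the PROC against the test data:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options fullstimer ;   /* log real time, CPU time and memory for each step */
proc means data=have n mean std ;
  var x1-x4000 ;       /* the ARRAY statement above creates x1-x4000 */
run ;&lt;/CODE&gt;&lt;/PRE&gt;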
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 20 Jul 2024 16:24:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936493#M46714</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2024-07-20T16:24:23Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936512#M46716</link>
      <description>&lt;P&gt;Please describe how you know " only a portion of the data was successfully loaded, and many variables were not detected during analysis".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What type of analysis? Many regression procedures by default will not use any observations with one or more missing values for variables on the MODEL statement. Other statements in other procedures will have similar limitations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 21 Jul 2024 05:25:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936512#M46716</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-07-21T05:25:56Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936515#M46717</link>
      <description>&lt;P&gt;What is the format (database, text file, Excel) of your source data, and how did you import it into SAS?&lt;/P&gt;
&lt;P&gt;With 4000 variables my "data hidden in structure" alarm bell goes off, so you should consider converting to a long dataset structure during the import process.&lt;/P&gt;</description>
      <pubDate>Sun, 21 Jul 2024 09:20:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936515#M46717</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2024-07-21T09:20:16Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936611#M46723</link>
      <description>&lt;P&gt;My dataset is in CSV format. I import the data by clicking file/import data/CSV/.. While I can perform analysis, I am unable to view the data using &lt;CODE&gt;proc print&lt;/CODE&gt;. Furthermore, It appears that SAS cannot read the last 1000 variables correctly and displays them with generic names like &lt;CODE&gt;VAR4680, VAR4681, VAR4682,...&lt;/CODE&gt; instead of their actual names. Additionally, some of these variables are numeric, but SAS recognizes them as character variables when I use &lt;CODE&gt;proc contents&lt;/CODE&gt;. Consequently, when I run &lt;CODE&gt;proc means&lt;/CODE&gt; on these variables, I receive the error: 'Variable VAR4680 in list does not match type prescribed for this list'."&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 15:24:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936611#M46723</guid>
      <dc:creator>Manije72</dc:creator>
      <dc:date>2024-07-22T15:24:33Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936614#M46724</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/439159"&gt;@Manije72&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;My dataset is in CSV format. I import the data by clicking file/import data/CSV/.. While I can perform analysis, I am unable to view the data using &lt;CODE&gt;proc print&lt;/CODE&gt;. Furthermore, It appears that SAS cannot read the last 1000 variables correctly and displays them with generic names like &lt;CODE&gt;VAR4680, VAR4681, VAR4682,...&lt;/CODE&gt; instead of their actual names. Additionally, some of these variables are numeric, but SAS recognizes them as character variables when I use &lt;CODE&gt;proc contents&lt;/CODE&gt;. Consequently, when I run &lt;CODE&gt;proc means&lt;/CODE&gt; on these variables, I receive the error: 'Variable VAR4680 in list does not match type prescribed for this list'."&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;PROC IMPORT (or even the IMPORT tools in many point and click interfaces) is a good way to take a QUICK AND DIRTY look at your data.&amp;nbsp; But because it has to GUESS how to define the variables you should not use it for anything important.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Instead write your own data step to READ the file. That way you can control how each variable is defined (name, type, storage length, display format, any special informat need to convert the text in the CSV file into valid data values, etc.).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And in particular PROC IMPORT has a bug that prevents it from seeing more than 32K bytes of the header line of the CSV file.&amp;nbsp; That is what is causing your variable names to be generic VAR4681 etc.&lt;/P&gt;
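&lt;P&gt;If you want to check how many names are actually on the header line, a sketch like this (assuming the file is named myfile.csv and no name is longer than 200 bytes) reads just the first line and writes one observation per name:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data names;
  infile 'myfile.csv' dsd obs=1 lrecl=1000000;
  length name $200;
  varnum + 1;        /* position of the column in the file */
  input name @@;     /* hold the line and read the next name on each iteration */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The number of observations in NAMES should then match the number of columns you expect.&lt;/P&gt;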
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Another common problem is that it will define empty variables as character of length 1 since that uses 7 fewer bytes than a numeric variable would.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you want a tool that does a better guess about how to read a CSV file try using this macro instead:&amp;nbsp;&amp;nbsp;&lt;A href="https://github.com/sasutils/macros/blob/master/csv2ds.sas" target="_blank"&gt;https://github.com/sasutils/macros/blob/master/csv2ds.sas&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In addition to reading all of the header row it generates cleaner, easier to use SAS code that you can use as a starting point for writing your own data step to read the file.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 15:33:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936614#M46724</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-07-22T15:33:33Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936624#M46727</link>
      <description>Does it only prevent viewing the header? I mean, do the variables with generic names still contain the actual data in SAS?</description>
      <pubDate>Mon, 22 Jul 2024 16:41:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936624#M46727</guid>
      <dc:creator>Manije72</dc:creator>
      <dc:date>2024-07-22T16:41:49Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936627#M46728</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Some of these variables are numeric, but SAS recognizes them as character variables when I use&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;proc contents&lt;/CODE&gt;&lt;SPAN&gt;. Consequently, when I run&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;proc means&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;on these variables, I receive the error: 'Variable VAR4680 in list does not match type prescribed for this list'." How can I solve this problem?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 16:46:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936627#M46728</guid>
      <dc:creator>Manije72</dc:creator>
      <dc:date>2024-07-22T16:46:55Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936628#M46729</link>
      <description>proc print will not print your entire data set but you should be able to view it using the viewer or printing the top 10 observations though I have no idea how you visually verify 4000 columns. &lt;BR /&gt;&lt;BR /&gt;proc print data=datasetname(obs=10); run;&lt;BR /&gt;&lt;BR /&gt;Run Proc Import - look at the log. It will have the code. Copy and paste that into a new program and modify it, verifying your data until you're sure your data is read correctly. &lt;BR /&gt;&lt;BR /&gt;Unless...do you have a data dictionary for your data set?</description>
      <pubDate>Mon, 22 Jul 2024 16:57:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936628#M46729</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2024-07-22T16:57:22Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936629#M46730</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/439159"&gt;@Manije72&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I am working with a large dataset containing 500,000 rows and 4,000 columns, which requires 10 GB of memory. I imported the data into SAS, but only a portion of the data was successfully loaded, and many variables were not detected during analysis. My computer has limited memory, with only 16 GB available. Consequently, SAS is unable to open the entire dataset. How can I open the full dataset and ensure all variables are available for analysis in SAS?&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;This what would be considered big data by many. 10GB would also seem to be a lot of text fields I'm guessing? What is the analysis plan for those text fields?&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 16:58:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936629#M46730</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2024-07-22T16:58:18Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936638#M46731</link>
      <description>&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;SAS cannot read the last 1000 variables correctly and displays them with generic names like&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;VAR4680, VAR4681, VAR4682,...&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;instead of their actual names. Additionally, some of these variables are numeric, but SAS recognizes them as character variables when I use&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;proc contents&lt;/CODE&gt;&lt;SPAN&gt;. Consequently, when I run&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;proc means&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;on these variables, I receive the error: 'Variable VAR4680 in list does not match type prescribed for this list'." I tried to convert character variable to numeric variables by some codes. However, the mean and SD that I got is not what expected to get for that variable,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;data kl;&lt;BR /&gt;set kk;&lt;BR /&gt;VAR4679_num=input(VAR4679, best32.);&lt;BR /&gt;run;&lt;BR /&gt;proc means data=kl;&lt;BR /&gt;var VAR4679_num;&lt;BR /&gt;run;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 17:35:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936638#M46731</guid>
      <dc:creator>Manije72</dc:creator>
      <dc:date>2024-07-22T17:35:59Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936641#M46732</link>
      <description>&lt;P&gt;I could see the data for the top 10 observations. Some of variables are continuous and should be numeric. However, SAS shows them character variables and I can not get the mean with proc means because it gives me this error "&lt;SPAN&gt;'Variable VAR4680 in list does not match type prescribed for this list".&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;I tried to convert character variable to numeric variables by some codes. However, the mean and SD that I got is not what I expected to get for that variable. I can not see the character data once I use proc print for 10 observations.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;data kl;&lt;BR /&gt;set kk;&lt;BR /&gt;VAR4679_num=input(VAR4679, best32.);&lt;BR /&gt;run;&lt;BR /&gt;proc means data=kl;&lt;BR /&gt;var VAR4679_num;&lt;BR /&gt;run;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 17:59:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936641#M46732</guid>
      <dc:creator>Manije72</dc:creator>
      <dc:date>2024-07-22T17:59:05Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936643#M46733</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/439159"&gt;@Manije72&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;Does it only prevent viewing the header? I mean, do the variables with generic names still contain the actual data in SAS?&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Yes, but.&amp;nbsp; It might be you accidentally forgot to specify GUESSINGROWS=MAX when you ran PROC IMPORT so it only used the first few lines to guess how to define the variables.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 18:03:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936643#M46733</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-07-22T18:03:45Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936644#M46734</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/439159"&gt;@Manije72&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&lt;SPAN&gt;Some of these variables are numeric, but SAS recognizes them as character variables when I use&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;proc contents&lt;/CODE&gt;&lt;SPAN&gt;. Consequently, when I run&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;proc means&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;on these variables, I receive the error: 'Variable VAR4680 in list does not match type prescribed for this list'." How can I solve this problem?&lt;/SPAN&gt;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;PROC CONTENTS shows how the variables are defined in the SAS dataset.&amp;nbsp; If PROC IMPORT created character variables for columns that you think should be numeric then one of three things happened.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1) There is non numeric data somewhere in the column.&amp;nbsp; Perhaps it has those superfluous NA strings that programs like R like to write into CSV files.&lt;/P&gt;
&lt;P&gt;2) The variable is empty on every observation.&amp;nbsp; PROC IMPORT makes a 1 character variable in that case to save space in the SAS dataset.&lt;/P&gt;
&lt;P&gt;3) You did not tell it to use enough observations to guess how to read the data and all of the observations checked had empty values in that column.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 18:08:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936644#M46734</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-07-22T18:08:07Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936645#M46735</link>
      <description>&lt;P&gt;Is this code correct to import the data in SAS:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc import datafile='path_to_my_file.csv'&lt;BR /&gt;out=data&lt;BR /&gt;dbms=csv&lt;BR /&gt;replace;&lt;BR /&gt;getnames=yes;&amp;nbsp;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 18:21:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936645#M46735</guid>
      <dc:creator>Manije72</dc:creator>
      <dc:date>2024-07-22T18:21:57Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936646#M46736</link>
      <description>&lt;P&gt;Only if you know the file has less than 20 observations.&lt;/P&gt;
&lt;P&gt;Check out the &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/v_044/proc/p13kvtl8ezj13in17i6m99jypcwi.htm" target="_self"&gt;GUESSINGROWS=&lt;/A&gt; statement of PROC IMPORT.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 18:34:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936646#M46736</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-07-22T18:34:37Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936650#M46737</link>
      <description>&lt;P&gt;&lt;STRONG&gt;1) There is non numeric data somewhere in the column.&amp;nbsp; Perhaps it has those superfluous NA strings that programs like R like to write into CSV files.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Yes, maybe there are some string variables like "NaN" . How should I solve this problem now?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2) The variable is empty on every observation.&amp;nbsp; PROC IMPORT makes a 1 character variable in that case to save space in the SAS dataset.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;How can I solve this problem?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;3) You did not tell it to use enough observations to guess how to read the data and all of the observations checked had empty values for in that column.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;How should I solve this problem?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I previously imported the data by clicking File &amp;gt; Import Data &amp;gt; CSV, and it was successfully imported. Now, I tried to import it using this code, but it has been running for two hours.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc import datafile='path_to_your_file.csv'&lt;BR /&gt;out=your_dataset&lt;BR /&gt;dbms=csv&lt;BR /&gt;replace;&lt;BR /&gt;getnames=yes;&amp;nbsp;&lt;BR /&gt;guessingrows=max;&amp;nbsp;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 19:22:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936650#M46737</guid>
      <dc:creator>Manije72</dc:creator>
      <dc:date>2024-07-22T19:22:27Z</dc:date>
    </item>
    <item>
      <title>Re: Big data analysis in SAS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936656#M46738</link>
      <description>&lt;P&gt;Know your data.&amp;nbsp; Look at your CSV file.&amp;nbsp; Figure out which variables are numeric.&amp;nbsp; For character variables figure out how long each one needs to be to store all of the data (or all of the data you actually need). Write the data step to read it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To read a CSV file the data step is as simple as this.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  infile 'myfile.csv' dsd firstobs=2 truncover lrecl=1000000;
  length firstvar 8 secondvar $20 ..... lastvar 8 ;
  informat datevar mmddyy.;
  format datevar yymmdd10.;
  input firstvar -- lastvar;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;So for your file just update the LENGTH statement that is setting the variable types and storage length. Define the variables in the same order they appear on the lines of the file and then the INPUT statement can be as simple as the one I show that uses a simple position based variable list.&amp;nbsp; You only need to attach formats or informats to variables that NEED them.&amp;nbsp; Most variables will NOT need them.&amp;nbsp; Usually only things like DATE, TIME and DATETIME variables will need either informats or formats attached to them.&lt;/P&gt;
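&lt;P&gt;Here is a small self-contained sketch of the same pattern.&amp;nbsp; The variable names (id, name, visit, score) and the NANUM informat are made up for illustration, and it reads inline DATALINES instead of a file so you can run it as is.&amp;nbsp; The custom informat is one possible way to turn strings like NA or NaN into missing values so the column can still be defined as numeric.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc format;
  /* hypothetical informat: NA/NaN (any case) become missing, everything else is read as a number */
  invalue nanum (upcase) 'NA','NAN' = . other = _same_;
run;

data want;
  infile datalines dsd truncover;
  /* LENGTH sets the type, storage length and order of the variables */
  length id 8 name $20 visit 8 score 8;
  informat visit mmddyy. score nanum.;
  format visit yymmdd10.;
  input id -- score;
datalines;
1,Alice,07/20/2024,3.5
2,Bob,07/21/2024,NA
3,Carol,,NaN
;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;For a real file you would swap the INFILE statement back to the one shown above (with FIRSTOBS=2 and LRECL=) and list your own variables in the LENGTH statement.&lt;/P&gt;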
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you don't know how to LOOK at your CSV file you can use a simple data step like this to read in the first 5 lines and dump them to the SAS log where you can look at them.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  infile 'myfile.csv' obs=5;
  input;
  list;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;You might want to include the option LRECL= like I used above in case the lines in the file are longer than the default 32K bytes.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 19:39:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Big-data-analysis-in-SAS/m-p/936656#M46738</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-07-22T19:39:30Z</dc:date>
    </item>
  </channel>
</rss>

