BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Nathalie1
Obsidian | Level 7

Hello,

I try to  import a file excel .csv with proc import but I obtain a table with only one column instead a table with several column.

I use this program of the SAS documentation :

proc import datafile="C:\Users\771\Desktop\deseq.csv" 
        out=Denis.sclero
        dbms=dlm
        replace;
		delimiter=",";  
		     getnames=yes;
run;

The file csv contains 24405 rows.

The first raw contains the names of 42 variables.

I use SAS 9.4 under windows 10.

Thank your for your help, I searched among similar topics but I don't obtain a solution.

Nathalie

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If the NA only appears in a field that is character anyway then there is no need to preprocess the file to remove them.  They will not cause PROC IMPORT to define the variable with the wrong type.  You can either leave the value as NA or convert it to blanks later if you want.

View solution in original post

15 REPLIES 15
Kurt_Bremser
Super User

If you used "Save as" from Excel, you will have semicolons as delimiters.

 

But one never uses PROC IMPORT for such text files. You know the structure and contents of the table, so you write the data step to read the file yourself. Do not rely on the guessing of PROC IMPORT.

Tom
Super User Tom
Super User

Can you post the lines from the SAS log for the PROC IMPORT step?   Please do not attach documents as most people will not download them.  Instead just highlight the lines of text, copy them and paste them into the pop-up window you get when you push Insert Code icon in this forum's editor screen. The icon looks like the three characters < / > .

 

Can you post a few of the lines of the CSV file?  Again open the file using a text editor and just copy a few lines and paste them into the pop-up window.

 

If PROC IMPORT thinks there is only one variable then you used the wrong delimiter.  Don't be fooled by the extension on the filename.  The creator of the file might have called the file a Comma Separated Values file, but the content might be using a different delimiter.  It might not even be a text file.

Nathalie1
Obsidian | Level 7

Hello,

I dont' work usually with this sort of file. It's bioinformatics (NGS) and new to me.

Thanks

Here is the log/

Nom
Row-names;Symbol;myopathie-ctrlFC;myopathiepadj;myosite-ctrlFC;myositepadj;necrose-ctrl
FC;necrosepadj;data;SC1;SC10;SC12;SC13;SC14;SC2;SC4;SC7;SM1;SM10;SM2;SM4;SM6;SM7;SM8;SM
9;SN1;SN2;SN3;SN4;SN5;SN6;
tronqué à Row_names_Symbol_myopathie_ctrlF.
Problèmes détectés dans les noms fournis. Consultez le Journal.
8     /**********************************************************************
9     *   PRODUCT:   SAS
10    *   VERSION:   9.4
11    *   CREATOR:   External File Interface
12    *   DATE:      22NOV22
13    *   DESC:      Generated SAS Datastep Code
14    *   TEMPLATE SOURCE:  (None Specified.)
15    ***********************************************************************/
16       data DENIS.SCLERO    ;
17       %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
18       infile 'C:\Users\771\Desktop\deseq.csv' delimiter = ',' MISSOVER DSD
18 ! lrecl=32767 firstobs=2 ;
19          informat Row_names_Symbol_myopathie_ctrlF $670. ;
20          format Row_names_Symbol_myopathie_ctrlF $670. ;
21       input
22                   Row_names_Symbol_myopathie_ctrlF  $
23       ;
24       if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro
24 ! variable */
25       run;

NOTE: The infile 'C:\Users\771\Desktop\deseq.csv' is:
      Nom du fichier=C:\Users\771\Desktop\deseq.csv,
      RECFM=V,LRECL=32767,
      Taille de fichier (octets)=16489909,
      Modifié(e) le=18 novembre 2022 18 h 53,
      Heure de création=17 novembre 2022 15 h 25

NOTE: 24404 records were read from the infile 'C:\Users\771\Desktop\deseq.csv'.
      The minimum record length was 570.
      The maximum record length was 763.
NOTE: The data set DENIS.SCLERO has 24404 observations and 1 variables.
NOTE: DATA statement a utilisé (Durée totale du traitement) :
      real time           0.09 seconds
      cpu time            0.06 seconds


24404 lignes créées dans DENIS.SCLERO à partir de C:\Users\771\Desktop\deseq.csv.



NOTE: DENIS.SCLERO data set was successfully created.
NOTE: The data set DENIS.SCLERO has 24404 observations and 1 variables.
NOTE: PROCEDURE IMPORT a utilisé (Durée totale du traitement) :
      real time           0.58 seconds
      cpu time            0.10 seconds

 

Nathalie1
Obsidian | Level 7

And the 3 lines of csv

Row-names,"Symbol","myopathie-ctrl___log2FoldChange","myopathie-ctrl___padj","myosite-ctrl___log2FoldChange","myosite-ctrl___padj","necrose-ctrl___log2FoldChange","necrose-ctrl___padj","data","SC1","SC10","SC12","SC13","SC14","SC2","SC4","SC7","SM1","SM10","SM2","SM4","SM6","SM7","SM8","SM9","SN1","SN2","SN3","SN4","SN5","SN6","SN7","SN8","SO1","SO10","SO2","SO4","SO5","SO6","SO7","SO8"
AARS,NA,-0.0758710675843131,0.861241285307165,-0.0358099824068411,0.903339794457185,-0.00200734386277129,0.99462684874415," ",10.6415505239976,10.4638914678016,10.2479485175174,10.2359483582564,10.4116592260173,10.3058066413489,10.4252465769884,10.4394331728511,10.3594297000236,10.3464511680559,10.2436473092891,10.5534213132843,10.3651261650145,10.4227791773922,10.0901296650663,10.3708921047553,10.3587402169133,10.188232755801,10.3138521494347,10.2427816650575,10.2421397804036,10.389929730174,11.0275995673176,10.1755995134686,10.2871045787052,10.310614941516,10.9250995236069,10.3040923746184,10.0927886814902,10.4591509715167,10.0936536706565,10.3173530034344
AC000032.1,NA,0.882934964982423,0.504145260865,0.154452099679535,0.896660567110898,-0.273785673206738,0.769945717478779," ",3.49959860433791,2.87628077432956,2.8394423579697,3.12749696468309,3.07145656859786,2.98886453024241,2.90854503492956,3.77843342791373,3.00662641171253,3.0575181753425,3.1361492164693,3.910495310054,3.76353231394963,3.78347226756131,3.37324474769781,3.00793812482112,2.8403253211428,2.98010892388058,3.02577464469225,3.58905886642943,3.20624831046413,2.7870332045477,2.93264746758655,3.3597347585454,3.76866926319802,2.93101119879043,3.53135840399672,3.15264002310487,2.8711366073201,2.98964950473105,3.1772057983745,3.03120063031753
Reeza
Super User
Whatever tool exported the data wrapped each line in quotes as well as comma delimiters.
Typically anything in quotes is considered one field so PROC IMPORT is doing what is expected.
I would add quotes to the delimiter list and see what that does.

Not sure that will work. You probably want to strip the quotes somehow....weird way to export a file.

proc import datafile="C:\Users\771\Desktop\deseq.csv"
out=Denis.sclero
dbms=dlm
replace;
delimiter='",";
getnames=yes;
run;
Tom
Super User Tom
Super User

Looks like something quoted every line.

1    data _null_;
2      infile "c:\downloads\deseqpartial.csv" obs=5;
3      input;
4      list;
5    run;

NOTE: The infile "c:\downloads\deseqpartial.csv" is:
      Filename=c:\downloads\deseqpartial.csv,
      RECFM=V,LRECL=32767,File Size (bytes)=6517,
      Last Modified=18Nov2022:13:01:57,
      Create Time=18Nov2022:13:01:57

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1         "Row-names,""Symbol"",""myopathie-ctrl___log2FoldChange"",""myopathie-ctrl___padj"",""myosite-ctrl__
     101  _log2FoldChange"",""myosite-ctrl___padj"",""necrose-ctrl___log2FoldChange"",""necrose-ctrl___padj"",
     201  ""data"",""SC1"",""SC10"",""SC12"",""SC13"",""SC14"",""SC2"",""SC4"",""SC7"",""SM1"",""SM10"",""SM2"
     301  ",""SM4"",""SM6"",""SM7"",""SM8"",""SM9"",""SN1"",""SN2"",""SN3"",""SN4"",""SN5"",""SN6"",""SN7"",""
     401  SN8"",""SO1"",""SO10"",""SO2"",""SO4"",""SO5"",""SO6"",""SO7"",""SO8""" 471
2         "AARS,NA,-0.0758710675843131,0.861241285307165,-0.0358099824068411,0.903339794457185,-0.002007343862
     101  77129,0.99462684874415,"" "",10.6415505239976,10.4638914678016,10.2479485175174,10.2359483582564,10.
     201  4116592260173,10.3058066413489,10.4252465769884,10.4394331728511,10.3594297000236,10.3464511680559,1
     301  0.2436473092891,10.5534213132843,10.3651261650145,10.4227791773922,10.0901296650663,10.3708921047553
     401  ,10.3587402169133,10.188232755801,10.3138521494347,10.2427816650575,10.2421397804036,10.389929730174
     501  ,11.0275995673176,10.1755995134686,10.2871045787052,10.310614941516,10.9250995236069,10.304092374618
     601  4,10.0927886814902,10.4591509715167,10.0936536706565,10.3173530034344" 670
3         "AC000032.1,NA,0.882934964982423,0.504145260865,0.154452099679535,0.896660567110898,-0.2737856732067
     101  38,0.769945717478779,"" "",3.49959860433791,2.87628077432956,2.8394423579697,3.12749696468309,3.0714
     201  5656859786,2.98886453024241,2.90854503492956,3.77843342791373,3.00662641171253,3.0575181753425,3.136
     301  1492164693,3.910495310054,3.76353231394963,3.78347226756131,3.37324474769781,3.00793812482112,2.8403
     401  253211428,2.98010892388058,3.02577464469225,3.58905886642943,3.20624831046413,2.7870332045477,2.9326
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
     501  4746758655,3.3597347585454,3.76866926319802,2.93101119879043,3.53135840399672,3.15264002310487,2.871
     601  1366073201,2.98964950473105,3.1772057983745,3.03120063031753" 661
4         "AC000068.1,NA,-0.0996281181987146,0.869775440004814,-0.020852503598597,0.960576089184265,-0.2376496
     101  73709568,0.407251586285377,"" "",4.40019141795356,4.88451598575833,4.56810822529497,4.69248715257058
     201  ,4.65395850758048,4.72674087613359,4.39799172048867,4.78046875125802,4.74905918603538,4.731036832468
     301  65,4.72278555942627,4.3795093163144,4.71196469208795,4.54790862054949,4.30569176178196,4.65602128970
     401  241,4.48212663191656,4.55089561271112,4.33883304501163,4.64619336092806,4.44328927062614,4.613753933
     501  90447,4.62317386567925,4.70414238998872,4.80217499262711,4.70933445341056,4.70221508056959,4.6246319
     601  7170885,4.33469867482062,4.85103135421018,4.57744253610555,4.50529129906363" 676
5         "AC000078.1,NA,0.0215340518223843,0.980756803285732,-0.0531845337407771,0.904438814793895,-0.7026539
     101  0054828,0.0152786938323478,"" "",4.90098949669161,5.0698371986479,5.16711270425245,4.91272428445778,
     201  4.98406380640687,4.80444615780785,5.11549599050413,4.852781827807,4.79750446679717,4.85843796788449,
     301  5.05411472288466,4.49367428390234,4.99852933285037,5.17450201449618,4.82099242088009,5.4333311837930
     401  3,4.71475776685609,4.67762474084169,4.70560996636432,4.54472613031612,4.74466752367144,4.64419480908
     501  1,4.98555645841624,4.53215777007643,4.84307659255099,4.58400466826344,5.04884384576482,5.18229448739
     601  324,5.16028354162762,4.80277828075246,4.87568022293903,5.03257893759531" 672
NOTE: 5 records were read from the infile "c:\downloads\deseqpartial.csv".
      The minimum record length was 471.

Run a step to remove the quotes and then try reading it.

While you are at it convert those NA text strings that appear to have been placed into some of the cells into something that INPUT will recognize as numeric.  Since you are reading a delimited file just REMOVE them and SAS will know because there are two adjacent delimiters that the value is MISSING.

data _null_;
  infile "c:\downloads\deseqpartial.csv" dsd;
  file "c:\downloads\deseqpartial_fixed.csv" ;
  input line :$32767.;
  line=tranwrd(cats(',',line,','),',NA,',',,');
  line=substrn(line,2,length(line)-2);
  put line ;
run;
proc import file="c:\downloads\deseqpartial_fixed.csv" dbms=csv 
  out=test replace 
;
run;

Tom_0-1668795415909.png

 

 

Nathalie1
Obsidian | Level 7

Thank you Tom, for your detailed response. It'is OK for me.

Sincerely yours,

 

Nathalie

Nathalie1
Obsidian | Level 7

Thank you very much. It' very well but it is a little long to import.

Nathalie

Tom
Super User Tom
Super User

Then don't use IMPORT.  Instead just write your own data step to read the file.

Nathalie1
Obsidian | Level 7

When I use the big file (not partial), I obtain one column with the first data of file csv.

I don't understand.

1        data _null_;
2      infile "C:\Users\771\Desktop\deseq.csv" dsd;
3      file "C:\Users\771\Desktop\deseq_fixed.csv" ;
4      input line :$32767.;
5      line=tranwrd(cats(',',line,','),',NA,',',,');
6      line=substrn(line,2,length(line)-2);
7      put line ;
8    run;

NOTE: The infile "C:\Users\771\Desktop\deseq.csv" is:
      Nom du fichier=C:\Users\771\Desktop\deseq.csv,
      RECFM=V,LRECL=32767,
      Taille de fichier (octets)=16600667,
      Modifié(e) le=22 novembre 2022 13 h 35,
      Heure de création=17 novembre 2022 15 h 25

NOTE: The file "C:\Users\771\Desktop\deseq_fixed.csv" is:
      Nom du fichier=C:\Users\771\Desktop\deseq_fixed.csv,
      RECFM=V,LRECL=32767,
      Taille de fichier (octets)=0,
      Modifié(e) le=22 novembre 2022 16 h 44,
      Heure de création=22 novembre 2022 16 h 35

NOTE: 24405 records were read from the infile "C:\Users\771\Desktop\deseq.csv".
      The minimum record length was 391.
      The maximum record length was 769.
NOTE: 24405 records were written to the file "C:\Users\771\Desktop\deseq_fixed.csv".
      The minimum record length was 3.
      The maximum record length was 16.
NOTE: DATA statement a utilisé (Durée totale du traitement) :
      real time           0.65 seconds
      cpu time            0.65 seconds


9     proc import file="C:\Users\771\Desktop\deseq_fixed.csv" dbms=csv
10     out=Denis.sclerodeseq replace
11   ;
12   run;

13    /**********************************************************************
14    *   PRODUCT:   SAS
15    *   VERSION:   9.4
16    *   CREATOR:   External File Interface
17    *   DATE:      22NOV22
18    *   DESC:      Generated SAS Datastep Code
19    *   TEMPLATE SOURCE:  (None Specified.)
20    ***********************************************************************/
21       data DENIS.SCLERODESEQ    ;
22       %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
23       infile 'C:\Users\771\Desktop\deseq_fixed.csv' delimiter = ',' MISSOVER DSD
23 ! lrecl=32767 firstobs=2 ;
24          informat Row_names $10. ;
25          format Row_names $10. ;
26       input
27                   Row_names  $
28       ;
29       if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro
29 ! variable */
30       run;

NOTE: The infile 'C:\Users\771\Desktop\deseq_fixed.csv' is:
      Nom du fichier=C:\Users\771\Desktop\deseq_fixed.csv,
      RECFM=V,LRECL=32767,
      Taille de fichier (octets)=385046,
      Modifié(e) le=22 novembre 2022 16 h 44,
      Heure de création=22 novembre 2022 16 h 35

NOTE: 24404 records were read from the infile 'C:\Users\771\Desktop\deseq_fixed.csv'.
      The minimum record length was 3.
      The maximum record length was 16.
NOTE: The data set DENIS.SCLERODESEQ has 24404 observations and 1 variables.
NOTE: DATA statement a utilisé (Durée totale du traitement) :
      real time           0.02 seconds
      cpu time            0.01 seconds


24404 lignes créées dans DENIS.SCLERODESEQ à partir de
C:\Users\771\Desktop\deseq_fixed.csv.



NOTE: DENIS.SCLERODESEQ data set was successfully created.
NOTE: The data set DENIS.SCLERODESEQ has 24404 observations and 1 variables.
NOTE: PROCEDURE IMPORT a utilisé (Durée totale du traitement) :
      real time           0.11 seconds
      cpu time            0.10 seconds
Tom
Super User Tom
Super User

When I use the big file (not partial), I obtain one column with the first data of file csv.

 

I suspect that the "big file" is different than the "partial" file.  Perhaps it was the process of creating the partial file that added the extra quotes?

 

LOOK at the "big file".  At least the beginning of it.

data _null_;
  infile "big file" obs=5 ;
  input;
  list;
run;

If it does not have every line quoted then there is no need to use the DSD option to remove the quoting.

You might be able to just run PROC IMPORT on it.

 

If you want to still replace the NA strings so that they don't cause PROC IMPORT to define the variables as character then you can modify that step to work like this instead.

data _null_;
  infile "C:\Users\771\Desktop\deseq.csv";
  file "C:\Users\771\Desktop\deseq_fixed.csv" ;
  input ;
  _infile_=tranwrd(cats(',',_infile_,','),',NA,',',,');
  _infile_=substrn(_infile_,2,length(_infile_)-2);
  put _infile_ ;
run;
Nathalie1
Obsidian | Level 7

Thank you very much. It is OK. The five lines are identical to others. But your last programm  works well.

I don't know how I can make this post as " Solution".

 

I have a last problem. In the second column NA is very important because after there are names longer. These are genes names.

How can I import the full name (20 letters max).

 

Sincerely yours

 

Nathalie🙂

Tom
Super User Tom
Super User

To change the selected solution you have unselect the one that is currently selected first.

Nathalie1
Obsidian | Level 7
DUSP27,NA,-0.233553503781192,0.422154870438521,0.361644983799399,0.0215161539174701,-0.495327880871102,0.00137126316457661," ",12.2206465843529,11.6743802610199,11.4276743226708,11.6690787592722,11.5716633538577,11.8680316670334,11.8828918705746,11.832500352165,11.4915866439778,11.5793991209473,11.444223668059,11.6963065497906,11.466413369076,11.4194612192898,11.4219187573663,11.7591298121901,11.5737754561301,11.4072500378671,11.681253417665,11.5520803254873,11.1999593000386,11.0854999066885,11.4936665122481,10.8805125224294,11.8175956152054,11.6032763296729,11.8508606707285,11.3919003476389,10.6906511774562,11.7176232187558,11.0838054173613,11.372044512607
DUXAP10,NA,-0.00932702773011185,0.97944303519719,-0.0144795162534574,0.872838947771714,0.0200320627356549,0.906060088572712," ",1.75702865468525,1.96019100051514,1.93579219856688,2.04445493698598,1.97317034476665,1.91768267803736,1.7597365198182,1.88173968568948,1.92934588497302,1.85494018243841,2.10823764195832,1.75812125242565,1.85910121597399,1.84385509494798,2.04471622455973,1.75686229994694,1.7585137374716,2.10991825829373,2.08658704571924,2.12662062632541,1.84113760123639,2.29282655154192,1.75294050571654,1.96899484994553,1.76184153650302,1.91322634439061,2.54851393922588,1.90862017763355,2.01366036634976,1.74680016625806,2.03587426069978,2.08588626996486
DWORF,NA,-0.263958644048431,0.612425758348956,0.410786543035799,0.0534266511353885,0.56242567941047,3.7058152984888e-05," ",13.3092116073233,13.1256966161627,12.922113489191,13.3867283662777,13.2065365568485,13.1704428722365,13.2057525264963,13.1685949075565,13.0160668959217,13.0669633564558,12.7778148917951,13.1522731237361,13.0615190047856,12.4001697522813,12.6402721473604,13.2683286173691,12.3698288491377,10.7365713065405,12.9723453530202,12.950724820269,12.6847912649622,12.2943595037735,12.7498285633345,11.9336310122164,12.9537691219219,13.2723377628727,12.139951100932,12.9642495879393,12.5561539774997,13.1488525018019,12.2126972261664,13.3204437350131
ELMSAN1,NA,-0.0180435234612499,0.948854707508336,0.027231976822705,0.843586185899498,0.0385151928121611,0.631961912824559," ",10.6645946254807,10.6651269951487,10.7341902565016,10.7504205882958,10.3732922797012,10.764329087793,10.8010085616349,10.621396681061,10.4200860769911,10.5943870751628,10.4760986832458,10.8396404296182,10.8412109992944,10.8679778103478,10.7812059768939,10.4094038363729,10.6267146342146,10.7880530438333,10.4945434862646,10.7186574618958,10.6307725012771,10.334792858453,10.639978907414,10.5680375601411,10.6805920679286,10.4251876694642,10.7180896255331,10.6451399355914,10.7353690764474,10.4878439354696,10.7096418539109,10.5392944648297
ENSG00000000003,"TSPAN6",0.219557134007105,0.345711619213423,0.341042906138812,0.026239150370821,0.464300627449591,0.00131916461042577," ",5.99885492254986,6.29375707719054,6.35295549489356,6.28499333898747,6.30275767107047,6.49956932643377,6.14537355886913,6.07944898314969,6.49096241234388,6.58621232912691,6.35964719729203,6.39867139323463,6.42021967575666,6.34672085307537,6.68113188525106,6.43060169912081,6.42994202073784,6.36437617099597,6.40102462256339,6.56947827333529,6.07794398196877,6.20198789561415,6.61765261031469,7.35418336550478,6.48376189531745,6.5406641609121,6.55775233610418,6.55908594060816,6.71303820957681,6.6053809724145,6.57475707353699,6.6324979687265
ENSG00000000005,"TNMD",0.0862823325022295,0.97952710454042,-0.127482336873838,0.0788170905062459,0.172067964239364,0.182726427624537," ",2.82309264620023,2.88054725481357,2.94480713309667,3.55861081233487,3.40934232465429,2.89521622378237,2.72937417712914,2.83973753110849,3.02805371687115,3.34124666668563,3.02960621789429,3.14050408658048,3.13852928548507,2.7128293318578,3.67560601962457,2.70461143813848,8.87133027565146,2.79605958809584,2.87475734941178,2.99908595754299,3.12834418938021,3.08876072916424,2.66927549898309,8.22143793834163,3.08754811259002,2.97953305341863,3.49337202427502,2.86943470120744,3.36798580554172,10.078190731304,2.89228388173081,3.06886783964915
ENSG00000000419,"DPM1",-0.22860463783924,0.254479497365392,-0.354870130635843,0.0110480129046693,0.4850636116691,0.00546551270801154," ",8.98893285236088,8.52921868996075,8.59350127727324,8.64167823199966,8.90585095862947,8.8198664961214,8.92574145415702,8.70741301835585,8.54637676294028,8.88705335495875,8.56257193880398,8.80393088265222,8.39207302499165,8.33420116210911,8.2994728887575,8.45768586093083,8.53175500562292,8.38822561225523,8.93797703961808,8.55862647333507,8.43378520854918,8.3751819741127,8.60793035195973,8.45057108106316,8.73779147839277,8.43951154159285,8.74208343057085,8.69692219771014,8.25410916085542,8.22195478949133,8.58683772176815,8.68982059848321

Here is an example of the column with NA and gene names (second column).

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 15 replies
  • 1188 views
  • 5 likes
  • 4 in conversation