Hello,
I try to import a file excel .csv with proc import but I obtain a table with only one column instead a table with several column.
I use this program of the SAS documentation :
proc import datafile="C:\Users\771\Desktop\deseq.csv"
out=Denis.sclero
dbms=dlm
replace;
delimiter=",";
getnames=yes;
run;
The file csv contains 24405 rows.
The first raw contains the names of 42 variables.
I use SAS 9.4 under windows 10.
Thank your for your help, I searched among similar topics but I don't obtain a solution.
Nathalie
If the NA only appears in a field that is character anyway then there is no need to preprocess the file to remove them. They will not cause PROC IMPORT to define the variable with the wrong type. You can either leave the value as NA or convert it to blanks later if you want.
If you used "Save as" from Excel, you will have semicolons as delimiters.
But one never uses PROC IMPORT for such text files. You know the structure and contents of the table, so you write the data step to read the file yourself. Do not rely on the guessing of PROC IMPORT.
Can you post the lines from the SAS log for the PROC IMPORT step? Please do not attach documents as most people will not download them. Instead just highlight the lines of text, copy them and paste them into the pop-up window you get when you push Insert Code icon in this forum's editor screen. The icon looks like the three characters < / > .
Can you post a few of the lines of the CSV file? Again open the file using a text editor and just copy a few lines and paste them into the pop-up window.
If PROC IMPORT thinks there is only one variable then you used the wrong delimiter. Don't be fooled by the extension on the filename. The creator of the file might have called the file a Comma Separated Values file, but the content might be using a different delimiter. It might not even be a text file.
Hello,
I dont' work usually with this sort of file. It's bioinformatics (NGS) and new to me.
Thanks
Here is the log/
Nom Row-names;Symbol;myopathie-ctrlFC;myopathiepadj;myosite-ctrlFC;myositepadj;necrose-ctrl FC;necrosepadj;data;SC1;SC10;SC12;SC13;SC14;SC2;SC4;SC7;SM1;SM10;SM2;SM4;SM6;SM7;SM8;SM 9;SN1;SN2;SN3;SN4;SN5;SN6; tronqué à Row_names_Symbol_myopathie_ctrlF. Problèmes détectés dans les noms fournis. Consultez le Journal. 8 /********************************************************************** 9 * PRODUCT: SAS 10 * VERSION: 9.4 11 * CREATOR: External File Interface 12 * DATE: 22NOV22 13 * DESC: Generated SAS Datastep Code 14 * TEMPLATE SOURCE: (None Specified.) 15 ***********************************************************************/ 16 data DENIS.SCLERO ; 17 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ 18 infile 'C:\Users\771\Desktop\deseq.csv' delimiter = ',' MISSOVER DSD 18 ! lrecl=32767 firstobs=2 ; 19 informat Row_names_Symbol_myopathie_ctrlF $670. ; 20 format Row_names_Symbol_myopathie_ctrlF $670. ; 21 input 22 Row_names_Symbol_myopathie_ctrlF $ 23 ; 24 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro 24 ! variable */ 25 run; NOTE: The infile 'C:\Users\771\Desktop\deseq.csv' is: Nom du fichier=C:\Users\771\Desktop\deseq.csv, RECFM=V,LRECL=32767, Taille de fichier (octets)=16489909, Modifié(e) le=18 novembre 2022 18 h 53, Heure de création=17 novembre 2022 15 h 25 NOTE: 24404 records were read from the infile 'C:\Users\771\Desktop\deseq.csv'. The minimum record length was 570. The maximum record length was 763. NOTE: The data set DENIS.SCLERO has 24404 observations and 1 variables. NOTE: DATA statement a utilisé (Durée totale du traitement) : real time 0.09 seconds cpu time 0.06 seconds 24404 lignes créées dans DENIS.SCLERO à partir de C:\Users\771\Desktop\deseq.csv. NOTE: DENIS.SCLERO data set was successfully created. NOTE: The data set DENIS.SCLERO has 24404 observations and 1 variables. NOTE: PROCEDURE IMPORT a utilisé (Durée totale du traitement) : real time 0.58 seconds cpu time 0.10 seconds
And the 3 lines of csv
Row-names,"Symbol","myopathie-ctrl___log2FoldChange","myopathie-ctrl___padj","myosite-ctrl___log2FoldChange","myosite-ctrl___padj","necrose-ctrl___log2FoldChange","necrose-ctrl___padj","data","SC1","SC10","SC12","SC13","SC14","SC2","SC4","SC7","SM1","SM10","SM2","SM4","SM6","SM7","SM8","SM9","SN1","SN2","SN3","SN4","SN5","SN6","SN7","SN8","SO1","SO10","SO2","SO4","SO5","SO6","SO7","SO8" AARS,NA,-0.0758710675843131,0.861241285307165,-0.0358099824068411,0.903339794457185,-0.00200734386277129,0.99462684874415," ",10.6415505239976,10.4638914678016,10.2479485175174,10.2359483582564,10.4116592260173,10.3058066413489,10.4252465769884,10.4394331728511,10.3594297000236,10.3464511680559,10.2436473092891,10.5534213132843,10.3651261650145,10.4227791773922,10.0901296650663,10.3708921047553,10.3587402169133,10.188232755801,10.3138521494347,10.2427816650575,10.2421397804036,10.389929730174,11.0275995673176,10.1755995134686,10.2871045787052,10.310614941516,10.9250995236069,10.3040923746184,10.0927886814902,10.4591509715167,10.0936536706565,10.3173530034344 AC000032.1,NA,0.882934964982423,0.504145260865,0.154452099679535,0.896660567110898,-0.273785673206738,0.769945717478779," ",3.49959860433791,2.87628077432956,2.8394423579697,3.12749696468309,3.07145656859786,2.98886453024241,2.90854503492956,3.77843342791373,3.00662641171253,3.0575181753425,3.1361492164693,3.910495310054,3.76353231394963,3.78347226756131,3.37324474769781,3.00793812482112,2.8403253211428,2.98010892388058,3.02577464469225,3.58905886642943,3.20624831046413,2.7870332045477,2.93264746758655,3.3597347585454,3.76866926319802,2.93101119879043,3.53135840399672,3.15264002310487,2.8711366073201,2.98964950473105,3.1772057983745,3.03120063031753
Looks like something quoted every line.
1 data _null_; 2 infile "c:\downloads\deseqpartial.csv" obs=5; 3 input; 4 list; 5 run; NOTE: The infile "c:\downloads\deseqpartial.csv" is: Filename=c:\downloads\deseqpartial.csv, RECFM=V,LRECL=32767,File Size (bytes)=6517, Last Modified=18Nov2022:13:01:57, Create Time=18Nov2022:13:01:57 RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 1 "Row-names,""Symbol"",""myopathie-ctrl___log2FoldChange"",""myopathie-ctrl___padj"",""myosite-ctrl__ 101 _log2FoldChange"",""myosite-ctrl___padj"",""necrose-ctrl___log2FoldChange"",""necrose-ctrl___padj"", 201 ""data"",""SC1"",""SC10"",""SC12"",""SC13"",""SC14"",""SC2"",""SC4"",""SC7"",""SM1"",""SM10"",""SM2" 301 ",""SM4"",""SM6"",""SM7"",""SM8"",""SM9"",""SN1"",""SN2"",""SN3"",""SN4"",""SN5"",""SN6"",""SN7"","" 401 SN8"",""SO1"",""SO10"",""SO2"",""SO4"",""SO5"",""SO6"",""SO7"",""SO8""" 471 2 "AARS,NA,-0.0758710675843131,0.861241285307165,-0.0358099824068411,0.903339794457185,-0.002007343862 101 77129,0.99462684874415,"" "",10.6415505239976,10.4638914678016,10.2479485175174,10.2359483582564,10. 201 4116592260173,10.3058066413489,10.4252465769884,10.4394331728511,10.3594297000236,10.3464511680559,1 301 0.2436473092891,10.5534213132843,10.3651261650145,10.4227791773922,10.0901296650663,10.3708921047553 401 ,10.3587402169133,10.188232755801,10.3138521494347,10.2427816650575,10.2421397804036,10.389929730174 501 ,11.0275995673176,10.1755995134686,10.2871045787052,10.310614941516,10.9250995236069,10.304092374618 601 4,10.0927886814902,10.4591509715167,10.0936536706565,10.3173530034344" 670 3 "AC000032.1,NA,0.882934964982423,0.504145260865,0.154452099679535,0.896660567110898,-0.2737856732067 101 38,0.769945717478779,"" "",3.49959860433791,2.87628077432956,2.8394423579697,3.12749696468309,3.0714 201 5656859786,2.98886453024241,2.90854503492956,3.77843342791373,3.00662641171253,3.0575181753425,3.136 301 1492164693,3.910495310054,3.76353231394963,3.78347226756131,3.37324474769781,3.00793812482112,2.8403 401 253211428,2.98010892388058,3.02577464469225,3.58905886642943,3.20624831046413,2.7870332045477,2.9326 RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 501 4746758655,3.3597347585454,3.76866926319802,2.93101119879043,3.53135840399672,3.15264002310487,2.871 601 1366073201,2.98964950473105,3.1772057983745,3.03120063031753" 661 4 "AC000068.1,NA,-0.0996281181987146,0.869775440004814,-0.020852503598597,0.960576089184265,-0.2376496 101 73709568,0.407251586285377,"" "",4.40019141795356,4.88451598575833,4.56810822529497,4.69248715257058 201 ,4.65395850758048,4.72674087613359,4.39799172048867,4.78046875125802,4.74905918603538,4.731036832468 301 65,4.72278555942627,4.3795093163144,4.71196469208795,4.54790862054949,4.30569176178196,4.65602128970 401 241,4.48212663191656,4.55089561271112,4.33883304501163,4.64619336092806,4.44328927062614,4.613753933 501 90447,4.62317386567925,4.70414238998872,4.80217499262711,4.70933445341056,4.70221508056959,4.6246319 601 7170885,4.33469867482062,4.85103135421018,4.57744253610555,4.50529129906363" 676 5 "AC000078.1,NA,0.0215340518223843,0.980756803285732,-0.0531845337407771,0.904438814793895,-0.7026539 101 0054828,0.0152786938323478,"" "",4.90098949669161,5.0698371986479,5.16711270425245,4.91272428445778, 201 4.98406380640687,4.80444615780785,5.11549599050413,4.852781827807,4.79750446679717,4.85843796788449, 301 5.05411472288466,4.49367428390234,4.99852933285037,5.17450201449618,4.82099242088009,5.4333311837930 401 3,4.71475776685609,4.67762474084169,4.70560996636432,4.54472613031612,4.74466752367144,4.64419480908 501 1,4.98555645841624,4.53215777007643,4.84307659255099,4.58400466826344,5.04884384576482,5.18229448739 601 324,5.16028354162762,4.80277828075246,4.87568022293903,5.03257893759531" 672 NOTE: 5 records were read from the infile "c:\downloads\deseqpartial.csv". The minimum record length was 471.
Run a step to remove the quotes and then try reading it.
While you are at it convert those NA text strings that appear to have been placed into some of the cells into something that INPUT will recognize as numeric. Since you are reading a delimited file just REMOVE them and SAS will know because there are two adjacent delimiters that the value is MISSING.
data _null_;
infile "c:\downloads\deseqpartial.csv" dsd;
file "c:\downloads\deseqpartial_fixed.csv" ;
input line :$32767.;
line=tranwrd(cats(',',line,','),',NA,',',,');
line=substrn(line,2,length(line)-2);
put line ;
run;
proc import file="c:\downloads\deseqpartial_fixed.csv" dbms=csv
out=test replace
;
run;
Thank you Tom, for your detailed response. It'is OK for me.
Sincerely yours,
Nathalie
Thank you very much. It' very well but it is a little long to import.
Nathalie
Then don't use IMPORT. Instead just write your own data step to read the file.
When I use the big file (not partial), I obtain one column with the first data of file csv.
I don't understand.
1 data _null_;
2 infile "C:\Users\771\Desktop\deseq.csv" dsd;
3 file "C:\Users\771\Desktop\deseq_fixed.csv" ;
4 input line :$32767.;
5 line=tranwrd(cats(',',line,','),',NA,',',,');
6 line=substrn(line,2,length(line)-2);
7 put line ;
8 run;
NOTE: The infile "C:\Users\771\Desktop\deseq.csv" is:
Nom du fichier=C:\Users\771\Desktop\deseq.csv,
RECFM=V,LRECL=32767,
Taille de fichier (octets)=16600667,
Modifié(e) le=22 novembre 2022 13 h 35,
Heure de création=17 novembre 2022 15 h 25
NOTE: The file "C:\Users\771\Desktop\deseq_fixed.csv" is:
Nom du fichier=C:\Users\771\Desktop\deseq_fixed.csv,
RECFM=V,LRECL=32767,
Taille de fichier (octets)=0,
Modifié(e) le=22 novembre 2022 16 h 44,
Heure de création=22 novembre 2022 16 h 35
NOTE: 24405 records were read from the infile "C:\Users\771\Desktop\deseq.csv".
The minimum record length was 391.
The maximum record length was 769.
NOTE: 24405 records were written to the file "C:\Users\771\Desktop\deseq_fixed.csv".
The minimum record length was 3.
The maximum record length was 16.
NOTE: DATA statement a utilisé (Durée totale du traitement) :
real time 0.65 seconds
cpu time 0.65 seconds
9 proc import file="C:\Users\771\Desktop\deseq_fixed.csv" dbms=csv
10 out=Denis.sclerodeseq replace
11 ;
12 run;
13 /**********************************************************************
14 * PRODUCT: SAS
15 * VERSION: 9.4
16 * CREATOR: External File Interface
17 * DATE: 22NOV22
18 * DESC: Generated SAS Datastep Code
19 * TEMPLATE SOURCE: (None Specified.)
20 ***********************************************************************/
21 data DENIS.SCLERODESEQ ;
22 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
23 infile 'C:\Users\771\Desktop\deseq_fixed.csv' delimiter = ',' MISSOVER DSD
23 ! lrecl=32767 firstobs=2 ;
24 informat Row_names $10. ;
25 format Row_names $10. ;
26 input
27 Row_names $
28 ;
29 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro
29 ! variable */
30 run;
NOTE: The infile 'C:\Users\771\Desktop\deseq_fixed.csv' is:
Nom du fichier=C:\Users\771\Desktop\deseq_fixed.csv,
RECFM=V,LRECL=32767,
Taille de fichier (octets)=385046,
Modifié(e) le=22 novembre 2022 16 h 44,
Heure de création=22 novembre 2022 16 h 35
NOTE: 24404 records were read from the infile 'C:\Users\771\Desktop\deseq_fixed.csv'.
The minimum record length was 3.
The maximum record length was 16.
NOTE: The data set DENIS.SCLERODESEQ has 24404 observations and 1 variables.
NOTE: DATA statement a utilisé (Durée totale du traitement) :
real time 0.02 seconds
cpu time 0.01 seconds
24404 lignes créées dans DENIS.SCLERODESEQ à partir de
C:\Users\771\Desktop\deseq_fixed.csv.
NOTE: DENIS.SCLERODESEQ data set was successfully created.
NOTE: The data set DENIS.SCLERODESEQ has 24404 observations and 1 variables.
NOTE: PROCEDURE IMPORT a utilisé (Durée totale du traitement) :
real time 0.11 seconds
cpu time 0.10 seconds
When I use the big file (not partial), I obtain one column with the first data of file csv.
I suspect that the "big file" is different than the "partial" file. Perhaps it was the process of creating the partial file that added the extra quotes?
LOOK at the "big file". At least the beginning of it.
data _null_;
infile "big file" obs=5 ;
input;
list;
run;
If it does not have every line quoted then there is no need to use the DSD option to remove the quoting.
You might be able to just run PROC IMPORT on it.
If you want to still replace the NA strings so that they don't cause PROC IMPORT to define the variables as character then you can modify that step to work like this instead.
data _null_;
infile "C:\Users\771\Desktop\deseq.csv";
file "C:\Users\771\Desktop\deseq_fixed.csv" ;
input ;
_infile_=tranwrd(cats(',',_infile_,','),',NA,',',,');
_infile_=substrn(_infile_,2,length(_infile_)-2);
put _infile_ ;
run;
Thank you very much. It is OK. The five lines are identical to others. But your last programm works well.
I don't know how I can make this post as " Solution".
I have a last problem. In the second column NA is very important because after there are names longer. These are genes names.
How can I import the full name (20 letters max).
Sincerely yours
Nathalie🙂
To change the selected solution you have unselect the one that is currently selected first.
DUSP27,NA,-0.233553503781192,0.422154870438521,0.361644983799399,0.0215161539174701,-0.495327880871102,0.00137126316457661," ",12.2206465843529,11.6743802610199,11.4276743226708,11.6690787592722,11.5716633538577,11.8680316670334,11.8828918705746,11.832500352165,11.4915866439778,11.5793991209473,11.444223668059,11.6963065497906,11.466413369076,11.4194612192898,11.4219187573663,11.7591298121901,11.5737754561301,11.4072500378671,11.681253417665,11.5520803254873,11.1999593000386,11.0854999066885,11.4936665122481,10.8805125224294,11.8175956152054,11.6032763296729,11.8508606707285,11.3919003476389,10.6906511774562,11.7176232187558,11.0838054173613,11.372044512607 DUXAP10,NA,-0.00932702773011185,0.97944303519719,-0.0144795162534574,0.872838947771714,0.0200320627356549,0.906060088572712," ",1.75702865468525,1.96019100051514,1.93579219856688,2.04445493698598,1.97317034476665,1.91768267803736,1.7597365198182,1.88173968568948,1.92934588497302,1.85494018243841,2.10823764195832,1.75812125242565,1.85910121597399,1.84385509494798,2.04471622455973,1.75686229994694,1.7585137374716,2.10991825829373,2.08658704571924,2.12662062632541,1.84113760123639,2.29282655154192,1.75294050571654,1.96899484994553,1.76184153650302,1.91322634439061,2.54851393922588,1.90862017763355,2.01366036634976,1.74680016625806,2.03587426069978,2.08588626996486 DWORF,NA,-0.263958644048431,0.612425758348956,0.410786543035799,0.0534266511353885,0.56242567941047,3.7058152984888e-05," ",13.3092116073233,13.1256966161627,12.922113489191,13.3867283662777,13.2065365568485,13.1704428722365,13.2057525264963,13.1685949075565,13.0160668959217,13.0669633564558,12.7778148917951,13.1522731237361,13.0615190047856,12.4001697522813,12.6402721473604,13.2683286173691,12.3698288491377,10.7365713065405,12.9723453530202,12.950724820269,12.6847912649622,12.2943595037735,12.7498285633345,11.9336310122164,12.9537691219219,13.2723377628727,12.139951100932,12.9642495879393,12.5561539774997,13.1488525018019,12.2126972261664,13.3204437350131 ELMSAN1,NA,-0.0180435234612499,0.948854707508336,0.027231976822705,0.843586185899498,0.0385151928121611,0.631961912824559," ",10.6645946254807,10.6651269951487,10.7341902565016,10.7504205882958,10.3732922797012,10.764329087793,10.8010085616349,10.621396681061,10.4200860769911,10.5943870751628,10.4760986832458,10.8396404296182,10.8412109992944,10.8679778103478,10.7812059768939,10.4094038363729,10.6267146342146,10.7880530438333,10.4945434862646,10.7186574618958,10.6307725012771,10.334792858453,10.639978907414,10.5680375601411,10.6805920679286,10.4251876694642,10.7180896255331,10.6451399355914,10.7353690764474,10.4878439354696,10.7096418539109,10.5392944648297 ENSG00000000003,"TSPAN6",0.219557134007105,0.345711619213423,0.341042906138812,0.026239150370821,0.464300627449591,0.00131916461042577," ",5.99885492254986,6.29375707719054,6.35295549489356,6.28499333898747,6.30275767107047,6.49956932643377,6.14537355886913,6.07944898314969,6.49096241234388,6.58621232912691,6.35964719729203,6.39867139323463,6.42021967575666,6.34672085307537,6.68113188525106,6.43060169912081,6.42994202073784,6.36437617099597,6.40102462256339,6.56947827333529,6.07794398196877,6.20198789561415,6.61765261031469,7.35418336550478,6.48376189531745,6.5406641609121,6.55775233610418,6.55908594060816,6.71303820957681,6.6053809724145,6.57475707353699,6.6324979687265 ENSG00000000005,"TNMD",0.0862823325022295,0.97952710454042,-0.127482336873838,0.0788170905062459,0.172067964239364,0.182726427624537," ",2.82309264620023,2.88054725481357,2.94480713309667,3.55861081233487,3.40934232465429,2.89521622378237,2.72937417712914,2.83973753110849,3.02805371687115,3.34124666668563,3.02960621789429,3.14050408658048,3.13852928548507,2.7128293318578,3.67560601962457,2.70461143813848,8.87133027565146,2.79605958809584,2.87475734941178,2.99908595754299,3.12834418938021,3.08876072916424,2.66927549898309,8.22143793834163,3.08754811259002,2.97953305341863,3.49337202427502,2.86943470120744,3.36798580554172,10.078190731304,2.89228388173081,3.06886783964915 ENSG00000000419,"DPM1",-0.22860463783924,0.254479497365392,-0.354870130635843,0.0110480129046693,0.4850636116691,0.00546551270801154," ",8.98893285236088,8.52921868996075,8.59350127727324,8.64167823199966,8.90585095862947,8.8198664961214,8.92574145415702,8.70741301835585,8.54637676294028,8.88705335495875,8.56257193880398,8.80393088265222,8.39207302499165,8.33420116210911,8.2994728887575,8.45768586093083,8.53175500562292,8.38822561225523,8.93797703961808,8.55862647333507,8.43378520854918,8.3751819741127,8.60793035195973,8.45057108106316,8.73779147839277,8.43951154159285,8.74208343057085,8.69692219771014,8.25410916085542,8.22195478949133,8.58683772176815,8.68982059848321
Here is an example of the column with NA and gene names (second column).
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.