BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DThorn
Calcite | Level 5

Hello everyone,

I have a question. Its a long story so I will give the short version of the situation and the long in case you need details.

Short Version. I export dataset test7 to file test7.csv. When I open A.csv in excel it shows negative values such as -1, -2, ect for some of the observations for some of the values. I then import test7.csv to dataset test8, and those negative values simply become a negative sign without a number. Any positive numbers in test7.csv stay as their positive number value. Can someone tell me why this is happening and how to retain my negative values?

Long Story: I am initally importing huge files with an unknown number of variables which I transpose into observations. However, when I do the inital import, proc import decides some of the values are character and some are numeric at it's leasuire. I need everything to be one or the other (character or numeric) for an eventual append. Anyway, when I transpose all of the values that were deemed to be numeric turn into "..." in the Output Data part of SAS Enterprise. If I export these '...' values, they export as real values (such as in test7.csv in the short version) but the negative values wont reimport.

I cannot do a infile for various reason, but namely:

1) On the initial import the files have varying numbers of variables and varying names.

2) On the test8 import, the order of the variables changes. Therefore I need the import to grab the names of the first row as variables for an eventual append.

CODE:

        data test7;

            proc export data = test7 outfile = 'C:\Users\David\Desktop\SASOUT\test7.csv' dbms = csv replace;

            run;

        proc import datafile= "C:\Users\David\Desktop\SASOUT\test7.csv" out =test8 dbms=csv replace;

        run;

        proc export data = test8 outfile = 'C:\Users\David\Desktop\SASOUT\test8.csv' dbms = csv replace;

        run;

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I was able to import the test7.csv file, with no problem (that I'm aware of at least), by using:

PROC IMPORT OUT= WORK.test7

            DATAFILE= "c:\art\test7.csv"

            DBMS=CSV REPLACE;

     GETNAMES=YES;

     DATAROW=2;

     guessingrows=1000;

RUN;

View solution in original post

10 REPLIES 10
art297
Opal | Level 21

Can you provide test7, or at least a subset of it, that results in the same behavior?

Tom
Super User Tom
Super User

You can just read the names with a data step and change the commas to spaces to create a normal SAS list of variable names.  You could use this to generate an INPUT or LENGTH statement.

Let's say you want to force every variable to be numeric.

filename code temp;

data _null_;

   infile 'test7.csv' dsd dlm=',' truncover lrecl=10000;

   input @;

   file code ;

   _infile_ = tranwrd(_infile_,',',' ');

   put 'length ' _infile_ '8 ;' ;

   put 'input ' _infile_ ';' ;

   stop;

run;

data test8 ;

   infile 'test7.csv' dsd dlm=',' truncover lrecl=10000 firstobs=2;

%inc code;

run;

DThorn
Calcite | Level 5

Thanks for the feedback Art and Tom. I appreciate it.

Art,

I am attaching files for you to see the output. The files I am attaching are:

FirstOperation3 - The SAS Code (You will note I am not a clean coder, though I do use comments. My code is a random sample of things I found that worked and not elegant).

Test7.csv - The output with the ok negative numbers.

InitialAppend.csv This is imported early in the code to make the base file to append data to. Note this goes in a SASOUT Folder

Assassins_Creed.csv this is the data. Note this goes in a SASCODING folder.

Note you may need to change the directories to see all of the test outputs.

The long long story: This is part of my PhD research. I, and three coders I have hired, have coded every sentence of reviews of video games to better understand the predictors of product quality. Not every sentence gets a positive or negative number attached, only some. Not all of us have coded every game, and not all of us coded the games in the same order. As you will see, only Rebeca and myself coded Assassins_Creed.csv. The ultimate goal is to get all of the data transposed and into a huge appended file with 200+ games in it (from SASCODING folder with 200+ csv files). I hope the comments makes sense, but basically I load the initialappend.csv to dataset appendlist to get the correct variables in the appended file (I dont mind changing them all to character). I then create a macro to cycle through all of the files in the SASCODING folder to loop through each game's file. The rest (before test8) is trying to get the data in a manageable format for an append.  Variable names:

_NAME_ = Name of Sentence # of the Review. Also has data on type of coding.

David, Rebeca, Victoria, Coder4 = All Coders

NEWNAME = Game Name

Type = Type of Code (there were initially four types, but type 4 is NA's I drop it)

Tom,

Thanks for your insights. I am not skilled enough to follow all of your code, but if I wanted to make everything character would I do  input $; instead of  input @;?

I appreciate all of the help immensly.

-David

Tom
Super User Tom
Super User

I looked at the file "Assassins_Creed.csv" . You have 732 (!!!!) columns. The second line with the question values is over 100,000 characters long. The file is using UTF-8, but there is some strange character in there that SAS cannot transcode into English.  Perhaps it is the strange symbol that I see at the end of the second line?

Not sure that there is going to be any easy way to deal with this format.  There appear to be three types of rows in that file.  The first has (almost) variable names.  The second is a mix of variable names (or perhaps labels) for the first 12 columns and then information about the question asked for the rest of the columns. The string here looks a little like it also might have other information about the question as a number of the values appear to be multiple values separated by hyphens.

Then the remaining rows appears to have the actual data.  The first 12 columns could perhaps be read using defined variable types. The rest seem to have coded values (even though names end in 'TEXT'). 

The FREQ Procedure

                                     Cumulative    Cumulative

response    Frequency     Percent     Frequency      Percent

-------------------------------------------------------------

-1                43       20.28            43        20.28

-2                10        4.72            53        25.00

0                 22       10.38            75        35.38

1                 39       18.40           114        53.77

2                 16        7.55           130        61.32

3                  1        0.47           131        61.79

N/A               81       38.21           212       100.00

Here is an attempt to read this file into three datasets in a vertical format.

*-----------------------------------------------------------------------------;

* Read in response file ;

*-----------------------------------------------------------------------------;

%let fname=c:\downloads\transpose\Assassins_Creed.csv;

options compress=yes;

*-----------------------------------------------------------------------------;

* Get the variable names from first row. Clean up to valid SAS variable names ;

*-----------------------------------------------------------------------------;

data names ;

  infile "&fname" dsd dlm=',' truncover lrecl=500000 obs=1;

  length col 8 name $32 label $256;

  do col=1 by 1 until(name=' ');

     input label @;

     name = translate(label,'_','#');

     if name ne ' ' then output;

  end;

  call symputx('NCOL',col-1);

run;

%put Found &ncol column names;

*-----------------------------------------------------------------------------;

* Get the question names from second row. ;

*-----------------------------------------------------------------------------;

data questions ;

  infile "&fname" dsd dlm=',' truncover lrecl=500000 obs=2 firstobs=2;

  length col 8 question $1000 ;

  do col=1 to &ncol;

     input question @;

     if index(question,'Click to write the question text') then

       question=left(substr(question,index(question,':')+1))

     ;

     if question ne ' ' then output;

  end;

run;

/*

proc report data=questions nofs split='FF'x nocenter headline missing spacing=1;

column col question;

define col / order ;

define question / display flow width=75 ;

run;

*/

*-----------------------------------------------------------------------------;

* Get the response from the third row and forward ;

*-----------------------------------------------------------------------------;

data responses ;

  infile "&fname" dsd dlm=',' truncover lrecl=500000 firstobs=3;

  length row col 8 response $1000 ;

  row+1;

  do col=1 to &ncol;

     input response @;

     if response ne ' ' then output;

  end;

run;

DThorn
Calcite | Level 5

Tom,

Thanks for the help. I ran your code and I am not sure if I can use the data in any of those formats. It becomes impossible to parse out what type of sentence is which.

You are correct, the first two rows of the data file are both semi variable names. That is how the online survey tool outputs. The top row is generally the survey tools variable number for each sentence. The second row is generally the sentence and some other info.

I may have found something interesting regarding the importing of only negative signs. I tried exporting the test7 dataset as a txt file. Then I open the text file, the data looks good. However, when I import the test7.txt using DBMS=CSV the negative numbers disappear and leave me with only negative signs. However, if I dont use the DBMS = CSV and leave it blank, it will import the data as a -1, -2 or whatever, but the data is not separated into rows.

I think the DBMS =CSV is doing something strange to those negative numbers. It is strange it would treat them differently since txt files have no internal formatting for it to look at differently.

DThorn
Calcite | Level 5

Can anyone take these CSV or TXT files (both from exporting the test7 dataset as a DBMS = csv, but changing the file name extension) and using the proc import function, import the file with:

1) The negative values being retained for 'David' and 'Rebeca'

2) Having the import sort the columns appropriately via the commas.

I cant believe that a .txt file, delimited by commas, wont import negative values and changes them all to negative signs. I can either get the negative values in a dataset if I dont use the DBMS =CSV, but I lose the sorting into the appropriate variables. Or I can have it sort variables via the commas, but lose the negative values.

Thanks everyone, kinda losing my mind over such a simple issue. If you can verify that you can or cannot import the csv or text file, and post the code, I would appreciate it. I'll let you know if it works for me.

Tom
Super User Tom
Super User

Most likely this is being caused by PROC IMPORT not being able to detect the type and length of the fields. Look up the quessing rows option.

I could not even get PROC IMPORT to read your Assassins_Creed.csv file.  Most likely because of the UTF-8 warnings that I was getting before.  The UTF-8 issues might also be causing the trouble with the negative sign.  Perhaps the character that is actually in the file is something other than a hyphen and it just looks to your eye like a negative sign?  If so then SAS will not know how to interpret that character.

2839  filename xx "&fname" encoding='utf-8';

2840  PROC IMPORT OUT= WORK.test1

2841              DATAFILE= xx

2842              DBMS=CSV REPLACE;

2843       GETNAMES=NO;

2844       DATAROW=3;

2845  RUN;

ERROR: Invalid string.

ERROR: Invalid string.

ERROR: Import unsuccessful.  See SAS Log for details.

NOTE: The SAS System stopped processing this step because of errors.

That is why it is easier for you to just write a DATA step to read these files.  You can use the names pulled from the first line to generate the code yourself.

proc sql noprint ;

  select name into :keys separated by ' ' from names where col <=12 ;

  select name into :questions separated by ' ' from names where col > 12;

quit;

data results ;

  infile "&fname" dsd dlm=',' truncover firstobs=3 lrecl=500000;

  length &keys $100 &questions $4 ;

  input &keys &questions ;

run;

art297
Opal | Level 21

I was able to import the test7.csv file, with no problem (that I'm aware of at least), by using:

PROC IMPORT OUT= WORK.test7

            DATAFILE= "c:\art\test7.csv"

            DBMS=CSV REPLACE;

     GETNAMES=YES;

     DATAROW=2;

     guessingrows=1000;

RUN;

DThorn
Calcite | Level 5

Art: Your code worked! I dont know why mine didnt but it now imports the negative values as negative! I was using the following import statement, but I suppose their your GETNAMES, DATAROWS, or GuessingRows arguments helped!

        proc import datafile= "C:\Users\David\Desktop\SASOUT\test7.csv" out =test8  dbms= csv replace;

        run;

Tom: Thanks for the code. I may use it in the future if I continue to have problems but right now I dont want to mess with anything since it seems to be working.

Thank you both for your help over the last 2 days. I have a long day ahead of me to get my results prepared for my meeting with my adviser tomorrow morning. I appreciate all of the help!

Ksharp
Super User

David.

It is good that Art has already supplied a solution for your.

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 3645 views
  • 5 likes
  • 4 in conversation