Re: How to input large dataset without typing as datalines?

wkm21 · Posted 03-29-2021 08:09 PM

Hi

I have a large raw dataset (50 variables and 80 individuals). I need to evaluate all of the variables and individuals, but I do not want to type the values manually after the DATALINES statement. Can you please tell me how to handle this more efficiently? For now I have run this code to import my data:

/* Generated Code (IMPORT) */
/* Source File: ParkNGend.csv */
/* Source Path: /home/wkmustahs210/sasuser.v94 */
/* Code generated on: 3/28/21, 4:12 PM */

%web_drop_table(PARKDAT);


FILENAME REFFILE '/home/wkmustahs210/sasuser.v94/ParkNGendSAS.csv';

PROC IMPORT DATAFILE=REFFILE
	DBMS=CSV
	OUT=PARKDAT;
	GETNAMES=YES;
RUN;

PROC CONTENTS DATA=PARKDAT; RUN;


%web_open_table(PARKDAT);

I want to be able to further adjust my data kind of like it's done with this iris data:

Data Iris;
   Input sepallen sepalwid petallen petalwid species @@;
   Format species specname.;
   Label sepallen='Sepal Length in mm.'
         sepalwid='Sepal Width in mm.'
         petallen='Petal Length in mm.'
         petalwid='Petal Width in mm.';
Datalines;
50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3
63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2
59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2
65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3
68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3
77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3
49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2
64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3
55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1
49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1
67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1
77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2
50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1
61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1
61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1
51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1
51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1
46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1
50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3
57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1
71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3
49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1
49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1
66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1
44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2
47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2
74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1
56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3
49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1
56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2
51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3
54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3
61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3
68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1
45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1
55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1
51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2
63 33 60 25 3 53 37 15 02 1
;

Proc Sort Data=Iris;
 By Species;
Run;

Even here, this is a lot of information for this data set.

Since I am new to SAS, I don't know how to insert the data after DATALINES in a tidy fashion, as is done with the iris data.

Please give some advice.

-Thanks Rose

qoit · Posted 03-29-2021 08:37 PM

Apologies if I have missed the point, but why do you want to use DATALINES when you have already imported the dataset via the IMPORT procedure? Once the dataset has been imported, use the DATASETS procedure to add variable labels and formats:

PROC DATASETS LIB=WORK;
MODIFY PARKDAT;
FORMAT variable1 format-name
variable2 format-name;
LABEL variable-name = 'Label for the variable';
RUN;
QUIT;
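
As a minimal sketch of the same idea with concrete names filled in: the variables `age` and `gender`, the format `gendfmt.`, and the codes 1 and 2 are all hypothetical, since the actual columns of the CSV are not shown in this thread. Substitute whatever PROC CONTENTS reports for PARKDAT.

```sas
/* Hypothetical value format; the codes 1 and 2 are assumptions. */
proc format;
  value gendfmt 1='Male' 2='Female';
run;

/* Attach the format and labels to the imported dataset in place. */
proc datasets lib=work nolist;
  modify PARKDAT;
    /* gender is assumed to be numeric; a character variable
       would need a character format such as $gendfmt. instead */
    format gender gendfmt.;
    label age    = 'Age in years'
          gender = 'Gender';
  run;
quit;

/* The imported dataset can then be used like the iris example. */
proc sort data=PARKDAT;
  by gender;
run;
```

Because PROC DATASETS only changes the dataset's metadata, none of the 80 rows has to be retyped or reread.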

wkm21 · Posted 03-29-2021 09:05 PM

Hi
In the iris example there is an @@ function after the input command, can I still use it in this case?

qoit · Posted 03-29-2021 09:14 PM

I have not seen your dataset, so I am unsure, but the double trailing @@ tells SAS to hold the current input line across iterations of the DATA step, which is what you need when each input line contains values for several observations. If you want to move your data into a DATA step, check the link below, by Mark Jordan (SAS Jedi):

https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/
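
To illustrate what @@ changes, here is a small self-contained sketch (the dataset names and the variables `x` and `y` are made up for the example). Both steps produce the same two observations; only the layout of the raw lines differs.

```sas
/* Without @@: each input line holds exactly one observation. */
data one_per_line;
  input x y;
datalines;
1 2
3 4
;

/* With @@: SAS holds the line and keeps reading from it, */
/* so one input line can supply several observations.     */
data many_per_line;
  input x y @@;
datalines;
1 2 3 4
;
```

If each row of your CSV holds one individual, which is the usual layout, you would not need @@ at all.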

japelin · Posted 03-29-2021 10:41 PM

I also think you can use PROC IMPORT, but if you need to do it in the DATA step, you can do that too.

If your CSV data is comma-delimited, you can use an INFILE statement with the DSD option, like this:

Data Iris;
   Infile datalines dsd;
   Input sepallen sepalwid petallen petalwid species @@;
   Format species specname.;
   Label sepallen='Sepal Length in mm.'
         sepalwid='Sepal Width in mm.'
         petallen='Petal Length in mm.'
         petalwid='Petal Width in mm.';
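/* Copy and paste the raw data after the DATALINES statement, like below. */
/* Do not put comments between DATALINES and the closing semicolon:      */
/* everything there is read as data.                                     */
Datalines;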
50,33,14,02,1,64,28,56,22,3,65,28,46,15,2,67,31,56,24,3
...
;
run;

If your data is delimited with spaces, as in what you posted, no INFILE statement is needed.

Tom · Posted 03-29-2021 11:26 PM

The example data step you show for reading in the IRIS data uses DATALINES (aka in-line data) because that is how its authors decided to do it, most likely because it is easier than having to distribute two files, one with the program and one with the data.

But it looks like you already have a text file with the data, so you can write a data step that does not need to include in-line data since you can have the data step read from the text file.

If you know the structure of your CSV file just write your own data step to read it.

data PARKDAT;
  infile  '/home/wkmustahs210/sasuser.v94/ParkNGendSAS.csv'
    dsd firstobs=2 truncover
  ;
  input .... ;
run;

If you don't know what names to use for your 50 variables, just open the CSV file in any text editor (such as the SAS program editor) and copy the first line. If you cannot tell from the names whether the variables are intended to be numbers or character strings, look at some of the 80 lines of data and figure it out. It will not take you very long, and you will definitely do a better job of figuring out how to define the variables than PROC IMPORT could ever hope to do.
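
For illustration only, a filled-in version of such a data step might look like the sketch below. The variable names `id`, `age`, `gender` and `score`, their types, and the labels are hypothetical; the real list has to come from the header row of your CSV.

```sas
/* Sketch only: id, age, gender and score are hypothetical names. */
/* Replace them with the 50 names from the first line of the CSV. */
data PARKDAT;
  infile '/home/wkmustahs210/sasuser.v94/ParkNGendSAS.csv'
    dsd firstobs=2 truncover;
  /* Declare character variables and their lengths before INPUT. */
  length id $8 gender $1;
  /* Variables not declared as character are read as numeric. */
  input id age gender score;
  label age   = 'Age in years'
        score = 'Total score';
run;

/* Check that the types and labels came out as intended. */
proc contents data=PARKDAT; run;
```

If many of the columns share a prefix and a running number, a numbered range list such as `q1-q49` on the INPUT statement saves typing each name.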
