Solved: Creating a data set with multiple identical observations using the datalines statement

Simon80 · Posted 06-12-2013 08:30 PM

I'm trying to create a data set by entering information using the datalines statement in a data step. I have multiple observations with the same value, and I was wondering if there is a shorthand way of entering these values without having to type them one by one.

Example:

data students;
input Age;
datalines;
20
and so on...
;

Is there a function or an operator that I can use to generate repeated values? I know that in certain programming languages you can take a value and follow it with the product sign (*) and a number to do this sort of operation. Is there something similar in SAS? Thanks for your response and help!

PGStats · Posted 06-12-2013 08:57 PM

Requires a little programming :

data students(keep=age);
infile datalines missover;
input Age repeat;
do i = 1 to coalesce(repeat, 1);
output;
end;
datalines;
20 4
10
15 2
55
;

proc print; run;

But don't forget that most data analysis procedures allow a FREQ statement that names a variable containing observation frequencies. Thus, you could use :

data students(keep=age repeat);
infile datalines missover;
input Age repeat;
repeat = coalesce(repeat,1);

datalines;
20 4
10
15 2
55
;

proc univariate data=students; var age; freq repeat; run;
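
As a quick check that the two approaches describe the same data, here is a minimal sketch that builds both forms and runs PROC MEANS on each. The data set names students_long and students_freq are made up for this illustration; the data lines are the ones used above.

```sas
/* Expanded form: one row per student (hypothetical name students_long) */
data students_long(keep=age);
    infile datalines missover;
    input Age repeat;
    /* a missing repeat count is treated as 1 */
    do i = 1 to coalesce(repeat, 1);
        output;
    end;
datalines;
20 4
10
15 2
55
;

/* Compact form: one row per data line plus a count (hypothetical name students_freq) */
data students_freq(keep=age repeat);
    infile datalines missover;
    input Age repeat;
    repeat = coalesce(repeat, 1);
datalines;
20 4
10
15 2
55
;

/* Statistics on the expanded rows */
proc means data=students_long n mean std;
    var age;
run;

/* Same statistics on the compact rows, using repeat as the frequency */
proc means data=students_freq n mean std;
    var age;
    freq repeat;
run;
```

Both PROC MEANS steps should report the same N (8 for these data lines), mean, and standard deviation, since the FREQ statement treats each row as repeat identical observations.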

PG

Simon80 · Posted 06-12-2013 11:20 PM

PGStats, thank you for your quick response! I tried the second method and it works beautifully! I also tried a very minimal data step based on yours, shown below, that also seems to work. Would you mind explaining a few things to me, as I'm relatively new to SAS? In your code, you have the infile statement (infile datalines missover). What is this for? And why did you use the coalesce function? I know it returns the first non-missing value, but I don't understand what its role is in your data step. Thanks for your help!

My simplified code:

data students(keep=age repeat);

input Age repeat;

datalines;
20 4
10
15 2
55
;

proc univariate data=students; var age; freq repeat; run;

PGStats · Posted 06-13-2013 10:18 AM

When you run that code, you get the following error messages:

NOTE: LOST CARD.

RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7---

10 ;

Age=55 repeat=. _ERROR_=1 _N_=3

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

NOTE: The data set WORK.STUDENTS has 2 observations and 2 variables.

The reason for this is that the default behaviour when SAS reads data is to go to the next line to read the remaining input variables. The first statement (infile datalines missover) tells SAS to set the remaining variables to missing when the end of a line is reached instead. The coalesce function then says that when repeat is missing, it should be taken as meaning 1.
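
To see the difference side by side, here is a minimal sketch that reads the same data lines twice, once with the default behaviour (FLOWOVER) and once with MISSOVER. The data set names flow and miss and the titles are made up for this illustration.

```sas
/* Default behaviour (FLOWOVER): when a line is too short, */
/* INPUT moves on to the next line to find a value for repeat */
data flow;
    input Age repeat;
datalines;
20 4
10
15 2
55
;

/* MISSOVER: when a line is too short, repeat is left missing */
data miss;
    infile datalines missover;
    input Age repeat;
datalines;
20 4
10
15 2
55
;

proc print data=flow;
    title 'Default FLOWOVER';
run;

proc print data=miss;
    title 'With MISSOVER';
run;

title; /* clear the title */
```

The first data set should come out with only 2 observations, with the line "10" picking up 15 from the following line as its repeat value and the final "55" triggering the LOST CARD note. The second should have 4 observations, with repeat missing wherever the line had no count.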

PG

Simon80 · Posted 06-13-2013 11:19 AM

Thanks again PGStats! I think I understand now.

ballardw · Posted 06-13-2013 11:14 AM

Depending on the analysis you're going to do, I suggest looking into adding the count variable to every value instead of creating multiple records, and then using the WEIGHT option in the analysis with the count as the weight variable.

Simon80 · Posted 06-13-2013 11:24 AM

ballardw, thanks for your input! I haven't seen the WEIGHT option before. Can you please tell me what it does? Thank you!

ballardw · Posted 06-13-2013 11:40 AM

Actually, with your data the FREQ option is probably the better choice. It is available in many procs and says to use the count variable and treat each record as representing N records.

For example with proc univariate add a statement

Freq countvariablename;

Weights are similar but need not be integers and affect calculations of some statistics a bit differently.
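
A minimal sketch of that difference, using the students data from earlier in the thread and PROC MEANS rather than PROC UNIVARIATE (my choice for compact output, not something required by either statement):

```sas
data students(keep=age repeat);
    infile datalines missover;
    input Age repeat;
    repeat = coalesce(repeat, 1);
datalines;
20 4
10
15 2
55
;

/* repeat as a frequency: each row counts as repeat observations */
proc means data=students n mean std;
    var age;
    freq repeat;
    title 'FREQ repeat';
run;

/* repeat as a weight: each row is still a single observation */
proc means data=students n mean std;
    var age;
    weight repeat;
    title 'WEIGHT repeat';
run;

title; /* clear the title */
```

With these data lines the mean should be the same in both runs, but N should be 8 under FREQ and 4 under WEIGHT, so statistics that depend on the number of observations, such as the standard deviation, can differ.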

Simon80 · Posted 06-14-2013 12:50 PM

Thanks ballardw!
