Creating a data set with multiple identical observations using the datalines statement

Occasional Contributor
Posts: 18

I'm trying to create a data set by entering values with the datalines statement in a data step. I have multiple observations with the same value, and I was wondering whether there is a shorthand way of entering these values without having to type them one by one.

Example:

data students;
     input Age;
     datalines;
     20
     20
     20
     20
     and so on...
     ;

Is there a function or an operator that I can use to generate repeated values? I know that in certain programming languages you can take a value, add the multiplication sign (*) and a number, and get this sort of repetition. Is there something similar in SAS? Thanks for your response and help!


Accepted Solutions
Solution
‎06-12-2013 08:57 PM
Respected Advisor
Posts: 4,932

Re: Creating a data set with multiple identical observations using the datalines statement

This requires a little programming:

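data students(keep=age);
/* missover: a line with no second value leaves repeat missing */
infile datalines missover;
input Age repeat;
/* write the observation repeat times; a missing repeat counts as 1 */
do i = 1 to coalesce(repeat, 1);
     output;
     end;
datalines;
20 4
10
15 2
55
;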

proc print; run;

But don't forget that most data analysis procedures allow a FREQ statement that names a variable containing observation frequencies. Thus, you could keep one record per value and use:

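data students(keep=age repeat);
/* missover: a line with no second value leaves repeat missing */
infile datalines missover;
input Age repeat;
/* a missing repeat means the value occurs once */
repeat = coalesce(repeat,1);
datalines;
20 4
10
15 2
55
;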

proc univariate data=students; var age; freq repeat; run;
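As a rough check that the two approaches agree, the sketch below builds both versions of the data and summarizes each with proc means. The data set names students_freq and students_long are made up for this illustration and are not from the original post; the data values are the ones used above.

```sas
/* One record per value, with a frequency variable (hypothetical name: students_freq) */
data students_freq;
   infile datalines missover;
   input Age repeat;
   repeat = coalesce(repeat, 1);   /* missing repeat means one occurrence */
datalines;
20 4
10
15 2
55
;

/* Expanded version: one record per occurrence (hypothetical name: students_long) */
data students_long(keep=Age);
   set students_freq;
   do i = 1 to repeat;
      output;
   end;
run;

/* Summary of the expanded data */
proc means data=students_long n mean std;
   var Age;
run;

/* Same summary from the compact data, using the FREQ statement */
proc means data=students_freq n mean std;
   var Age;
   freq repeat;
run;
```

Both proc means steps should report the same N (8 for this data), mean, and standard deviation, since FREQ treats each record as that many identical observations.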

PG



Occasional Contributor
Posts: 18

Re: Creating a data set with multiple identical observations using the datalines statement

PGStats, thank you for your quick response! I tried the second method and it works beautifully! I also tried a very minimal data step, shown below, based on yours, and it also seems to work. Would you mind explaining a few things to me, as I'm relatively new to SAS? In your code, you have the infile statement (infile datalines missover). What is this for? And why did you use the coalesce function? I know it returns the first non-missing value, but I don't understand what its role is in your data step. Thanks for your help!

My simplified code:

data students(keep=age repeat);

input Age repeat;

datalines;
20 4
10
15 2
55
;

proc univariate data=students; var age; freq repeat; run;

Respected Advisor
Posts: 4,932

Re: Creating a data set with multiple identical observations using the datalines statement

When you run that code, you get the following messages in the log:

NOTE: LOST CARD.

RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7---

10         ;

Age=55 repeat=. _ERROR_=1 _N_=3

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

NOTE: The data set WORK.STUDENTS has 2 observations and 2 variables.

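The reason is that, by default, when SAS reaches the end of a line before all the input variables have been read, it goes to the next line to read the remaining ones. In your data, the line containing only 10 makes SAS take 15 from the following line as the value of repeat, and the last line (55) has no following line to read from, hence the LOST CARD note and a data set with only 2 observations. The infile statement (infile datalines missover) tells SAS to set the remaining variables to missing when the end of a line is reached instead. The coalesce function then says that when repeat is missing, it should be taken to mean 1.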
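To see the difference side by side, here is a small sketch that reads the same lines twice, once with the default behaviour and once with missover. The data set names flow_demo and miss_demo and the titles are invented for this illustration.

```sas
/* Default behaviour: a short line makes INPUT continue on the next line */
data flow_demo;
   input Age repeat;
datalines;
20 4
10
15 2
55
;

/* MISSOVER: a short line leaves repeat missing and stays on that line */
data miss_demo;
   infile datalines missover;
   input Age repeat;
datalines;
20 4
10
15 2
55
;

proc print data=flow_demo;
   title 'Default line handling';
run;

proc print data=miss_demo;
   title 'With MISSOVER';
run;

title;   /* clear the title */
```

The first listing should show 2 observations (20 with repeat 4, and 10 with repeat 15), matching the log above, while the second should show all 4 ages with repeat missing for 10 and 55.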

PG

Occasional Contributor
Posts: 18

Re: Creating a data set with multiple identical observations using the datalines statement

Thanks again PGStats! I think I understand now.

Super User
Posts: 11,343

Re: Creating a data set with multiple identical observations using the datalines statement

Depending on the analysis you're going to do, I suggest looking into attaching a count variable to every value instead of creating multiple records, and then using the WEIGHT option in the analysis with the count as the weight variable.

Occasional Contributor
Posts: 18

Re: Creating a data set with multiple identical observations using the datalines statement

ballardw, thanks for your input! I haven't seen the WEIGHT option before. Can you please tell me what it does? Thank you!

Super User
Posts: 11,343

Re: Creating a data set with multiple identical observations using the datalines statement

Actually, with your data the FREQ option is probably the better choice. It is available in many procs and tells the procedure to use the count variable and treat each record as representing N records.

For example with proc univariate add a statement

Freq countvariablename;

Weights are similar but need not be integers and affect calculations of some statistics a bit differently.
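A minimal sketch of that difference, using proc means rather than proc univariate for brevity. The data set name counts and the variable name count are placeholders chosen for this example, and the values are the ones from earlier in the thread.

```sas
/* One record per distinct age, with its count (hypothetical names) */
data counts;
   input Age count;
datalines;
20 4
10 1
15 2
55 1
;

/* FREQ: each record stands for count identical observations */
proc means data=counts n mean std;
   var Age;
   freq count;
run;

/* WEIGHT: each record is still one observation, weighted by count */
proc means data=counts n mean std;
   var Age;
   weight count;
run;
```

With FREQ, N should be the sum of the counts (8 here); with WEIGHT, N stays at the number of records (4). The mean is the same in both runs, but the standard deviation differs because the divisor is based on a different number of observations.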

Occasional Contributor
Posts: 18

Re: Creating a data set with multiple identical observations using the datalines statement

Thanks ballardw!
