DATA Step, Macro, Functions and more

_N_ , Arrays, and Do loops

Reply
Occasional Contributor
Posts: 10

_N_ , Arrays, and Do loops

Hey everyone,

 

trying to rearrange some data using an array. I can't figure out what I'm doing wrong here, not getting any errors in my log. Only some of the observations are printing. 

 

options nonumber nodate;
data Project2;
input x1 - x6;

if _n_ in (1,2,3,4,5,6) then district = 'OKEECHOBEE';
IF _N_ IN (7,8,9,10,11,12) THEN DISTRICT= 'SEMINOLE';
IF _N_ IN (13,14,15,16,17,18) THEN DISTRICT= 'VOLUSIA';
ELSE DISTRICT= 'WALTON';

if _n_ in (1,2,7,8,13,14,19,20) then race= 'B';
if _n_ in (3,4,9,10,15,16,21,22) then race= 'W';
else race= 'O';

if _n_ in (1,3,5,7,9,11,13,15,17,19,21,23) then gender= 'M';
else gender= 'F';

array pr2(6) x1-x6;
do i= 1 to 6;
if i < 4 then learntype= 'Passive';
else leanrtype= 'Active';
if i in (1,4) then level= 'Elementary';
if i in (2,5) then level= 'Middle';
else level= 'High';
absenses= pr2(6);
end;
drop i;
datalines;
100 300 1600 750 250 700
150 1150 2350 450 550 850
3350 850 1800 750 600 150
1250 350 700 400 1650 700
1650 700 1150 850 250 1100
200 1600 3700 450 600 750
3800 650 1750 700 550 250
1050 400 650 350 2200 1050
1550 750 1100 900 600 1250
250 1650 2700 450 650 450
2800 1150 1550 650 800 300
600 450 1150 450 2050 1100
350 350 1650 650 200 750
100 950 1350 500 600 900
1700 550 1550 850 750 50
1200 300 850 200 1600 750
1600 3700 10 400 350 2000
335 1750 1550 2100 700 100
480 650 550 650 1600 800
220 1100 950 1600 650 50
1640 2700 500 450 400 1450
1185 1550 1900 2150 750 50
100 300 1600 750 250 700
150 1150 2350 450 550 850

;
proc print data=Project2;
title 'Absenteeism';
VAR district race gender learntype level absenses;
run;

Super User
Posts: 10,278

Re: _N_ , Arrays, and Do loops


@BWDAWG wrote:

Hey everyone,

 

trying to rearrange some data using an array. I can't figure out what I'm doing wrong here, not getting any errors in my log. Only some of the observations are printing. 

 

options nonumber nodate;
data Project2;
input x1 - x6;

if _n_ in (1,2,3,4,5,6) then district = 'OKEECHOBEE';
IF _N_ IN (7,8,9,10,11,12) THEN DISTRICT= 'SEMINOLE';
IF _N_ IN (13,14,15,16,17,18) THEN DISTRICT= 'VOLUSIA';
ELSE DISTRICT= 'WALTON';

if _n_ in (1,2,7,8,13,14,19,20) then race= 'B';
if _n_ in (3,4,9,10,15,16,21,22) then race= 'W';
else race= 'O';

if _n_ in (1,3,5,7,9,11,13,15,17,19,21,23) then gender= 'M';
else gender= 'F';

array pr2(6) x1-x6;
do i= 1 to 6;
if i < 4 then learntype= 'Passive';
else leanrtype= 'Active';
if i in (1,4) then level= 'Elementary';
if i in (2,5) then level= 'Middle';
else level= 'High';
absenses= pr2(6);
end;
drop i;
datalines;
100 300 1600 750 250 700
150 1150 2350 450 550 850
3350 850 1800 750 600 150
1250 350 700 400 1650 700
1650 700 1150 850 250 1100
200 1600 3700 450 600 750
3800 650 1750 700 550 250
1050 400 650 350 2200 1050
1550 750 1100 900 600 1250
250 1650 2700 450 650 450
2800 1150 1550 650 800 300
600 450 1150 450 2050 1100
350 350 1650 650 200 750
100 950 1350 500 600 900
1700 550 1550 850 750 50
1200 300 850 200 1600 750
1600 3700 10 400 350 2000
335 1750 1550 2100 700 100
480 650 550 650 1600 800
220 1100 950 1600 650 50
1640 2700 500 450 400 1450
1185 1550 1900 2150 750 50
100 300 1600 750 250 700
150 1150 2350 450 550 850

;
proc print data=Project2;
title 'Absenteeism';
VAR district race gender learntype level absenses;
run;


level will always be 'High', as you are missing a crucial else. The same happens with race, which will never be 'B'. And district will only be 'VOLUSIA' or 'WALTON'. Consider using a select() block instead of if/then/else if chains.

Are you getting less than 24 observations in your output?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 10

Re: _N_ , Arrays, and Do loops

Posted in reply to KurtBremser

Thanks for the reply @KurtBremser. So I made a couple of changes by adding else if statements, but I still am only getting 24 observations where I should have 144. problem seems to be the variables level and learntype.

options nonumber nodate;
data example33;
 input x1 - x6;
 if _n_ in (1,2,3,4,5,6) then district = 'OKEECHOBEE';
  else IF _N_ IN (7,8,9,10,11,12) THEN DISTRICT= 'SEMINOLE';
  else IF _N_ IN (13,14,15,16,17,18) THEN DISTRICT= 'VOLUSIA';
  ELSE DISTRICT= 'WALTON';
 
 if _n_ in (1,2,7,8,13,14,19,20) then race= 'B';
 else if _n_ in (3,4,9,10,15,16,21,22) then race= 'W';
 else race= 'O';
 if _n_ in (1,3,5,7,9,11,13,15,17,19,21,23) then gender= 'M';
 else gender= 'F';
 array pr2(6) x1-x6;
 do i= 1 to 6;
  if i < 4 then learntype= 'Passive';
  else learntype= 'Active';
  if i in (1,4) then level= 'Elementary';
  else if i in (2,5) then level= 'Middle';
  else level= 'High';
 absenses= pr2(6);
  end;
 drop I;

 

Super User
Posts: 10,278

Re: _N_ , Arrays, and Do loops

Why would you expect more than 24 observations? With no explicit output statement, the data step will write once at the end of each iteration, and there will only be 24 iterations for the 24 datalines. If you want more in the resulting dataset, you have to add an output statement in the do loop.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Trusted Advisor
Posts: 1,256

Re: _N_ , Arrays, and Do loops

Hi @BWDAWG and welcome to the SAS Support Communities!

 

If I look at your code and guess what you intended to do (your description "rearrange some data" is a bit vague), my first recommendations are:

  1. Whenever an ELSE statement is preceded by more than one IF statement, put an "ELSE" in front of the IF statements as well, except for the first "IF" in the group. Otherwise, the instruction following "ELSE" (e.g. DISTRICT='WALTON') is applied to all observations not satisfying the most recent IF statement. That's why neither 'OKEECHOBEE' nor 'SEMINOLE' occurred in your output dataset Project2. (Same for the other IF-IF-...-ELSE-constructs.)
  2. Declare character variables such as DISTRICT with a LENGTH statement to avoid the risk of truncation.
  3. Avoid typos in variable names, such as leanrtype. Otherwise SAS will create unwanted additional variables with exactly these misspelled names.
  4. Rethink the purpose of the DO loop. It seems that you want to assign different values of a variable (e.g. LEARNTYPE) to different array elements. However, in fact all this occurs within the same observation of the output dataset, where a variable (such as LEVEL) can only have one value. Maybe you want to include an OUTPUT statement in the DO loop so as to create a new observation in each of the six iterations. In this case one would expect to see an array reference like pr2[i] somewhere in the DO loop. Otherwise there is no connection between variable i and the array.
Occasional Contributor
Posts: 10

Re: _N_ , Arrays, and Do loops

[ Edited ]
Posted in reply to FreelanceReinhard

@FreelanceReinhardthanks for the reply, that was a great explanation, very helpful. I fixed everything you mentioned except the length statement bc I am not seeing any truncation. However something still isn't working correctly. Now it is printing out all 144 observations, but the absenses are not correct, it seems to be assigning the last value in the datalines to all the variables? not sure if i explained that correctly. here is what i updated

 

options nonumber nodate;
data example33;
 input x1 - x6;
 if _n_ in (1,2,3,4,5,6) then district = 'OKEECHOBEE';
  else IF _N_ IN (7,8,9,10,11,12) THEN DISTRICT= 'SEMINOLE';
  else IF _N_ IN (13,14,15,16,17,18) THEN DISTRICT= 'VOLUSIA';
  ELSE DISTRICT= 'WALTON';
 
 if _n_ in (1,2,7,8,13,14,19,20) then race= 'B';
 else if _n_ in (3,4,9,10,15,16,21,22) then race= 'W';
 else race= 'O';
 if _n_ in (1,3,5,7,9,11,13,15,17,19,21,23) then gender= 'M';
 else gender= 'F';
 array pr2(6) x1-x6;
 do i= 1 to 6;
  if i < 4 then learntype= 'Passive';
  else learntype= 'Active';
  if i in (1,4) then level= 'Elementary';
  else if i in (2,5) then level= 'Middle';
  else level= 'High';
 absenses= pr2(6);
 output;
  end;
 drop I;
Trusted Advisor
Posts: 1,256

Re: _N_ , Arrays, and Do loops


@BWDAWG wrote:

I fixed everything you mentioned except the length statement bc I am not seeing any truncation.

 

You were just lucky that no truncation occurred. Truncation would have occurred if the first value assigned to, e.g., DISTRICT, LEARNTYPE and LEVEL had not happened to be the longest value. Imagine that someone told you in a few weeks to alter the value of variable LEVEL where it was 'Middle' to 'Intermediate'. Would you then really remember that without additional changes to the code (such as a LENGTH statement) the new values would be truncated to 'Intermedia'?

Trusted Advisor
Posts: 1,256

Re: _N_ , Arrays, and Do loops

If you want the first, second, ..., sixth value, i.e., the values of x1, ..., x6, to occur in the new variable ABSENSES, pr2[i] (in place of pr2(6)) would achieve this.

Super User
Super User
Posts: 8,120

Re: _N_ , Arrays, and Do loops

[ Edited ]

Looks like the first three variables generate 4*3*2=24 combinations, one per row in the file.

And the last two variables generate 2*3=6 combinations, one per each value in a row.

 

Why not just let the DO loop take care of the values.  

Looks like the values are in this order.  But perhaps you will need to reorder the DO statements to match the pattern used to create the source data file.

data Project2;
  length district $20 race $1 gender $1 learntype $10 level $10;
  do district='OKEECHOBEE','SEMINOLE','VOLUSIA','WALTON';
  do race = 'B','W','O';
  do gender='M','F' ;
  do learntype= 'Passive','Active';
  do level= 'Elementary','Middle','High';
    input absenses @ ;
    output;
  end;
  end;
  end;
  end;
  end;
datalines;
100 300 1600 750 250 700
150 1150 2350 450 550 850
3350 850 1800 750 600 150
1250 350 700 400 1650 700
1650 700 1150 850 250 1100
200 1600 3700 450 600 750
3800 650 1750 700 550 250
1050 400 650 350 2200 1050
1550 750 1100 900 600 1250
250 1650 2700 450 650 450
2800 1150 1550 650 800 300
600 450 1150 450 2050 1100
350 350 1650 650 200 750
100 950 1350 500 600 900
1700 550 1550 850 750 50
1200 300 850 200 1600 750
1600 3700 10 400 350 2000
335 1750 1550 2100 700 100
480 650 550 650 1600 800
220 1100 950 1600 650 50
1640 2700 500 450 400 1450
1185 1550 1900 2150 750 50
100 300 1600 750 250 700
150 1150 2350 450 550 850
;
Obs     district     race    gender    learntype    level         absenses

  1    OKEECHOBEE     B        M        Passive     Elementary       100
  2    OKEECHOBEE     B        M        Passive     Middle           300
  3    OKEECHOBEE     B        M        Passive     High            1600
  4    OKEECHOBEE     B        M        Active      Elementary       750
  5    OKEECHOBEE     B        M        Active      Middle           250
  6    OKEECHOBEE     B        M        Active      High             700
  7    OKEECHOBEE     B        F        Passive     Elementary       150

 

Ask a Question
Discussion stats
  • 8 replies
  • 124 views
  • 0 likes
  • 4 in conversation