DATA Step, Macro, Functions and more

Need help with DO loop

Reply
Contributor
Posts: 43

Need help with DO loop

Hi,

 

I need help with subsetting a data using ARRAY and WHERE function

DATA Inpatient_2013_new;
	SET Inpatient_2013;
	ARRAY ICD {12} ICD_DGNS_CD1 ICD_DGNS_CD2 ICD_DGNS_CD3 ICD_DGNS_CD4 ICD_DGNS_CD5 ICD_DGNS_CD6 ICD_DGNS_CD7 ICD_DGNS_CD8 ICD_DGNS_CD9 ICD_DGNS_CD10 ICD_DGNS_CD11 ICD_DGNS_CD12;
	DO i = 1 to 12;
		WHERE ICD {12} IN ("45341", "45340", "45342", "45350", "45351", "45352", "45181", "45111", "45119", "41511", "41512", "41513", "41519");
RUN;

In the above code, I'm trying to use array and where function to subset a huge data set. I want to look at 25 variables out of 250 for some values of interest and subset those matching value rows with all 250 variables in a new data set. When I run this code I'm getting the following error

ERROR: Syntax error while parsing WHERE clause.
ERROR 22-322: Syntax error, expecting one of the following: ;, !, !!, &, (, *, **, +, -, /, <,
              <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, GE, GT, IN, IS, LE, LIKE, LT, NE,
              NOT, NOTIN, OR, ^, ^=, |, ||, ~, ~=.

ERROR 200-322: The symbol is not recognized and will be ignored.

429  RUN;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.INPATIENT_2013_NEW may be incomplete.  When this step was stopped
         there were 0 observations and 257 variables.
WARNING: Data set WORK.INPATIENT_2013_NEW was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.06 seconds

Can some one help me with this problem?

 

Thanks!

Super User
Posts: 19,870

Re: Need help with DO loop

Posted in reply to Sujithpeta

1. Do loop doesn't have matching END

2. Incorrect usage of WHERE - you'll need to create a variable and then drop the records you're not interested in.

3. When looping through array variables, you usually want to use the counter as your array index - not 12 in this case. 

 

See the code below.

You may be able to simplify this using WHICHC. 

 

DATA Inpatient_2013_new;
	SET Inpatient_2013;
	ARRAY ICD {12} ICD_DGNS_CD1 ICD_DGNS_CD2 ICD_DGNS_CD3 ICD_DGNS_CD4 ICD_DGNS_CD5 ICD_DGNS_CD6 ICD_DGNS_CD7 ICD_DGNS_CD8 ICD_DGNS_CD9 ICD_DGNS_CD10 ICD_DGNS_CD11 ICD_DGNS_CD12;
keep_record = 0;
	DO i = 1 to 12;
		if ICD (i) IN ("45341", "45340", "45342", "45350", "45351", "45352", "45181", "45111", "45119", "41511", "41512", "41513", "41519") then keep_record=1;
END;

if keep_record=1;
RUN;
Contributor
Posts: 43

Re: Need help with DO loop

Hello Reeza,

 

I tried but I'm getting errors

 

DATA Inpatient_2013_new;
	SET Inpatient_2013;
	ARRAY ICD {12} ICD_DGNS_CD1 $ ICD_DGNS_CD2 $ ICD_DGNS_CD3 $ ICD_DGNS_CD4 $ ICD_DGNS_CD5 $ ICD_DGNS_CD6 $ ICD_DGNS_CD7 $ ICD_DGNS_CD8 $ ICD_DGNS_CD9 $ ICD_DGNS_CD10 $ ICD_DGNS_CD11 $ ICD_DGNS_CD12 $;
Keep_record = 0;
	DO i = 1 to 12;
		IF ICD {i} in ("45341", "45340", "45342", "45350", "45351", "45352", "45181", "45111", "45119", "41511", "41512", "41513", "41519");
	END;
IF keep_record = 1;
RUN;

I'm getting following error!!

500  DATA Inpatient_2013_new;
501      SET Inpatient_2013;
502      ARRAY ICD {12} ICD_DGNS_CD1 $ ICD_DGNS_CD2 $ ICD_DGNS_CD3 $ ICD_DGNS_CD4 $ ICD_DGNS_CD5 $
                                     -              -              -              -              -
-
-
-
-
-
-
-
                                     22             22             22             22
22
22
22
22
22
22
22
22
                                     200            200            200            200
200
200
200
200
200
200
200
200
502!  ICD_DGNS_CD6 $ ICD_DGNS_CD7 $ ICD_DGNS_CD8 $ ICD_DGNS_CD9 $ ICD_DGNS_CD10 $ ICD_DGNS_CD11 $
502! ICD_DGNS_CD12 $;
ERROR 22-322: Syntax error, expecting one of the following: a name, (, -, :, ;, _ALL_,
              _CHARACTER_, _CHAR_, _NUMERIC_.

ERROR 200-322: The symbol is not recognized and will be ignored.

503  Keep_record = 0;
504      DO i = 1 to 12;
505          IF ICD {i} in ("45341", "45340", "45342", "45350", "45351", "45352", "45181",
505! "45111", "45119", "41511", "41512", "41513", "41519");
506      END;
507  IF keep_record = 1;
508  RUN;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.INPATIENT_2013_NEW may be incomplete.  When this step was stopped
         there were 0 observations and 258 variables.
WARNING: Data set WORK.INPATIENT_2013_NEW was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.08 seconds
      cpu time            0.09 seconds
Super User
Super User
Posts: 7,077

Re: Need help with DO loop

[ Edited ]
Posted in reply to Sujithpeta

You cannot put those dollar signs into the middle of your ARRAY statement.

 

Also save yourself some typing by using a variable list.  Also you don't need to tell SAS how variables are in an array when you are listing the variables.  SAS can count them for itself.

 

ARRAY ICD ICD_DGNS_CD1-ICD_DGNS_CD12;

 

Contributor
Posts: 43

Re: Need help with DO loop

[ Edited ]

1. I had to write manually because ICD_DGNS_CD1 to 12 aren't in a order in the dataset, when I say 

ICD_DGNS_CD1-ICD_DGNS_CD12

 It includes other variables in between this range.

 

2. I kept $ because those variables are charachters, now I tried without $ sign, code ran successfuly but 0 observations are in the output

 

DATA Inpatient_2013_new;
	SET Inpatient_2013;
	ARRAY ICD {12} ICD_DGNS_CD1 ICD_DGNS_CD2 ICD_DGNS_CD3 ICD_DGNS_CD4 ICD_DGNS_CD5 ICD_DGNS_CD6 ICD_DGNS_CD7 ICD_DGNS_CD8 ICD_DGNS_CD9 ICD_DGNS_CD10 ICD_DGNS_CD11 ICD_DGNS_CD12;
Keep_record = 0;
	DO i = 1 to 12;
		IF ICD {i} in ("45341", "45340", "45342", "45350", "45351", "45352", "45181", "45111", "45119", "41511", "41512", "41513", "41519");
	END;
IF keep_record = 1;
RUN;

 

3. Can you please explain me the importance of Keep_record in the code?

 

 

509 DATA Inpatient_2013_new;
510 SET Inpatient_2013;
511 ARRAY ICD {12} ICD_DGNS_CD1 ICD_DGNS_CD2 ICD_DGNS_CD3 ICD_DGNS_CD4 ICD_DGNS_CD5
511! ICD_DGNS_CD6 ICD_DGNS_CD7 ICD_DGNS_CD8 ICD_DGNS_CD9 ICD_DGNS_CD10 ICD_DGNS_CD11 ICD_DGNS_CD12
511! ;
512 Keep_record = 0;
513 DO i = 1 to 12;
514 IF ICD {i} in ("45341", "45340", "45342", "45350", "45351", "45352", "45181",
514! "45111", "45119", "41511", "41512", "41513", "41519");
515 END;
516 IF keep_record = 1;
517 RUN;
NOTE: There were 582422 observations read from the data set WORK.INPATIENT_2013.
NOTE: The data set WORK.INPATIENT_2013_NEW has 0 observations and 258 variables.
NOTE: DATA statement used (Total process time):
real time 1.15 seconds
cpu time 1.09 seconds

 

 

Thank you!!

 

 

 

Super User
Super User
Posts: 7,077

Re: Need help with DO loop

Posted in reply to Sujithpeta

A variable ilist of the format A1-A5 is asking for the sequence of variables formed by changing the numeric suffix of the name. So the position of the variables in tha current data vector , or even their existance, does not matter. To make a variable list based on position you need to use two hyphens lke NAME -- HEIGHT 

 

When you are referencing existing variables in an ARRAY statement you don't need to tell SAS wheher they are numeric or character, SAS already knows how the variables are defined and it is too late to change the variables definitions.  

 

But if you did want to use an ARRAY statement to define new character variables then can indication that by placing the type/length declaration BEFORE the variable list.  Such as

array newvar (12) $5 ;

Which would define 12 variables named NEWVAR1-NEWVAR12 where each one is character of length 5.

 

I assume you are trying to check if any of the variables match any of the values in your list?  If so then set the condition FALSE be starting to look and set it to TRUE when you find one.  Then you can decide whether you want to keep based on the condition flag variable.

 

data want;
  set have;
  array codes icd_dgns_cd1-icd_dgns_cd12;
  has_condition = 0;
  do i = 1 to dim(codes)  ;
    if codes(i) in ("45341" "45340" "45342" "45350" "45351" "45352"
                    "45181" "45111" "45119" "41511" "41512" "41513" "41519")
    then has_condition=1;
  end;
  if has_condition then output ;
run;

 

Contributor
Posts: 43

How to subset a large data set by looping through selected Variables of particular value of Interest

Hello,

 

I've 257 variables and millions of rows, my interest is to extract the rows that match the value of my interest in 12 variables out of 257.

But I want the extracted rows to have all 250 variables data. Is there any function that would do this? I tried using ARRAY and DO loop together but I'm getting frustrated with the errors. Can someone help me out? Thanks

 

DATA Inpatient_2013_new;
SET Inpatient_2013; *Inpatient_2013 contains 257 variables and millions of rows;
	ARRAY ICD {12} ICD_DGNS_CD1 ICD_DGNS_CD2 ICD_DGNS_CD3 ICD_DGNS_CD4 ICD_DGNS_CD5 ICD_DGNS_CD6 ICD_DGNS_CD7 ICD_DGNS_CD8 ICD_DGNS_CD9 ICD_DGNS_CD10 ICD_DGNS_CD11 ICD_DGNS_CD12;
	DO i = 1 to 12;
		IF ICD {i} in ("45341", "45340", "45342", "45350", "45351", "45352", "45181", "45111", "45119", "41511", "41512", "41513", "41519"); *If any of the variables have the following values, I want the entire row with all 250 varible info;
	END;
RUN;

  I'm getting the following error

563  DATA Inpatient_2013_new;
564  SET Inpatient_2013; *Inpatient_2013 contains 250 variables and millions of rows;
565      ARRAY ICD {12} ICD_DGNS_CD1 ICD_DGNS_CD2 ICD_DGNS_CD3 ICD_DGNS_CD4 ICD_DGNS_CD5
565! ICD_DGNS_CD6 ICD_DGNS_CD7 ICD_DGNS_CD8 ICD_DGNS_CD9 ICD_DGNS_CD10 ICD_DGNS_CD11 ICD_DGNS_CD12
565! ;
566      DO i = 1 to 12;
567          IF ICD {i} in ("45341", "45340", "45342", "45350", "45351", "45352", "45181",
567! "45111", "45119", "41511", "41512", "41513", "41519"); *If any of the variables have the
567! following values, I want the entire row with all 250 varible info;
568      END;
569  RUN;

NOTE: There were 582422 observations read from the data set WORK.INPATIENT_2013.
NOTE: The data set WORK.INPATIENT_2013_NEW has 0 observations and 257 variables.
NOTE: DATA statement used (Total process time):
      real time           1.08 seconds
      cpu time            1.07 seconds
Super User
Posts: 10,046

Re: How to subset a large data set by looping through selected Variables of particular value of Inte

Posted in reply to Sujithpeta

DATA Inpatient_2013_new;
SET Inpatient_2013; *Inpatient_2013 contains 257 variables and millions of rows;
	ARRAY ICD {12} ICD_DGNS_CD1 ICD_DGNS_CD2 ICD_DGNS_CD3 ICD_DGNS_CD4 ICD_DGNS_CD5 ICD_DGNS_CD6 ICD_DGNS_CD7 ICD_DGNS_CD8 ICD_DGNS_CD9 ICD_DGNS_CD10 ICD_DGNS_CD11 ICD_DGNS_CD12;
	DO i = 1 to 12;
		IF ICD {i} in ("45341", "45340", "45342", "45350", "45351", "45352", "45181", "45111", "45119", "41511", "41512", "41513", "41519") then do; output;leave;end;
	END;
RUN;

Super User
Super User
Posts: 7,077

Re: How to subset a large data set by looping through selected Variables of particular value of Inte

@Ksharp That could potentially output the same row multiple times.

 

Answer is at the other version of this question.

https://communities.sas.com/t5/Base-SAS-Programming/Need-help-with-DO-loop/m-p/299440#M63154

 

Super User
Posts: 19,870

Re: How to subset a large data set by looping through selected Variables of particular value of Inte

@Tom I think the LEAVE prevents that? 

 

@Sujithpeta Please don't post the same questions multiple times. 

Super User
Super User
Posts: 7,077

Re: How to subset a large data set by looping through selected Variables of particular value of Inte

@Ksharp  Didn't see the LEAVE command.

Super User
Posts: 10,046

Re: How to subset a large data set by looping through selected Variables of particular value of Inte

@Tom , Never Mind. I respect everything you did for this forum just as data _null_(John King). You both are really similar .
Contributor
Posts: 43

Re: Need help with DO loop

I ran a sample code like this

 

DATA Inpatient_2011_new2;
	SET Inpatient_2011;
	KEEP ICD_DGNS_CD1 -- ICD_DGNS_CD12;
RUN;

Instead of giving 12 variables it included other variables in between those 12 variables

 

 

676  DATA Inpatient_2011_new2;
677      SET Inpatient_2011;
678      KEEP ICD_DGNS_CD1 -- ICD_DGNS_CD12;
679  RUN;

NOTE: There were 5000 observations read from the data set WORK.INPATIENT_2011.
NOTE: The data set WORK.INPATIENT_2011_NEW2 has 5000 observations and 34 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 second

 

 --Sujith

Super User
Posts: 19,870

Re: Need help with DO loop

Posted in reply to Sujithpeta

@Sujithpeta That is correct behaviour. 2 dashes include all variables between the two. A single dash includes only variables with a consecutive suffix. 

Contributor
Posts: 43

Re: Need help with DO loop

Posted in reply to Sujithpeta

Thanks to all of you guys for helping me out

Ask a Question
Discussion stats
  • 14 replies
  • 832 views
  • 6 likes
  • 4 in conversation