BookmarkSubscribeRSS Feed
Sujithpeta
Quartz | Level 8

Hi,

 

I need help with subsetting a data using ARRAY and WHERE function

DATA Inpatient_2013_new;
	SET Inpatient_2013;
	ARRAY ICD {12} ICD_DGNS_CD1 ICD_DGNS_CD2 ICD_DGNS_CD3 ICD_DGNS_CD4 ICD_DGNS_CD5 ICD_DGNS_CD6 ICD_DGNS_CD7 ICD_DGNS_CD8 ICD_DGNS_CD9 ICD_DGNS_CD10 ICD_DGNS_CD11 ICD_DGNS_CD12;
	DO i = 1 to 12;
		WHERE ICD {12} IN ("45341", "45340", "45342", "45350", "45351", "45352", "45181", "45111", "45119", "41511", "41512", "41513", "41519");
RUN;

In the above code, I'm trying to use array and where function to subset a huge data set. I want to look at 25 variables out of 250 for some values of interest and subset those matching value rows with all 250 variables in a new data set. When I run this code I'm getting the following error

ERROR: Syntax error while parsing WHERE clause.
ERROR 22-322: Syntax error, expecting one of the following: ;, !, !!, &, (, *, **, +, -, /, <,
              <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, GE, GT, IN, IS, LE, LIKE, LT, NE,
              NOT, NOTIN, OR, ^, ^=, |, ||, ~, ~=.

ERROR 200-322: The symbol is not recognized and will be ignored.

429  RUN;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.INPATIENT_2013_NEW may be incomplete.  When this step was stopped
         there were 0 observations and 257 variables.
WARNING: Data set WORK.INPATIENT_2013_NEW was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.06 seconds

Can some one help me with this problem?

 

Thanks!

14 REPLIES 14
Reeza
Super User

1. Do loop doesn't have matching END

2. Incorrect usage of WHERE - you'll need to create a variable and then drop the records you're not interested in.

3. When looping through array variables, you usually want to use the counter as your array index - not 12 in this case. 

 

See the code below.

You may be able to simplify this using WHICHC. 

 

DATA Inpatient_2013_new;
	SET Inpatient_2013;
	ARRAY ICD {12} ICD_DGNS_CD1 ICD_DGNS_CD2 ICD_DGNS_CD3 ICD_DGNS_CD4 ICD_DGNS_CD5 ICD_DGNS_CD6 ICD_DGNS_CD7 ICD_DGNS_CD8 ICD_DGNS_CD9 ICD_DGNS_CD10 ICD_DGNS_CD11 ICD_DGNS_CD12;
keep_record = 0;
	DO i = 1 to 12;
		if ICD (i) IN ("45341", "45340", "45342", "45350", "45351", "45352", "45181", "45111", "45119", "41511", "41512", "41513", "41519") then keep_record=1;
END;

if keep_record=1;
RUN;
Sujithpeta
Quartz | Level 8

Hello Reeza,

 

I tried but I'm getting errors

 

DATA Inpatient_2013_new;
	SET Inpatient_2013;
	ARRAY ICD {12} ICD_DGNS_CD1 $ ICD_DGNS_CD2 $ ICD_DGNS_CD3 $ ICD_DGNS_CD4 $ ICD_DGNS_CD5 $ ICD_DGNS_CD6 $ ICD_DGNS_CD7 $ ICD_DGNS_CD8 $ ICD_DGNS_CD9 $ ICD_DGNS_CD10 $ ICD_DGNS_CD11 $ ICD_DGNS_CD12 $;
Keep_record = 0;
	DO i = 1 to 12;
		IF ICD {i} in ("45341", "45340", "45342", "45350", "45351", "45352", "45181", "45111", "45119", "41511", "41512", "41513", "41519");
	END;
IF keep_record = 1;
RUN;

I'm getting following error!!

500  DATA Inpatient_2013_new;
501      SET Inpatient_2013;
502      ARRAY ICD {12} ICD_DGNS_CD1 $ ICD_DGNS_CD2 $ ICD_DGNS_CD3 $ ICD_DGNS_CD4 $ ICD_DGNS_CD5 $
                                     -              -              -              -              -
-
-
-
-
-
-
-
                                     22             22             22             22
22
22
22
22
22
22
22
22
                                     200            200            200            200
200
200
200
200
200
200
200
200
502!  ICD_DGNS_CD6 $ ICD_DGNS_CD7 $ ICD_DGNS_CD8 $ ICD_DGNS_CD9 $ ICD_DGNS_CD10 $ ICD_DGNS_CD11 $
502! ICD_DGNS_CD12 $;
ERROR 22-322: Syntax error, expecting one of the following: a name, (, -, :, ;, _ALL_,
              _CHARACTER_, _CHAR_, _NUMERIC_.

ERROR 200-322: The symbol is not recognized and will be ignored.

503  Keep_record = 0;
504      DO i = 1 to 12;
505          IF ICD {i} in ("45341", "45340", "45342", "45350", "45351", "45352", "45181",
505! "45111", "45119", "41511", "41512", "41513", "41519");
506      END;
507  IF keep_record = 1;
508  RUN;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.INPATIENT_2013_NEW may be incomplete.  When this step was stopped
         there were 0 observations and 258 variables.
WARNING: Data set WORK.INPATIENT_2013_NEW was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.08 seconds
      cpu time            0.09 seconds
Tom
Super User Tom
Super User

You cannot put those dollar signs into the middle of your ARRAY statement.

 

Also save yourself some typing by using a variable list.  Also you don't need to tell SAS how variables are in an array when you are listing the variables.  SAS can count them for itself.

 

ARRAY ICD ICD_DGNS_CD1-ICD_DGNS_CD12;

 

Sujithpeta
Quartz | Level 8

1. I had to write manually because ICD_DGNS_CD1 to 12 aren't in a order in the dataset, when I say 

ICD_DGNS_CD1-ICD_DGNS_CD12

 It includes other variables in between this range.

 

2. I kept $ because those variables are charachters, now I tried without $ sign, code ran successfuly but 0 observations are in the output

 

DATA Inpatient_2013_new;
	SET Inpatient_2013;
	ARRAY ICD {12} ICD_DGNS_CD1 ICD_DGNS_CD2 ICD_DGNS_CD3 ICD_DGNS_CD4 ICD_DGNS_CD5 ICD_DGNS_CD6 ICD_DGNS_CD7 ICD_DGNS_CD8 ICD_DGNS_CD9 ICD_DGNS_CD10 ICD_DGNS_CD11 ICD_DGNS_CD12;
Keep_record = 0;
	DO i = 1 to 12;
		IF ICD {i} in ("45341", "45340", "45342", "45350", "45351", "45352", "45181", "45111", "45119", "41511", "41512", "41513", "41519");
	END;
IF keep_record = 1;
RUN;

 

3. Can you please explain me the importance of Keep_record in the code?

 

 

509 DATA Inpatient_2013_new;
510 SET Inpatient_2013;
511 ARRAY ICD {12} ICD_DGNS_CD1 ICD_DGNS_CD2 ICD_DGNS_CD3 ICD_DGNS_CD4 ICD_DGNS_CD5
511! ICD_DGNS_CD6 ICD_DGNS_CD7 ICD_DGNS_CD8 ICD_DGNS_CD9 ICD_DGNS_CD10 ICD_DGNS_CD11 ICD_DGNS_CD12
511! ;
512 Keep_record = 0;
513 DO i = 1 to 12;
514 IF ICD {i} in ("45341", "45340", "45342", "45350", "45351", "45352", "45181",
514! "45111", "45119", "41511", "41512", "41513", "41519");
515 END;
516 IF keep_record = 1;
517 RUN;
NOTE: There were 582422 observations read from the data set WORK.INPATIENT_2013.
NOTE: The data set WORK.INPATIENT_2013_NEW has 0 observations and 258 variables.
NOTE: DATA statement used (Total process time):
real time 1.15 seconds
cpu time 1.09 seconds

 

 

Thank you!!

 

 

 

Tom
Super User Tom
Super User

A variable ilist of the format A1-A5 is asking for the sequence of variables formed by changing the numeric suffix of the name. So the position of the variables in tha current data vector , or even their existance, does not matter. To make a variable list based on position you need to use two hyphens lke NAME -- HEIGHT 

 

When you are referencing existing variables in an ARRAY statement you don't need to tell SAS wheher they are numeric or character, SAS already knows how the variables are defined and it is too late to change the variables definitions.  

 

But if you did want to use an ARRAY statement to define new character variables then can indication that by placing the type/length declaration BEFORE the variable list.  Such as

array newvar (12) $5 ;

Which would define 12 variables named NEWVAR1-NEWVAR12 where each one is character of length 5.

 

I assume you are trying to check if any of the variables match any of the values in your list?  If so then set the condition FALSE be starting to look and set it to TRUE when you find one.  Then you can decide whether you want to keep based on the condition flag variable.

 

data want;
  set have;
  array codes icd_dgns_cd1-icd_dgns_cd12;
  has_condition = 0;
  do i = 1 to dim(codes)  ;
    if codes(i) in ("45341" "45340" "45342" "45350" "45351" "45352"
                    "45181" "45111" "45119" "41511" "41512" "41513" "41519")
    then has_condition=1;
  end;
  if has_condition then output ;
run;

 

Sujithpeta
Quartz | Level 8

Hello,

 

I've 257 variables and millions of rows, my interest is to extract the rows that match the value of my interest in 12 variables out of 257.

But I want the extracted rows to have all 250 variables data. Is there any function that would do this? I tried using ARRAY and DO loop together but I'm getting frustrated with the errors. Can someone help me out? Thanks

 

DATA Inpatient_2013_new;
SET Inpatient_2013; *Inpatient_2013 contains 257 variables and millions of rows;
	ARRAY ICD {12} ICD_DGNS_CD1 ICD_DGNS_CD2 ICD_DGNS_CD3 ICD_DGNS_CD4 ICD_DGNS_CD5 ICD_DGNS_CD6 ICD_DGNS_CD7 ICD_DGNS_CD8 ICD_DGNS_CD9 ICD_DGNS_CD10 ICD_DGNS_CD11 ICD_DGNS_CD12;
	DO i = 1 to 12;
		IF ICD {i} in ("45341", "45340", "45342", "45350", "45351", "45352", "45181", "45111", "45119", "41511", "41512", "41513", "41519"); *If any of the variables have the following values, I want the entire row with all 250 varible info;
	END;
RUN;

  I'm getting the following error

563  DATA Inpatient_2013_new;
564  SET Inpatient_2013; *Inpatient_2013 contains 250 variables and millions of rows;
565      ARRAY ICD {12} ICD_DGNS_CD1 ICD_DGNS_CD2 ICD_DGNS_CD3 ICD_DGNS_CD4 ICD_DGNS_CD5
565! ICD_DGNS_CD6 ICD_DGNS_CD7 ICD_DGNS_CD8 ICD_DGNS_CD9 ICD_DGNS_CD10 ICD_DGNS_CD11 ICD_DGNS_CD12
565! ;
566      DO i = 1 to 12;
567          IF ICD {i} in ("45341", "45340", "45342", "45350", "45351", "45352", "45181",
567! "45111", "45119", "41511", "41512", "41513", "41519"); *If any of the variables have the
567! following values, I want the entire row with all 250 varible info;
568      END;
569  RUN;

NOTE: There were 582422 observations read from the data set WORK.INPATIENT_2013.
NOTE: The data set WORK.INPATIENT_2013_NEW has 0 observations and 257 variables.
NOTE: DATA statement used (Total process time):
      real time           1.08 seconds
      cpu time            1.07 seconds
Ksharp
Super User

DATA Inpatient_2013_new;
SET Inpatient_2013; *Inpatient_2013 contains 257 variables and millions of rows;
	ARRAY ICD {12} ICD_DGNS_CD1 ICD_DGNS_CD2 ICD_DGNS_CD3 ICD_DGNS_CD4 ICD_DGNS_CD5 ICD_DGNS_CD6 ICD_DGNS_CD7 ICD_DGNS_CD8 ICD_DGNS_CD9 ICD_DGNS_CD10 ICD_DGNS_CD11 ICD_DGNS_CD12;
	DO i = 1 to 12;
		IF ICD {i} in ("45341", "45340", "45342", "45350", "45351", "45352", "45181", "45111", "45119", "41511", "41512", "41513", "41519") then do; output;leave;end;
	END;
RUN;

Tom
Super User Tom
Super User

@Ksharp That could potentially output the same row multiple times.

 

Answer is at the other version of this question.

https://communities.sas.com/t5/Base-SAS-Programming/Need-help-with-DO-loop/m-p/299440#M63154

 

Reeza
Super User

@Tom I think the LEAVE prevents that? 

 

@Sujithpeta Please don't post the same questions multiple times. 

Ksharp
Super User
@Tom , Never Mind. I respect everything you did for this forum just as data _null_(John King). You both are really similar .
Sujithpeta
Quartz | Level 8

I ran a sample code like this

 

DATA Inpatient_2011_new2;
	SET Inpatient_2011;
	KEEP ICD_DGNS_CD1 -- ICD_DGNS_CD12;
RUN;

Instead of giving 12 variables it included other variables in between those 12 variables

 

 

676  DATA Inpatient_2011_new2;
677      SET Inpatient_2011;
678      KEEP ICD_DGNS_CD1 -- ICD_DGNS_CD12;
679  RUN;

NOTE: There were 5000 observations read from the data set WORK.INPATIENT_2011.
NOTE: The data set WORK.INPATIENT_2011_NEW2 has 5000 observations and 34 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 second

 

 --Sujith

Reeza
Super User

@Sujithpeta That is correct behaviour. 2 dashes include all variables between the two. A single dash includes only variables with a consecutive suffix. 

Sujithpeta
Quartz | Level 8

Thanks to all of you guys for helping me out

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 2026 views
  • 6 likes
  • 4 in conversation