BookmarkSubscribeRSS Feed
Gpp
Fluorite | Level 6 Gpp
Fluorite | Level 6

Hi all, I am really bad in array, hoping someone can help me to complete the array.

 

I have to compare column1 vs column2, then column2 vs column3... and each comparison I want create a indicator field.

 

dataset A is the table that I want to compare.

 

dataset B is the incomplete code that I try to write, I know my array c{*} is incorrect but I don't know how to modify it.

 

Sample output of dataset D, the last 5 columns in numeric are what I want to compute.

sample.JPG

 

%let mthrun=5;
%let dt=01mar2019;

data _null_;
	do i=1 to &mthrun.;
	mth=put(intnx('month', "&dt"d, i-&mthrun.),yymmn6.);
	call symput ("mth"||compress(i),mth);
	put i mth;
	end;
run;

data A;
input id $2. J_201811 $2.  J_201812 $2.  J_201901 $2.  J_201902 $2.  J_201903 $2.;
DATALINES;
AAQ1Q1Q1Q3T5
BB  R1Q1R1T1
CC    Y1Y1Y1
DDG1G1G1G1G1
;
run;


DATA B;
	SET A;
	ARRAY JC{*} J_:;
	ARRAY C{*} C_J:;
		DO i=1 to dim(JC)-1;
		IF jc{i} NE "" AND jc{i} NE jc{i+1} THEN C_J{i+1}=1;
			else C_J{i+1}=0;
		END; 
RUN;

DATA D;
	SET A;
	IF J_201811 NE '' AND J_201811 NE J_201812 THEN C_J_201812=1; ELSE C_J_201812=0;
	IF J_201812 NE '' AND J_201812 NE J_201901 THEN C_J_201901=1; ELSE C_J_201901=0;
	IF J_201901 NE '' AND J_201901 NE J_201902 THEN C_J_201902=1; ELSE C_J_201902=0;
	IF J_201902 NE '' AND J_201902 NE J_201903 THEN C_J_201903=1; ELSE C_J_201903=0;
	NO_OF_CHANGES=SUM(OF C_J_201812--C_J_201903);
RUN;

Thank you in advance 😃

10 REPLIES 10
Reeza
Super User

It may help if you explain what you're trying to do overall. I would recommend redesigning this approach to work with a long data set and then flip at the end for reporting. It's more dynamic that way and you are less likely to need to hardcode any values.

 


@Gpp wrote:

Hi all, I am really bad in array, hoping someone can help me to complete the array.

 

I have to compare column1 vs column2, then column2 vs column3... and each comparison I want create a indicator field.

 

dataset A is the table that I want to compare.

 

dataset B is the incomplete code that I try to write, I know my array c{*} is incorrect but I don't know how to modify it.

 

Sample output of dataset D, the last 5 columns in numeric are what I want to compute.

sample.JPG

 

%let mthrun=5;
%let dt=01mar2019;

data _null_;
	do i=1 to &mthrun.;
	mth=put(intnx('month', "&dt"d, i-&mthrun.),yymmn6.);
	call symput ("mth"||compress(i),mth);
	put i mth;
	end;
run;

data A;
input id $2. J_201811 $2.  J_201812 $2.  J_201901 $2.  J_201902 $2.  J_201903 $2.;
DATALINES;
AAQ1Q1Q1Q3T5
BB  R1Q1R1T1
CC    Y1Y1Y1
DDG1G1G1G1G1
;
run;


DATA B;
	SET A;
	ARRAY JC{*} J_:;
	ARRAY C{*} C_J:;
		DO i=1 to dim(JC)-1;
		IF jc{i} NE "" AND jc{i} NE jc{i+1} THEN C_J{i+1}=1;
			else C_J{i+1}=0;
		END; 
RUN;

DATA D;
	SET A;
	IF J_201811 NE '' AND J_201811 NE J_201812 THEN C_J_201812=1; ELSE C_J_201812=0;
	IF J_201812 NE '' AND J_201812 NE J_201901 THEN C_J_201901=1; ELSE C_J_201901=0;
	IF J_201901 NE '' AND J_201901 NE J_201902 THEN C_J_201902=1; ELSE C_J_201902=0;
	IF J_201902 NE '' AND J_201902 NE J_201903 THEN C_J_201903=1; ELSE C_J_201903=0;
	NO_OF_CHANGES=SUM(OF C_J_201812--C_J_201903);
RUN;

Thank you in advance 😃


 

Gpp
Fluorite | Level 6 Gpp
Fluorite | Level 6
Hello Reza, I am trying to know for the past 2 years (24 column for each month ) any changes between previous month and current month.

Example j_201811=“Q1” and j_201812=“W1” then count=1.
But if the j_201811 = “” and j_201812 ne “”, then is not consider as a change , so count=0
j_201811=“Q1” and j_201812=“Q1” then count=0

in the end of the day I want to know , how many times the contents of the column changed.
Reeza
Super User
Your code seems mostly correct to me then, what isn't happening that you want to happen.

I can't run your code because you didn't provide any sample data (image is not data).
Gpp
Fluorite | Level 6 Gpp
Fluorite | Level 6
%let mthrun=5;
%let dt=01mar2019;

data _null_;
do i=1 to &mthrun.;
mth=put(intnx('month', "&dt"d, i-&mthrun.),yymmn6.);
call symput ("mth"||compress(i),mth);
put i mth;
end;
run;

data A;
input id $2. J_201811 $2. J_201812 $2. J_201901 $2. J_201902 $2. J_201903 $2.;
DATALINES;
AAQ1Q1Q1Q3T5
BB R1Q1R1T1
CC Y1Y1Y1
DDG1G1G1G1G1
;
run;


DATA B;
SET A;
ARRAY JC{*} J_:;
ARRAY C{*} C_J:;
DO i=1 to dim(JC)-1;
IF jc{i} NE "" AND jc{i} NE jc{i+1} THEN C_J{i+1}=1;
else C_J{i+1}=0;
END;
RUN;

DATA D;
SET A;
IF J_201811 NE '' AND J_201811 NE J_201812 THEN C_J_201812=1; ELSE C_J_201812=0;
IF J_201812 NE '' AND J_201812 NE J_201901 THEN C_J_201901=1; ELSE C_J_201901=0;
IF J_201901 NE '' AND J_201901 NE J_201902 THEN C_J_201902=1; ELSE C_J_201902=0;
IF J_201902 NE '' AND J_201902 NE J_201903 THEN C_J_201903=1; ELSE C_J_201903=0;
NO_OF_CHANGES=SUM(OF C_J_201812--C_J_201903);
RUN;

Can you use this ? Dataset A is my sample data . Dataset b is my desired output



Reeza
Super User
If B is expected output, what's wrong with the code?
If B is close but not exact, please show an exact output expected.

Gpp
Fluorite | Level 6 Gpp
Fluorite | Level 6
Sorry, is D will generate the output that I want , but D is too lengthy , so I try to use array in B, but B with error. I am not sure how to make B correct
Reeza
Super User

You can't declare an array like this when it doesn't exist. This type of function only works when you have the variables already in the data set:

 

ARRAY C{*} C_J:;

So you need to explicitly list the elements out and it will work fine. 

That being said, that's why this is not a dynamic solution - it becomes hard coded and requires you modifying it every month or every run. Changing it to a long format doesn't require any of these issues and simplifies your code massively.

 

 

Reeza
Super User

This is how I would approach this instead:

 



data A;
input id $2. J_201811 $2. J_201812 $2. J_201901 $2. J_201902 $2. J_201903 $2.;
DATALINES;
AAQ1Q1Q1Q3T5
BB R1Q1R1T1
CC Y1Y1Y1
DDG1G1G1G1G1
;
run;

*transpose to a long format;
proc transpose data=A out=long1;
by ID;
VAR J:;
run;

*add in dates and rename columns;
data long2;
set long;
date = input(substr(_name_, 3, 6), yymmn6.);
format date yymmn6.;
rename col1 = VALUE;
drop _name_ ;
run;

*check if matches previous value;
data long3;
set long2;
by ID date;

*gets previous value;

prev_value = lag(value);

*checks if new id and sets to missing if it is a new ID;
if first.id then call missing(prev_value);

*code as 1/0 if the same as previous value;
if not first.id then do;
if prev_value = value then flag=1; else flag=0;
end;

run;

*calculate summary statistics;
proc sql;
create table comparison as
select ID, mean(flag) as Percent_Change format=percent12.1, sum(flag) as N_Changed, n(flag) as N_Records
from long3
group by ID;
quit;

Gpp
Fluorite | Level 6 Gpp
Fluorite | Level 6
Thank you so much for spending time for my question. I will definitely try your method.

Just an additional question on array , how to explicitly list the element? Is it Array c[*] $; ? Thanks again!
Reeza
Super User
array c(*) $ C_201811 C_201812 C_201901 ....  C_201912;

You have to list all the elements individually if you need new elements with a different prefix. You could automate that portion with a macro variable or macro as well. 

 


@Gpp wrote:
Thank you so much for spending time for my question. I will definitely try your method.

Just an additional question on array , how to explicitly list the element? Is it Array c[*] $; ? Thanks again!

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 782 views
  • 0 likes
  • 2 in conversation