Hi all, I am really bad in array, hoping someone can help me to complete the array.
I have to compare column1 vs column2, then column2 vs column3... and each comparison I want create a indicator field.
dataset A is the table that I want to compare.
dataset B is the incomplete code that I try to write, I know my array c{*} is incorrect but I don't know how to modify it.
Sample output of dataset D, the last 5 columns in numeric are what I want to compute.
%let mthrun=5; %let dt=01mar2019; data _null_; do i=1 to &mthrun.; mth=put(intnx('month', "&dt"d, i-&mthrun.),yymmn6.); call symput ("mth"||compress(i),mth); put i mth; end; run; data A; input id $2. J_201811 $2. J_201812 $2. J_201901 $2. J_201902 $2. J_201903 $2.; DATALINES; AAQ1Q1Q1Q3T5 BB R1Q1R1T1 CC Y1Y1Y1 DDG1G1G1G1G1 ; run; DATA B; SET A; ARRAY JC{*} J_:; ARRAY C{*} C_J:; DO i=1 to dim(JC)-1; IF jc{i} NE "" AND jc{i} NE jc{i+1} THEN C_J{i+1}=1; else C_J{i+1}=0; END; RUN; DATA D; SET A; IF J_201811 NE '' AND J_201811 NE J_201812 THEN C_J_201812=1; ELSE C_J_201812=0; IF J_201812 NE '' AND J_201812 NE J_201901 THEN C_J_201901=1; ELSE C_J_201901=0; IF J_201901 NE '' AND J_201901 NE J_201902 THEN C_J_201902=1; ELSE C_J_201902=0; IF J_201902 NE '' AND J_201902 NE J_201903 THEN C_J_201903=1; ELSE C_J_201903=0; NO_OF_CHANGES=SUM(OF C_J_201812--C_J_201903); RUN;
Thank you in advance 😃
It may help if you explain what you're trying to do overall. I would recommend redesigning this approach to work with a long data set and then flip at the end for reporting. It's more dynamic that way and you are less likely to need to hardcode any values.
@Gpp wrote:
Hi all, I am really bad in array, hoping someone can help me to complete the array.
I have to compare column1 vs column2, then column2 vs column3... and each comparison I want create a indicator field.
dataset A is the table that I want to compare.
dataset B is the incomplete code that I try to write, I know my array c{*} is incorrect but I don't know how to modify it.
Sample output of dataset D, the last 5 columns in numeric are what I want to compute.
%let mthrun=5; %let dt=01mar2019; data _null_; do i=1 to &mthrun.; mth=put(intnx('month', "&dt"d, i-&mthrun.),yymmn6.); call symput ("mth"||compress(i),mth); put i mth; end; run; data A; input id $2. J_201811 $2. J_201812 $2. J_201901 $2. J_201902 $2. J_201903 $2.; DATALINES; AAQ1Q1Q1Q3T5 BB R1Q1R1T1 CC Y1Y1Y1 DDG1G1G1G1G1 ; run; DATA B; SET A; ARRAY JC{*} J_:; ARRAY C{*} C_J:; DO i=1 to dim(JC)-1; IF jc{i} NE "" AND jc{i} NE jc{i+1} THEN C_J{i+1}=1; else C_J{i+1}=0; END; RUN; DATA D; SET A; IF J_201811 NE '' AND J_201811 NE J_201812 THEN C_J_201812=1; ELSE C_J_201812=0; IF J_201812 NE '' AND J_201812 NE J_201901 THEN C_J_201901=1; ELSE C_J_201901=0; IF J_201901 NE '' AND J_201901 NE J_201902 THEN C_J_201902=1; ELSE C_J_201902=0; IF J_201902 NE '' AND J_201902 NE J_201903 THEN C_J_201903=1; ELSE C_J_201903=0; NO_OF_CHANGES=SUM(OF C_J_201812--C_J_201903); RUN;Thank you in advance 😃
You can't declare an array like this when it doesn't exist. This type of function only works when you have the variables already in the data set:
ARRAY C{*} C_J:;
So you need to explicitly list the elements out and it will work fine.
That being said, that's why this is not a dynamic solution - it becomes hard coded and requires you modifying it every month or every run. Changing it to a long format doesn't require any of these issues and simplifies your code massively.
This is how I would approach this instead:
data A;
input id $2. J_201811 $2. J_201812 $2. J_201901 $2. J_201902 $2. J_201903 $2.;
DATALINES;
AAQ1Q1Q1Q3T5
BB R1Q1R1T1
CC Y1Y1Y1
DDG1G1G1G1G1
;
run;
*transpose to a long format;
proc transpose data=A out=long1;
by ID;
VAR J:;
run;
*add in dates and rename columns;
data long2;
set long;
date = input(substr(_name_, 3, 6), yymmn6.);
format date yymmn6.;
rename col1 = VALUE;
drop _name_ ;
run;
*check if matches previous value;
data long3;
set long2;
by ID date;
*gets previous value;
prev_value = lag(value);
*checks if new id and sets to missing if it is a new ID;
if first.id then call missing(prev_value);
*code as 1/0 if the same as previous value;
if not first.id then do;
if prev_value = value then flag=1; else flag=0;
end;
run;
*calculate summary statistics;
proc sql;
create table comparison as
select ID, mean(flag) as Percent_Change format=percent12.1, sum(flag) as N_Changed, n(flag) as N_Records
from long3
group by ID;
quit;
array c(*) $ C_201811 C_201812 C_201901 .... C_201912;
You have to list all the elements individually if you need new elements with a different prefix. You could automate that portion with a macro variable or macro as well.
@Gpp wrote:
Thank you so much for spending time for my question. I will definitely try your method.
Just an additional question on array , how to explicitly list the element? Is it Array c[*] $; ? Thanks again!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.