Hi all, I am really bad in array, hoping someone can help me to complete the array.
I have to compare column1 vs column2, then column2 vs column3... and each comparison I want create a indicator field.
dataset A is the table that I want to compare.
dataset B is the incomplete code that I try to write, I know my array c{*} is incorrect but I don't know how to modify it.
Sample output of dataset D, the last 5 columns in numeric are what I want to compute.
%let mthrun=5; %let dt=01mar2019; data _null_; do i=1 to &mthrun.; mth=put(intnx('month', "&dt"d, i-&mthrun.),yymmn6.); call symput ("mth"||compress(i),mth); put i mth; end; run; data A; input id $2. J_201811 $2. J_201812 $2. J_201901 $2. J_201902 $2. J_201903 $2.; DATALINES; AAQ1Q1Q1Q3T5 BB R1Q1R1T1 CC Y1Y1Y1 DDG1G1G1G1G1 ; run; DATA B; SET A; ARRAY JC{*} J_:; ARRAY C{*} C_J:; DO i=1 to dim(JC)-1; IF jc{i} NE "" AND jc{i} NE jc{i+1} THEN C_J{i+1}=1; else C_J{i+1}=0; END; RUN; DATA D; SET A; IF J_201811 NE '' AND J_201811 NE J_201812 THEN C_J_201812=1; ELSE C_J_201812=0; IF J_201812 NE '' AND J_201812 NE J_201901 THEN C_J_201901=1; ELSE C_J_201901=0; IF J_201901 NE '' AND J_201901 NE J_201902 THEN C_J_201902=1; ELSE C_J_201902=0; IF J_201902 NE '' AND J_201902 NE J_201903 THEN C_J_201903=1; ELSE C_J_201903=0; NO_OF_CHANGES=SUM(OF C_J_201812--C_J_201903); RUN;
Thank you in advance 😃
It may help if you explain what you're trying to do overall. I would recommend redesigning this approach to work with a long data set and then flip at the end for reporting. It's more dynamic that way and you are less likely to need to hardcode any values.
@Gpp wrote:
Hi all, I am really bad in array, hoping someone can help me to complete the array.
I have to compare column1 vs column2, then column2 vs column3... and each comparison I want create a indicator field.
dataset A is the table that I want to compare.
dataset B is the incomplete code that I try to write, I know my array c{*} is incorrect but I don't know how to modify it.
Sample output of dataset D, the last 5 columns in numeric are what I want to compute.
%let mthrun=5; %let dt=01mar2019; data _null_; do i=1 to &mthrun.; mth=put(intnx('month', "&dt"d, i-&mthrun.),yymmn6.); call symput ("mth"||compress(i),mth); put i mth; end; run; data A; input id $2. J_201811 $2. J_201812 $2. J_201901 $2. J_201902 $2. J_201903 $2.; DATALINES; AAQ1Q1Q1Q3T5 BB R1Q1R1T1 CC Y1Y1Y1 DDG1G1G1G1G1 ; run; DATA B; SET A; ARRAY JC{*} J_:; ARRAY C{*} C_J:; DO i=1 to dim(JC)-1; IF jc{i} NE "" AND jc{i} NE jc{i+1} THEN C_J{i+1}=1; else C_J{i+1}=0; END; RUN; DATA D; SET A; IF J_201811 NE '' AND J_201811 NE J_201812 THEN C_J_201812=1; ELSE C_J_201812=0; IF J_201812 NE '' AND J_201812 NE J_201901 THEN C_J_201901=1; ELSE C_J_201901=0; IF J_201901 NE '' AND J_201901 NE J_201902 THEN C_J_201902=1; ELSE C_J_201902=0; IF J_201902 NE '' AND J_201902 NE J_201903 THEN C_J_201903=1; ELSE C_J_201903=0; NO_OF_CHANGES=SUM(OF C_J_201812--C_J_201903); RUN;Thank you in advance 😃
You can't declare an array like this when it doesn't exist. This type of function only works when you have the variables already in the data set:
ARRAY C{*} C_J:;
So you need to explicitly list the elements out and it will work fine.
That being said, that's why this is not a dynamic solution - it becomes hard coded and requires you modifying it every month or every run. Changing it to a long format doesn't require any of these issues and simplifies your code massively.
This is how I would approach this instead:
data A;
input id $2. J_201811 $2. J_201812 $2. J_201901 $2. J_201902 $2. J_201903 $2.;
DATALINES;
AAQ1Q1Q1Q3T5
BB R1Q1R1T1
CC Y1Y1Y1
DDG1G1G1G1G1
;
run;
*transpose to a long format;
proc transpose data=A out=long1;
by ID;
VAR J:;
run;
*add in dates and rename columns;
data long2;
set long;
date = input(substr(_name_, 3, 6), yymmn6.);
format date yymmn6.;
rename col1 = VALUE;
drop _name_ ;
run;
*check if matches previous value;
data long3;
set long2;
by ID date;
*gets previous value;
prev_value = lag(value);
*checks if new id and sets to missing if it is a new ID;
if first.id then call missing(prev_value);
*code as 1/0 if the same as previous value;
if not first.id then do;
if prev_value = value then flag=1; else flag=0;
end;
run;
*calculate summary statistics;
proc sql;
create table comparison as
select ID, mean(flag) as Percent_Change format=percent12.1, sum(flag) as N_Changed, n(flag) as N_Records
from long3
group by ID;
quit;
array c(*) $ C_201811 C_201812 C_201901 .... C_201912;
You have to list all the elements individually if you need new elements with a different prefix. You could automate that portion with a macro variable or macro as well.
@Gpp wrote:
Thank you so much for spending time for my question. I will definitely try your method.
Just an additional question on array , how to explicitly list the element? Is it Array c[*] $; ? Thanks again!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.