Solved: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Ariin · Posted 11-17-2023 06:12 AM

Dear SAS Community,

I am a new user in SAS and I have encountered an issue while working on large data set. Let me introduce the data-table I am working with.

My table has millions of rows and it is unique at ID+Month level. The Month variable runs from 1 to 100. Then I have series of column named like var1, var2, var3.....var100. And these columns only contain two values; a blank or "Y". I leave a snapshot of the table below.

ID	Month	Var1	Var2	….	….	Var100
ABC1	1		Y			Y
ABC2	1
ABC3	1	Y
ABC4	1					Y
ABC5	1		Y			Y
ABC1	2	Y	Y			Y
ABC2	2		Y
ABC3	2
ABC4	2	Y
ABC5	2
ABC1	…		Y			Y
ABC2	…	Y				Y
ABC3	…
ABC4	…		Y
ABC5	…		Y
ABC1	100		Y			Y
ABC2	100					Y
ABC3	100
ABC4	100	Y	Y
ABC5	100	Y	Y			Y

Using the table I want to create an output which has only unique IDs and columns as Month1, Month2, Month3, ..., Month100. In the columns Month1, Month2, ...., Month100, I want to have only two values as a blank or "Y" based on combination of a specific month and varx. For example, for Month = 1 and Var1, only ID ABC3 has Var1 = "Y" hence in my output table, Month1 value against ID ABC3 has the same value "Y".

ID	Month1	Month2	….	….	Month100
ABC1		Y			Y
ABC2		Y			Y
ABC3	Y
ABC4
ABC5					Y

Hope the long explanation helps.

Patrick · Posted 11-20-2023 12:48 AM

Given that you're new to SAS you might not realize yet that working with a narrow table structure is very often advantageous.

Consider to structure your data as below. Whether you want to create rows where the value for a var<n> variable is missing depends a bit on how you intend to use the data. But it's of course also never a big problem to add such missing rows later for some specific use cases. And some SAS Proc's also allow to just create a second "template" table with all the cross sections you want to see in a report independent if this data exists in the base table or not.

ID	Month	index	value
ABC1	1	2	Y
ABC3	1	1	Y
ABC5	1	2	Y
ABC1	2	1	Y
ABC1	2	2	Y
ABC2	2	2	Y
ABC4	2	1	Y

Here a sample how to create such a structure and report on it

data have;
  infile cards truncover expandtabs;
  input (ID Month Var1  Var2) ($);
  cards;
ABC1  1  .  Y    
ABC2  1  .   .     
ABC3  1 Y  .     
ABC4  1 .    .     
ABC5  1  .  Y    
ABC1  2 Y Y    
ABC2  2  .  Y    
ABC3  2 .   .        
ABC4  2 Y .        
ABC5  2  .  . 
;

data want;
  set have;
  by month id;
  array vars {*} var1-var2;
  do index=1 to dim(vars);
    value=vars[index];
    output;
  end;
  drop var1-var2;
run;

proc tabulate data=want missing;
  class Month index value;
  keylabel n=' ';
  table month, index*value;
run;

View solution in original post

PaigeMiller · Posted 11-17-2023 06:19 AM

Honestly, I don't grasp what you are trying to do here, somehow var1 - var100 turns into month1 - month100 but I can't follow your explanation.

Furthermore, I always object to trying to take data like this and put an calendar type information into variable names like month1 - month100. Although I don't understand your data or what you are trying to do, I do understand SAS and it seems to me that you can work with the data much more easily in the original form rather than in the form you want to output.

--
Paige Miller

Ariin · Posted 11-17-2023 06:35 AM

Hi, thank you for your response. I agree that my explanation is not apt. Here is a glimpse of how I would do it using a case when statement. The problem is I have to write this case when 100 times as I have 100 distinct months. I am trying to find an easier way to write this code using some kind of loop which can based on the two conditions Month and Varx = "Y", create individual Monthx columns.

Proc Sql;
Create table data_want as
Select distinct ID,
case when var1 = "Y" and Month = 1 then "Y" else "" end as Month1,
case when var2 = "Y" and Month = 2 then "Y" else "" end as Month2,
.....
.....
case when var100 = "Y" and Month = 100 then "Y" else "" end as Month100
from data_have
group by 1;
Quit;

PaigeMiller · Posted 11-17-2023 06:42 AM

So when month=1 you only look at var1 and when month=2 you only look at var2 and so on?

--
Paige Miller

Ariin · Posted 11-17-2023 06:44 AM

yes absolutely

PaigeMiller · Posted 11-17-2023 06:56 AM

Before I go ahead and try to write a program, I feel compelled to ask ... why is this rearrangement of the data needed? Normally leaving data in the long arrangement makes subsequent programming easier, while turning data into a wide data set usually makes subsequent programming harder. So what is the reason you are doing this? What will be done with the data after you re-arrange it?

--
Paige Miller

Ariin · Posted 11-17-2023 07:44 AM

So the IDs based on Month+Varx combination lead to certain specific payout in the future and the client wants to understand two things: how many IDs/contract are there in each month (Monthx) and how much will be the payout for each month and second, how different are these contracts when MonthX changes from Month1 to Month10 or Month4 to Month14 and so on... I hope it makes sense.

PaigeMiller · Posted 11-17-2023 08:02 AM

@Ariin wrote:
So the IDs based on Month+Varx combination lead to certain specific payout in the future and the client wants to understand two things: how many IDs/contract are there in each month (Monthx) and how much will be the payout for each month and second, how different are these contracts when MonthX changes from Month1 to Month10 or Month4 to Month14 and so on... I hope it makes sense.

But this can be done without re-arranging to a wide data set.

--
Paige Miller

Ariin · Posted 11-17-2023 08:13 AM

could be, but the output table is much easier to filer for a or more specific months and will be useful in any downstream activities whereas the input table is not based on months.

PaigeMiller · Posted 11-17-2023 08:28 AM

@Ariin wrote:

could be, but the output table is much easier to filer for a or more specific months and will be useful in any downstream activities whereas the input table is not based on months.

Well I guess I disagree, but does the code from @Ksharp work for you?

--
Paige Miller

Ksharp · Posted 11-17-2023 07:23 AM

Assuming I understood what you mean.

data have;
infile cards truncover expandtabs;
input (ID	Month	Var1	Var2) ($);
cards;
ABC1	1	 .	Y	 	 
ABC2	1	 .	 .	 	 
ABC3	1	Y	 .	 	 
ABC4	1	. 	 .	 	 
ABC5	1	 .	Y	 	 
ABC1	2	Y	Y	 	 
ABC2	2	 .	Y	 	 
ABC3	2	. 	. 	 	 	 
ABC4	2	Y	. 	 	 	 
ABC5	2	 .	. 
;

data temp;
 set have;
 if vvaluex(cats('var',month))='Y' then do;name=cats('month',month);value='Y';output; end;
keep ID name value month;
run;
proc sort data=temp;by ID ;run;
proc transpose data=temp out=want(drop=_:);
by ID;
id name;
var value;
run;
proc sql noprint;
select distinct cats('month',month) into: month separated by ' ' from have;
quit;
data want;
 retain ID &month.;
 set want;
run;

Ariin · Posted 11-17-2023 08:40 AM

@Ksharp, I am looking at response, thank you so much

yabwon · Posted 11-17-2023 08:41 AM

I'd say: nice usecase for two dimensional array:

%let size = 100;
data have;
  call streaminit(123);

  do id = 'ABC1','ABC2','ABC3','ABC4','ABC5';
    do Month =1 to &size.;
      array Var[1:&size.] $ 1;
      do _N_=1 to &size.;
        if rand('uniform') > 0.8 then Var[_N_]="Y";
      end;
      output;
      call missing(of Var[*]);
    end;
  end;
run;

data want;
  array a[&size.,&size.] $ 1 _temporary_;

  r=0;
  do until (last.id);
    set have;
    by ID;
    r+1;
    array Var[1:&size.] $ 1;

    do c =1 to &size.;
      a[r,c]=Var[c];
    end;
  end;

  array M[&size.] $ 1 Month1-Month100;
  do r=1 to &size.;
    M[r] = a[r,r];
  end;
  output;

drop r c Month Var: ;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Patrick · Posted 11-20-2023 12:48 AM

Given that you're new to SAS you might not realize yet that working with a narrow table structure is very often advantageous.

Consider to structure your data as below. Whether you want to create rows where the value for a var<n> variable is missing depends a bit on how you intend to use the data. But it's of course also never a big problem to add such missing rows later for some specific use cases. And some SAS Proc's also allow to just create a second "template" table with all the cross sections you want to see in a report independent if this data exists in the base table or not.

ID	Month	index	value
ABC1	1	2	Y
ABC3	1	1	Y
ABC5	1	2	Y
ABC1	2	1	Y
ABC1	2	2	Y
ABC2	2	2	Y
ABC4	2	1	Y

Here a sample how to create such a structure and report on it

data have;
  infile cards truncover expandtabs;
  input (ID Month Var1  Var2) ($);
  cards;
ABC1  1  .  Y    
ABC2  1  .   .     
ABC3  1 Y  .     
ABC4  1 .    .     
ABC5  1  .  Y    
ABC1  2 Y Y    
ABC2  2  .  Y    
ABC3  2 .   .        
ABC4  2 Y .        
ABC5  2  .  . 
;

data want;
  set have;
  by month id;
  array vars {*} var1-var2;
  do index=1 to dim(vars);
    value=vars[index];
    output;
  end;
  drop var1-var2;
run;

proc tabulate data=want missing;
  class Month index value;
  keylabel n=' ';
  table month, index*value;
run;

Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Re: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Re: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Re: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Re: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Re: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Re: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Re: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Re: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Re: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Re: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Re: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Re: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Re: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Re: Create variables in a sequence using Macro (SAS Data Step or Proc SQL)

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!