BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ariin
Fluorite | Level 6

Dear SAS Community,

I am a new user in SAS and I have encountered an issue while working on large data set. Let me introduce the data-table I am working with.

My table has millions of rows and it is unique at ID+Month level. The Month variable runs from 1 to 100. Then I have series of column named like var1, var2, var3.....var100.  And these columns only contain two values; a blank or "Y". I leave a snapshot of the table below. 

 

IDMonthVar1Var2….….Var100
ABC11 Y  Y
ABC21     
ABC31Y    
ABC41    Y
ABC51 Y  Y
ABC12YY  Y
ABC22 Y   
ABC32     
ABC42Y    
ABC52     
ABC1 Y  Y
ABC2Y   Y
ABC3     
ABC4 Y   
ABC5 Y   
ABC1100 Y  Y
ABC2100    Y
ABC3100     
ABC4100YY   
ABC5100YY  Y

 

 Using the table I want to create an output which has only unique IDs and columns as Month1, Month2, Month3, ..., Month100. In the columns Month1, Month2, ...., Month100, I want to have only two values as a blank or "Y" based on combination of a specific month and varx. For example, for Month = 1 and Var1, only ID ABC3 has Var1 = "Y" hence in my output table, Month1 value against ID ABC3 has the same value "Y". 

 

IDMonth1Month2….….Month100
ABC1 Y  Y
ABC2 Y  Y
ABC3Y    
ABC4     
ABC5    Y

 

Hope the long explanation helps. 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Given that you're new to SAS you might not realize yet that working with a narrow table structure is very often advantageous. 

Consider to structure your data as below. Whether you want to create rows where the value for a var<n> variable is missing depends a bit on how you intend to use the data. But it's of course also never a big problem to add such missing rows later for some specific use cases. And some SAS Proc's also allow to just create a second "template" table with all the cross sections you want to see in a report independent if this data exists in the base table or not.

ID Month index value
ABC1 1 2 Y
ABC3 1 1 Y
ABC5 1 2 Y
ABC1 2 1 Y
ABC1 2 2 Y
ABC2 2 2 Y
ABC4 2 1 Y

 

Here a sample how to create such a structure and report on it

data have;
  infile cards truncover expandtabs;
  input (ID Month Var1  Var2) ($);
  cards;
ABC1  1  .  Y    
ABC2  1  .   .     
ABC3  1 Y  .     
ABC4  1 .    .     
ABC5  1  .  Y    
ABC1  2 Y Y    
ABC2  2  .  Y    
ABC3  2 .   .        
ABC4  2 Y .        
ABC5  2  .  . 
;

data want;
  set have;
  by month id;
  array vars {*} var1-var2;
  do index=1 to dim(vars);
    value=vars[index];
    output;
  end;
  drop var1-var2;
run;

proc tabulate data=want missing;
  class Month index value;
  keylabel n=' ';
  table month, index*value;
run;

 

Patrick_0-1700459316237.png

 

View solution in original post

13 REPLIES 13
PaigeMiller
Diamond | Level 26

Honestly, I don't grasp what you are trying to do here, somehow var1 - var100 turns into month1 - month100 but I can't follow your explanation.

 

Furthermore, I always object to trying to take data like this and put an calendar type information into variable names like month1 - month100. Although I don't understand your data or what you are trying to do, I do understand SAS and it seems to me that you can work with the data much more easily in the original form rather than in the form you want to output.

--
Paige Miller
Ariin
Fluorite | Level 6

Hi, thank you for your response. I agree that my explanation is not apt. Here is a glimpse of how I would do it using a case when statement. The problem is I have to write this case when 100 times as I have 100 distinct months. I am trying to find an easier way to write this code using some kind of loop which can based on the two conditions Month and Varx = "Y", create individual Monthx columns. 

 

 

Proc Sql;
Create table data_want as
Select distinct ID,
case when var1 = "Y" and Month = 1 then "Y" else "" end as Month1,
case when var2 = "Y" and Month = 2 then "Y" else "" end as Month2,
.....
.....
case when var100 = "Y" and Month = 100 then "Y" else "" end as Month100
from data_have
group by 1;
Quit;

PaigeMiller
Diamond | Level 26

So when month=1 you only look at var1 and when month=2 you only look at var2 and so on?

--
Paige Miller
PaigeMiller
Diamond | Level 26

Before I go ahead and try to write a program, I feel compelled to ask ... why is this rearrangement of the data needed? Normally leaving data in the long arrangement makes subsequent programming easier, while turning data into a wide data set usually makes subsequent programming harder. So what is the reason you are doing this? What will be done with the data after you re-arrange it?

--
Paige Miller
Ariin
Fluorite | Level 6
So the IDs based on Month+Varx combination lead to certain specific payout in the future and the client wants to understand two things: how many IDs/contract are there in each month (Monthx) and how much will be the payout for each month and second, how different are these contracts when MonthX changes from Month1 to Month10 or Month4 to Month14 and so on... I hope it makes sense.
PaigeMiller
Diamond | Level 26

@Ariin wrote:
So the IDs based on Month+Varx combination lead to certain specific payout in the future and the client wants to understand two things: how many IDs/contract are there in each month (Monthx) and how much will be the payout for each month and second, how different are these contracts when MonthX changes from Month1 to Month10 or Month4 to Month14 and so on... I hope it makes sense.

But this can be done without re-arranging to a wide data set.

--
Paige Miller
Ariin
Fluorite | Level 6

could be, but the output table is much easier to filer for a or more specific months and will be useful in any downstream activities whereas the input table is not based on months. 

PaigeMiller
Diamond | Level 26

@Ariin wrote:

could be, but the output table is much easier to filer for a or more specific months and will be useful in any downstream activities whereas the input table is not based on months. 


Well I guess I disagree, but does the code from @Ksharp work for you?

--
Paige Miller
Ksharp
Super User

Assuming I understood what you mean.

 

data have;
infile cards truncover expandtabs;
input (ID	Month	Var1	Var2) ($);
cards;
ABC1	1	 .	Y	 	 
ABC2	1	 .	 .	 	 
ABC3	1	Y	 .	 	 
ABC4	1	. 	 .	 	 
ABC5	1	 .	Y	 	 
ABC1	2	Y	Y	 	 
ABC2	2	 .	Y	 	 
ABC3	2	. 	. 	 	 	 
ABC4	2	Y	. 	 	 	 
ABC5	2	 .	. 
;

data temp;
 set have;
 if vvaluex(cats('var',month))='Y' then do;name=cats('month',month);value='Y';output; end;
keep ID name value month;
run;
proc sort data=temp;by ID ;run;
proc transpose data=temp out=want(drop=_:);
by ID;
id name;
var value;
run;
proc sql noprint;
select distinct cats('month',month) into: month separated by ' ' from have;
quit;
data want;
 retain ID &month.;
 set want;
run;
Ariin
Fluorite | Level 6
@Ksharp, I am looking at response, thank you so much
yabwon
Onyx | Level 15

I'd say: nice usecase for two dimensional array:

%let size = 100;
data have;
  call streaminit(123);

  do id = 'ABC1','ABC2','ABC3','ABC4','ABC5';
    do Month =1 to &size.;
      array Var[1:&size.] $ 1;
      do _N_=1 to &size.;
        if rand('uniform') > 0.8 then Var[_N_]="Y";
      end;
      output;
      call missing(of Var[*]);
    end;
  end;
run;

data want;
  array a[&size.,&size.] $ 1 _temporary_;

  r=0;
  do until (last.id);
    set have;
    by ID;
    r+1;
    array Var[1:&size.] $ 1;

    do c =1 to &size.;
      a[r,c]=Var[c];
    end;
  end;

  array M[&size.] $ 1 Month1-Month100;
  do r=1 to &size.;
    M[r] = a[r,r];
  end;
  output;

drop r c Month Var: ;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Patrick
Opal | Level 21

Given that you're new to SAS you might not realize yet that working with a narrow table structure is very often advantageous. 

Consider to structure your data as below. Whether you want to create rows where the value for a var<n> variable is missing depends a bit on how you intend to use the data. But it's of course also never a big problem to add such missing rows later for some specific use cases. And some SAS Proc's also allow to just create a second "template" table with all the cross sections you want to see in a report independent if this data exists in the base table or not.

ID Month index value
ABC1 1 2 Y
ABC3 1 1 Y
ABC5 1 2 Y
ABC1 2 1 Y
ABC1 2 2 Y
ABC2 2 2 Y
ABC4 2 1 Y

 

Here a sample how to create such a structure and report on it

data have;
  infile cards truncover expandtabs;
  input (ID Month Var1  Var2) ($);
  cards;
ABC1  1  .  Y    
ABC2  1  .   .     
ABC3  1 Y  .     
ABC4  1 .    .     
ABC5  1  .  Y    
ABC1  2 Y Y    
ABC2  2  .  Y    
ABC3  2 .   .        
ABC4  2 Y .        
ABC5  2  .  . 
;

data want;
  set have;
  by month id;
  array vars {*} var1-var2;
  do index=1 to dim(vars);
    value=vars[index];
    output;
  end;
  drop var1-var2;
run;

proc tabulate data=want missing;
  class Month index value;
  keylabel n=' ';
  table month, index*value;
run;

 

Patrick_0-1700459316237.png

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 2472 views
  • 4 likes
  • 5 in conversation