BookmarkSubscribeRSS Feed
kashun
Obsidian | Level 7

I do have a data which I would like to create variables to flag them out. The values in cat are changes as columns. I am thinking there is a better way to do this instead of if-else statement.

Have

data cats;
   input ID $ Cat $ Date $;
   datalines;
1 A 1/17/2020
1 B 4/12/2020
2 A 2/21/2020
3 C 12/30/2020
4 B 1/5/2020
4 C 7/19/2020
5 A 6/25/2020
6 B 7/17/2020
7 B 6/20/2020
;
run;

Want

ID cat_A cat_B cat_C A_date B_date C_date
1 1 1 0 1/17/2020 4/12/2020  
2 1 0 0 2/21/2020    
3 0 0 1     12/30/2020
4 0 1 1   1/5/2020 7/19/2020
5 1 0 0 6/25/2020    
6 0 1 0   7/17/2020  
7 0 1 0   6/20/2020  
7 REPLIES 7
PaigeMiller
Diamond | Level 26

In my opinion, the better way to do this is to leave the data set long, instead of making it wide with awkward column names.

 

What is the next step after your obtain this wide data set? What analysis or report are you planning to produce? Most analyses and reports are more easily produced from long data sets.

--
Paige Miller
kashun
Obsidian | Level 7
@PaigeMiller. The data has duplicates and I do need it to be one id per row. This will be used as a lookup table to run other programs.
PaigeMiller
Diamond | Level 26

Plenty of ways to de-duplicate data without creating this long data set with awkward variable names.

 

But you didn't explain what the next analysis or report is going to be, so it would help greatly to know what that is. Without that knowledge of what is next, I'm not going to try to create code for this problem. Long data sets can be used for lookup tables as well.

--
Paige Miller
Reeza
Super User

@PaigeMiller wrote:

Long data sets can be used for lookup tables as well.


Usually easier to do a lookup from a long data set since you can merge easily. 

Kurt_Bremser
Super User

@kashun wrote:
@PaigeMiller. The data has duplicates and I do need it to be one id per row. This will be used as a lookup table to run other programs.

Duplicates of cat will cause problems when transposing. Searching for duplicates is always easier in the long dataset.

Reeza
Super User
data cats;
   input ID $ Cat $ Date $;
   datalines;
1 A 1/17/2020
1 B 4/12/2020
2 A 2/21/2020
3 C 12/30/2020
4 B 1/5/2020
4 C 7/19/2020
5 A 6/25/2020
6 B 7/17/2020
7 B 6/20/2020
;
run;

proc transpose data=cats out=cats_wide prefix=GROUP_;
by ID;
id CAT;
var DATE;
run;

data want;
set cats_wide;
array flags(*) flag1-flag3;
array GROUP(*) group_A--group_c;

do i=1 to 3;
flags(i) = not missing(group(i)) ;
end;

drop i _name_;
run;

proc print data=want;
run;

Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/

Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/

And sometimes a double transpose is needed for extra wide data sets:
https://gist.github.com/statgeek/2321b6f62ab78d5bf2b0a5a8626bd7cd

 


@kashun wrote:

I do have a data which I would like to create variables to flag them out. The values in cat are changes as columns. I am thinking there is a better way to do this instead of if-else statement.

Have

data cats;
   input ID $ Cat $ Date $;
   datalines;
1 A 1/17/2020
1 B 4/12/2020
2 A 2/21/2020
3 C 12/30/2020
4 B 1/5/2020
4 C 7/19/2020
5 A 6/25/2020
6 B 7/17/2020
7 B 6/20/2020
;
run;

Want

ID cat_A cat_B cat_C A_date B_date C_date
1 1 1 0 1/17/2020 4/12/2020  
2 1 0 0 2/21/2020    
3 0 0 1     12/30/2020
4 0 1 1   1/5/2020 7/19/2020
5 1 0 0 6/25/2020    
6 0 1 0   7/17/2020  
7 0 1 0   6/20/2020  

 

Ksharp
Super User

Merge Skill proposed by me ,Art.T and Matt.

 

https://support.sas.com/resources/papers/proceedings15/2785-2015.pdf

 

 

data cats;
   input ID $ Cat $ Date $;
   datalines;
1 A 1/17/2020
1 B 4/12/2020
2 A 2/21/2020
3 C 12/30/2020
4 B 1/5/2020
4 C 7/19/2020
5 A 6/25/2020
6 B 7/17/2020
7 B 6/20/2020
;
run;
proc sql noprint;
select distinct catt('cats(where=(cat_',cat,'="',cat,'") rename=(cat=cat_',cat,' date=',cat,'_date))')
into : merge separated by ' '
 from cats;
quit;
data want;
merge &merge.;
by id;
run;
data want;
 set want;
 array x{*} $ cat_:;
 do i=1 to dim(x);
  if missing(x{i}) then x{i}='0';
   else x{i}='1';
 end;
 drop i;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 693 views
  • 2 likes
  • 5 in conversation