Re: Modifying Values to Variables

kashun · Posted 10-15-2021 11:07 AM

I do have a data which I would like to create variables to flag them out. The values in cat are changes as columns. I am thinking there is a better way to do this instead of if-else statement.

Have

data cats;
   input ID $ Cat $ Date $;
   datalines;
1 A 1/17/2020
1 B 4/12/2020
2 A 2/21/2020
3 C 12/30/2020
4 B 1/5/2020
4 C 7/19/2020
5 A 6/25/2020
6 B 7/17/2020
7 B 6/20/2020
;
run;

Want

ID	cat_A	cat_B	cat_C	A_date	B_date	C_date
1	1	1	0	1/17/2020	4/12/2020
2	1	0	0	2/21/2020
3	0	0	1			12/30/2020
4	0	1	1		1/5/2020	7/19/2020
5	1	0	0	6/25/2020
6	0	1	0		7/17/2020
7	0	1	0		6/20/2020

PaigeMiller · Posted 10-15-2021 11:14 AM

In my opinion, the better way to do this is to leave the data set long, instead of making it wide with awkward column names.

What is the next step after your obtain this wide data set? What analysis or report are you planning to produce? Most analyses and reports are more easily produced from long data sets.

--
Paige Miller

kashun · Posted 10-15-2021 11:20 AM

@PaigeMiller. The data has duplicates and I do need it to be one id per row. This will be used as a lookup table to run other programs.

PaigeMiller · Posted 10-15-2021 11:22 AM

Plenty of ways to de-duplicate data without creating this long data set with awkward variable names.

But you didn't explain what the next analysis or report is going to be, so it would help greatly to know what that is. Without that knowledge of what is next, I'm not going to try to create code for this problem. Long data sets can be used for lookup tables as well.

--
Paige Miller

Reeza · Posted 10-15-2021 12:37 PM

@PaigeMiller wrote:

Long data sets can be used for lookup tables as well.

Usually easier to do a lookup from a long data set since you can merge easily.

Kurt_Bremser · Posted 10-15-2021 12:20 PM

@kashun wrote:
@PaigeMiller. The data has duplicates and I do need it to be one id per row. This will be used as a lookup table to run other programs.

Duplicates of cat will cause problems when transposing. Searching for duplicates is always easier in the long dataset.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Reeza · Posted 10-15-2021 11:22 AM

data cats;
   input ID $ Cat $ Date $;
   datalines;
1 A 1/17/2020
1 B 4/12/2020
2 A 2/21/2020
3 C 12/30/2020
4 B 1/5/2020
4 C 7/19/2020
5 A 6/25/2020
6 B 7/17/2020
7 B 6/20/2020
;
run;

proc transpose data=cats out=cats_wide prefix=GROUP_;
by ID;
id CAT;
var DATE;
run;

data want;
set cats_wide;
array flags(*) flag1-flag3;
array GROUP(*) group_A--group_c;

do i=1 to 3;
flags(i) = not missing(group(i)) ;
end;

drop i _name_;
run;

proc print data=want;
run;

Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/

Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/

And sometimes a double transpose is needed for extra wide data sets:
https://gist.github.com/statgeek/2321b6f62ab78d5bf2b0a5a8626bd7cd

@kashun wrote:

I do have a data which I would like to create variables to flag them out. The values in cat are changes as columns. I am thinking there is a better way to do this instead of if-else statement.

Have
data cats;
   input ID $ Cat $ Date $;
   datalines;
1 A 1/17/2020
1 B 4/12/2020
2 A 2/21/2020
3 C 12/30/2020
4 B 1/5/2020
4 C 7/19/2020
5 A 6/25/2020
6 B 7/17/2020
7 B 6/20/2020
;
run;
Want

ID cat_A cat_B cat_C A_date B_date C_date

1 1 1 0 1/17/2020 4/12/2020

2 1 0 0 2/21/2020

3 0 0 1 12/30/2020

4 0 1 1 1/5/2020 7/19/2020

5 1 0 0 6/25/2020

6 0 1 0 7/17/2020

7 0 1 0 6/20/2020

Ksharp · Posted 10-16-2021 07:16 AM

Merge Skill proposed by me ,Art.T and Matt.

https://support.sas.com/resources/papers/proceedings15/2785-2015.pdf

data cats;
   input ID $ Cat $ Date $;
   datalines;
1 A 1/17/2020
1 B 4/12/2020
2 A 2/21/2020
3 C 12/30/2020
4 B 1/5/2020
4 C 7/19/2020
5 A 6/25/2020
6 B 7/17/2020
7 B 6/20/2020
;
run;
proc sql noprint;
select distinct catt('cats(where=(cat_',cat,'="',cat,'") rename=(cat=cat_',cat,' date=',cat,'_date))')
into : merge separated by ' '
 from cats;
quit;
data want;
merge &merge.;
by id;
run;
data want;
 set want;
 array x{*} $ cat_:;
 do i=1 to dim(x);
  if missing(x{i}) then x{i}='0';
   else x{i}='1';
 end;
 drop i;
run;

Registration is open

SAS Training: Just a Click Away