I have a dataset of 5000 observations and about 200 variables, keyed by user id. A little more than 1000 users have multiple entries. These aren't duplicate entries: they have different values for some of those 200 variables, and it's not the same variables for every user that has multiple entries. See the example data set below.
data fake_data;
input user_id 3. Var1 2. Var2 3. Var3 2. Var4 3. Var5 2. Var6 $;
datalines;
03 0 10 0 31 0 .
03 0 11 1 31 1 .
04 0 00 1 10 0 .
04 0 47 0 21 1 .
04 0 47 0 21 0 .
04 0 74 0 21 0 .
05 0 00 0 22 1 .
05 0 15 0 22 0 .
06 0 00 9 26 0 .
06 0 00 9 26 0 .
06 0 00 9 26 0 .
07 0 00 0 72 0 .
07 5 00 0 84 1 .
07 0 00 9 84 0 .
08 0 00 0 32 1 .
08 0 00 9 32 0 .
09 0 54 0 54 0 Excel
09 0 54 0 54 0 Excel
10 0 73 0 18 0 .
10 0 00 9 18 0 .
10 0 23 0 18 1 .
10 0 93 0 18 0 .
11 0 37 0 12 1 .
11 0 43 0 12 0 .
11 0 43 0 12 1 .
12 0 40 0 15 0 .
12 0 00 0 15 0 .
13 1 00 0 16 0 .
13 0 63 0 16 1 .
;
I need to turn this into a wide data set with one row per user id, without creating columns that duplicate data. That is, when a user has different values for a variable, spread them into extra columns; when the values are the same, don't repeat the column. How can I do this when the pattern is so irregular? I can work in SAS, Excel, or R.
Why do you think you need to dedup into columns? Deduping into rows is a lot easier and will make analysis a lot easier. A dedup by row could look like this:
userid VarID Value
03 var1 0
03 var2 10
03 var3 0
03 var4 31
03 var5 0
03 var6 .
03 var2 11
03 var3 1
03 var5 1
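The row-wise dedup suggested above (reshape wide to long, then drop identical rows) can be sketched outside SAS as well. Below is a minimal pandas illustration using a toy two-row subset of the example data for user 03; the column names mirror the posted data, but the frame itself is made up for demonstration.

```python
import pandas as pd

# Toy subset of the example data (user_id 03 only); values taken from the post.
df = pd.DataFrame({
    "user_id": [3, 3],
    "Var1": [0, 0],
    "Var2": [10, 11],
    "Var3": [0, 1],
})

# Wide -> long: one row per (user_id, VarID, Value) triple.
long = df.melt(id_vars="user_id", var_name="VarID", value_name="Value")

# Dedup by row: identical triples collapse to a single row, so repeated
# values (Var1 = 0 twice) appear once, while differing values (Var2 = 10
# and 11) are both kept as separate rows.
deduped = long.drop_duplicates().sort_values(["user_id", "VarID", "Value"])
print(deduped)
```

The same reshape is `proc transpose` followed by `proc sort nodupkey` in SAS, or `pivot_longer()` plus `distinct()` in R's tidyverse.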
I have no idea what you mean by "make column wide".
You can make a character variable longer (to a limit of 32767 bytes) so it could store multiple values. But you cannot store multiple numbers into one single value.
/*
I think you'd better post the output you want to see,
to explain your question better.
The simplest way to do this is with PROC SUMMARY.
*/
data fake_data;
input user_id 3. Var1 2. Var2 3. Var3 2. Var4 3. Var5 2. Var6 $;
datalines;
03 0 10 0 31 0 .
03 0 11 1 31 1 .
04 0 00 1 10 0 .
04 0 47 0 21 1 .
04 0 47 0 21 0 .
04 0 74 0 21 0 .
05 0 00 0 22 1 .
05 0 15 0 22 0 .
06 0 00 9 26 0 .
06 0 00 9 26 0 .
06 0 00 9 26 0 .
07 0 00 0 72 0 .
07 5 00 0 84 1 .
07 0 00 9 84 0 .
08 0 00 0 32 1 .
08 0 00 9 32 0 .
09 0 54 0 54 0 Excel
09 0 54 0 54 0 Excel
10 0 73 0 18 0 .
10 0 00 9 18 0 .
10 0 23 0 18 1 .
10 0 93 0 18 0 .
11 0 37 0 12 1 .
11 0 43 0 12 0 .
11 0 43 0 12 1 .
12 0 40 0 15 0 .
12 0 00 0 15 0 .
13 1 00 0 16 0 .
13 0 63 0 16 1 .
;
proc sql noprint;
  select max(n) into :n
  from (select count(*) as n from fake_data group by user_id);
quit;

proc summary data=fake_data;
  by user_id;
  output out=want(drop=_freq_ _type_) idgroup(out[&n.] (Var1-Var6)=);
run;
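For readers more comfortable in R or Python, the IDGROUP logic above (number each record within a user_id, then spread record k's values into `Var*_k` columns) can be sketched in pandas. The frame below is a hypothetical three-row subset of the example data, not the full dataset.

```python
import pandas as pd

# Hypothetical subset: user 03 has two records, user 04 has one.
df = pd.DataFrame({
    "user_id": [3, 3, 4],
    "Var2": [10, 11, 0],
    "Var3": [0, 1, 1],
})

# Number each record within a user_id (1, 2, ...), mirroring what
# IDGROUP's OUT[&n] does with the max group size from PROC SQL.
df["seq"] = df.groupby("user_id").cumcount() + 1

# Pivot so record k's values land in columns like Var2_1, Var2_2, ...
wide = df.pivot(index="user_id", columns="seq", values=["Var2", "Var3"])
wide.columns = [f"{var}_{k}" for var, k in wide.columns]
wide = wide.reset_index()
print(wide)
```

Users with fewer records than the maximum get missing values in the trailing columns, just as PROC SUMMARY leaves them missing in the `want` dataset.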