BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
axescot78
Quartz | Level 8

I have a dataset of 5000 observations and about 200 variables by user id. A little more than 1000 users have multiple entries. These aren't duplicate entries. They have different values for some of those 200 variables. They are not the same variables for all users that have multiple entries. See example data set below.

 

data fake_data;
input user_id 3. Var1 2. Var2 3. Var3 2. Var4 3. Var5 2. Var6 $;
datalines;
03 0 10 0 31 0 .
03 0 11 1 31 1 .
04 0 00 1 10 0 .
04 0 47 0 21 1 .
04 0 47 0 21 0 .
04 0 74 0 21 0 .
05 0 00 0 22 1 .
05 0 15 0 22 0 .
06 0 00 9 26 0 .
06 0 00 9 26 0 .
06 0 00 9 26 0 .
07 0 00 0 72 0 .
07 5 00 0 84 1 .
07 0 00 9 84 0 .
08 0 00 0 32 1 .
08 0 00 9 32 0 .
09 0 54 0 54 0 Excel
09 0 54 0 54 0 Excel
10 0 73 0 18 0 .
10 0 00 9 18 0 .
10 0 23 0 18 1 .
10 0 93 0 18 0 .
11 0 37 0 12 1 .
11 0 43 0 12 0 .
11 0 43 0 12 1 .
12 0 40 0 15 0 .
12 0 00 0 15 0 .
13 1 00 0 16 0 .
13 0 63 0 16 1 .
;

I need to turn this into a wide data set where it's one row per user id without having columns that duplicate data. That is, when there is different values make column wide, do not duplicate columns with same value. How can I do this when the data is so random? I can work in SAS, Excel, or R.

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

Why do you think you need to dedup into columns? Deduping into rows is a lot easier and will make analysis a lot easier. A dedup by row could look like this:

userid VarID Value
03     var1   0
03     var2  10
03     var3   0
03     var4  31
03     var5   0
03     var6   .
03     var2  11
03     var3   1
03     var5   1

View solution in original post

3 REPLIES 3
SASKiwi
PROC Star

Why do you think you need to dedup into columns? Deduping into rows is a lot easier and will make analysis a lot easier. A dedup by row could look like this:

userid VarID Value
03     var1   0
03     var2  10
03     var3   0
03     var4  31
03     var5   0
03     var6   .
03     var2  11
03     var3   1
03     var5   1
Tom
Super User Tom
Super User

I have no idea what you mean by "make column wide".

 

You can make a character variable longer (to a limit of 32767 bytes) so it could store multiple values.   But you cannot store multiple numbers into one single value.

Ksharp
Super User
/*
I think you'd better to post the output you want to see
to explain your question better.
The simplest way to do this is using PROC SUMMARRY.
*/

data fake_data;
input user_id 3. Var1 2. Var2 3. Var3 2. Var4 3. Var5 2. Var6 $;
datalines;
03 0 10 0 31 0 .
03 0 11 1 31 1 .
04 0 00 1 10 0 .
04 0 47 0 21 1 .
04 0 47 0 21 0 .
04 0 74 0 21 0 .
05 0 00 0 22 1 .
05 0 15 0 22 0 .
06 0 00 9 26 0 .
06 0 00 9 26 0 .
06 0 00 9 26 0 .
07 0 00 0 72 0 .
07 5 00 0 84 1 .
07 0 00 9 84 0 .
08 0 00 0 32 1 .
08 0 00 9 32 0 .
09 0 54 0 54 0 Excel
09 0 54 0 54 0 Excel
10 0 73 0 18 0 .
10 0 00 9 18 0 .
10 0 23 0 18 1 .
10 0 93 0 18 0 .
11 0 37 0 12 1 .
11 0 43 0 12 0 .
11 0 43 0 12 1 .
12 0 40 0 15 0 .
12 0 00 0 15 0 .
13 1 00 0 16 0 .
13 0 63 0 16 1 .
;

proc sql noprint;
select max(n) into :n
 from (select count(*) as n from fake_data group by user_id);
quit;
proc summary data=fake_data;
by user_id;
output out=want(drop=_freq_ _type_) idgroup(out[&n.] ( Var1-Var6)=);
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 343 views
  • 0 likes
  • 4 in conversation