BookmarkSubscribeRSS Feed
undercover_247
Fluorite | Level 6

I have a dataset that looks like this

IDusers_2016users_2017users_2018users_2019users_2020
jjones11000
asmith01110

 

I want a table that looks like this

Yearcount_users_year
20161
20172

 

I want to do this in proc sql but I don't understand how to create a variable using a combination of if/then logic and creating a new variable. A users could be involved in this program for multiple years so I couldn't really do this in a datstep without multiple steps. Any advice? 

 

 

5 REPLIES 5
ballardw
Super User

What happened to Id, users2018, users2019 and users2020?

 

SQL does not support "if". The Case statement, which allows conditional assignments in SQL doesn't really work across "rows" the way you may be thinking.

 

If you actually want all of your "years" in the result SQL is going to be pretty cumbersome to do this because the data as currently structured is not set up for SQL "group by" to get summaries such as total, which would want a single variable name Year(or similar) with the value of year AND another single variable to sum.

 

If this were my data:

data need;
   set have;
   array u (2016:2020) users_2016 - users_2020;
   do year= 2016 to 2020;
      value= u[year];
      output;
   end;
   keep year value;
run;

proc means data=need sum;
   class year;
   var value;
run;
Kurt_Bremser
Super User

You will need multiple steps, a you first have to correct the mistake of hiding data (years) in structure (variable names). Once you have transposed and extracted the year, further analysis will be simple.

PGStats
Opal | Level 21

Just for fun. If you really insist on using SQL :

 

data have;
input ID $   users_2016  users_2017  users_2018  users_2019  users_2020;
datalines;
jjones    1   1   0   0   0
asmith    0   1   1   1   0
;

proc sql;
create table want (year num, nbUsers num);
insert into want select 2016, count (distinct ID) from have where users_2016;
insert into want select 2017, count (distinct ID) from have where users_2017;
insert into want select 2018, count (distinct ID) from have where users_2018;
insert into want select 2019, count (distinct ID) from have where users_2019;
insert into want select 2020, count (distinct ID) from have where users_2020;
select * from want;
quit;

PGStats_0-1698780933434.png

😁

 

PG
mkeintz
PROC Star

@undercover_247 wrote:

A users could be involved in this program for multiple years so I couldn't really do this in a datstep without multiple steps. Any advice? 


 

I don't see why this needs more than a single data step:

 

data have;
input ID $   users_2016  users_2017  users_2018  users_2019  users_2020;
datalines;
jjones    1   1   0   0   0
asmith    0   1   1   1   0
;

data want (keep=year count);
  set have end=end_of_have;
  array totl {2016:2020} _temporary_;
  array usrs {2016:2020} users_2016-users_2020;

  do year=lbound(usrs) to hbound(usrs);
    totl{year}+usrs{year};
  end;
  if end_of_have;
  do year=lbound(usrs) to hbound(usrs);
    count=totl{year};
    output;
  end;
run;

This primarily benefits from not only the ability to declare arrays, but also from the ability to set lower and upper array bounds corresponding to the expected YEAR values.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PaigeMiller
Diamond | Level 26
proc summary data=have;
    var users:;
    output out=want sum=;
run;
proc transpose data=want out=want1;
    var users:;
run;

 

Agreeing with @Kurt_Bremser that putting years into variable names is a particularly poor choice, perhaps not so much in this simple example, but certainly in the long run you want to AVOID years (or other calendar information) in variable names as it will make your programming more difficult.

 

With regard to @mkeintz's solution, certainly you can do this in one data step, but the code he provides is usually beyond the ability of most programmers to create, and is (in my opinion) rather complicated for what is a relatively simple problem. I don't want users (especially new SAS users) to get the impression that finding sums or counts requires such complicated code. In any case, I recommend using built-in SAS PROCs whenever possible, as I show above.

 

Complicated code or multi-step code is avoided by originally creating the data into a more SAS-friendly structure. Instead of years in the variable names, year should be a variable itself.

 

data have;
     input user $ year;
     cards;
jjones 2016
jjones 2017
asmith 2016
asmith 2017
asmith 2018
;

 

With this layout, the calculations are done with very simple code

 

proc freq data=have;
    tables year/noprint out=want(drop=percent);
run;

 

--
Paige Miller

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2367 views
  • 1 like
  • 6 in conversation