BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Kelly_K
Fluorite | Level 6

Hi, SAS Community.

 

I'm trying to subset a dataset so that only those observations with the year 2008 remain, but I get a data table with no values in any of the columns or there is no change in the output dataset.

 

My data looks something like this:

 

id    var1    var2    var3      year

1       F        25       no       2008

2      M        29       yes     2016

3      M        22       no      2008

4      F        22        no      2016

etc.

 

 

I tried the following two code variations but got the empty data table (i.e., all var columns are present but there are no observations in any of the cells):

 

data subset;
     set dataset;
     where year=2008;
run;

 

data subset;
set dataset;
if year=2008;
run;

 

 

I tried the following code but the observations with year=2016 remained (i.e., dataset = subset).

 

data subset;
set dataset;
if year=2016 then delete;
run;

 

 

Does anyone know what is going on?  In this instance, Google is not being my friend!

 

Kelly

1 ACCEPTED SOLUTION

Accepted Solutions
Kelly_K
Fluorite | Level 6

Actually, the variable I call "Date" is not actually a date.  It is an indicator indicating a participant's survey year, and I created it during the data cleaning stage.  However, in the process of writing up an explanation of how I created the variable, I figured out that the problem related to my use of the variable labels in the Where statement and not the actual values.  

Here is the process I used to create the variable:  Starting with two separate datasets, one for survey year 2008 and one for survey year 2016, and I cleaned each one separately before merging them to create the dataset I called "dataset".  I created and formatted the variables I'm calling "year" using the simple code:

*While cleaning the 2008 data;

value year 0="2008" 1="2016;
year = 0    
format year year.;

*While cleaning the 2016 data;
value year 0="2008" 1="2016;
year = 1    
format year year.;

I then merged the 2008 and 2016 datasets into a single dataset, which included a binary variable called "year" with two values, 0 and 1, and two labels 2008 and 2016.  And that is the source of the problem -- I needed to use the values after the equal (=) sign in the Where statement and not the labels.  The following code worked perfectly:

data subset;
     set dataset;
     where year=0;
run;

My apologies for the fire drill!  It didn't occur to me that the use of the name "date" would signal an actual date variable and not a plain vanilla numeric variable that happened to be a year.  And since I created the variable more than 6 months ago, I had forgotten the actual values were 0 and 1 and not 2008 and 2016.  

Thanks so much for your quick response -- it helped me figure out the problem.

Kelly

View solution in original post

7 REPLIES 7
SASKiwi
PROC Star

Please post your complete SAS log. That should be the first thing to look at if your program doesn't work as expected.

Patrick
Opal | Level 21

Your year variable could be one of below

  • type character and contain a string of digits
  • type numeric and contain a number
  • type numeric and contain a SAS date value with a format attached to print this count of days since 1/1/1960 as year
  • type numeric and contain a SAS datetime value with a format attached to print this count of seconds since 1/1/1960 as year

Below should point you into the right direction.

data have;
  year_c='2022';
  year_n=2022;
  year_dt='31dec2022'd;
  year_dttm='31dec2022:23:59:59'dt;
  format year_dt year4. year_dttm dtyear4.;
run;

proc print data=have;
run;

Patrick_0-1672277133619.png

All the variables above print as year. 

 

Proc Contents will tell you what you're actually dealing with.

proc contents data=have;
run;

Patrick_1-1672277315089.png

 

Depending on the type and content of your source variable your selection syntax for a specific year needs to look differently. One of below options should be suitable for your case.

data want;
  set have;

  /* one of below should work */
  if strip(year_c)='2022';

  if year_n=2022;

  if year(year_dt)=2022;
  if put(year_dt,year4.)='2022';

  if year(datepart(year_dttm))=2022;
  if put(year_dttm,dtyear4.)='2022';
run;

 

Kelly_K
Fluorite | Level 6

Actually, the variable I call "Date" is not actually a date.  It is an indicator indicating a participant's survey year, and I created it during the data cleaning stage.  However, in the process of writing up an explanation of how I created the variable, I figured out that the problem related to my use of the variable labels in the Where statement and not the actual values.  

Here is the process I used to create the variable:  Starting with two separate datasets, one for survey year 2008 and one for survey year 2016, and I cleaned each one separately before merging them to create the dataset I called "dataset".  I created and formatted the variables I'm calling "year" using the simple code:

*While cleaning the 2008 data;

value year 0="2008" 1="2016;
year = 0    
format year year.;

*While cleaning the 2016 data;
value year 0="2008" 1="2016;
year = 1    
format year year.;

I then merged the 2008 and 2016 datasets into a single dataset, which included a binary variable called "year" with two values, 0 and 1, and two labels 2008 and 2016.  And that is the source of the problem -- I needed to use the values after the equal (=) sign in the Where statement and not the labels.  The following code worked perfectly:

data subset;
     set dataset;
     where year=0;
run;

My apologies for the fire drill!  It didn't occur to me that the use of the name "date" would signal an actual date variable and not a plain vanilla numeric variable that happened to be a year.  And since I created the variable more than 6 months ago, I had forgotten the actual values were 0 and 1 and not 2008 and 2016.  

Thanks so much for your quick response -- it helped me figure out the problem.

Kelly

SASKiwi
PROC Star

@Kelly_K - Perhaps this is also a lesson about naming your variables correctly based on their contents 🙂

Kelly_K
Fluorite | Level 6

Gee, thanks for the unsolicited advice.  

Tom
Super User Tom
Super User

Your terminology usage is off.

A variable can only have one LABEL.  It is used to assign a more descriptive string to a variable that can be used for example as the column header when printing a report.  So in your example you might have wanted to attach the a label like "Survey YEAR" to the variable YEAR.

 

What you are calling a LABEL is instead the FORMATTED value.  So your variable is numeric with values of 0 or 1.  You have associated a format with the variable so that the values are displayed using the result of applying the format to the values the variable contains.  So 0 is printed as the text string "2008" and 1 is printed as the string "2016".

 

If you want to test based on the formatted value of a variable you use the PUT() (or putn() or putc()) function

data subset;
     set dataset;
     where put(year,year.)='2008';
run;

or if you use the VVALUE() function you don't need to know the name of the format attached to the variable, but you cannot use that in a WHERE.  So use a subsetting IF instead.

data subset;
     set dataset;
     if vvalue(year)='2008';
run;

 

Kelly_K
Fluorite | Level 6

Actually, the variable I call "Date" is not actually a date. It is an indicator indicating a participant's survey year, and I created it during the data cleaning stage. However,

in the process of writing up an explanation of how I created the variable, I figured out that the problem related to my use of the variable labels in the Where statement

and not the actual values.

 

Here is the process I used to create the variable: Starting with two separate datasets, one for survey year 2008 and one for survey year 2016, and I cleaned each one

separately before merging them to create the dataset I called "dataset". I created and formatted the variables I'm calling "year" using the simple code:

*WHILE CLEANING THE 2008 DATA; 
value year 0="2008" 1="2016; 
*As part of a data step; 
year = 0; 
format year year.; 

*WHILE CLEANING THE 2016 DATA; 
value year 0="2008" 1="2016; 
*As part of a data step; 
year = 1 
format year year.;

I then merged the 2008 and 2016 datasets into a single dataset, which included a binary variable called "year" with two values, 0 and 1, and two labels 2008 and 2016

And that is the source of the problem -- I needed to use the values after the equal (=) sign in the Where statement and not the labels. The following code worked perfectly.

data sample2008;
     set combined;
     where year=0;
run;

So sorry for the fire drill.  It hadn't occurred to me that naming the variable "Year" would signal to others that it was a year variable and not a plain vanilla numeric variable.  And since I created the variable more than 6 months ago, I had forgotten that the actual values were 0 and 1.  

 

Thanks for your quick reply — it helped me figure out the problem.

 

Kelly

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 948 views
  • 3 likes
  • 4 in conversation