jiantos
Fluorite | Level 6

Hello SAS community,

 

I have one data file called intervention_dataset that has assessments completed from 2015-2023 (n=1925), and I am looking to compare assessment outcomes (from different children) between 2015-2020 (n=1225) and 2020-2023 (n=700) on key variables. I think I can conduct a chi-square using categorical variables, possibly with a 'where' statement; however, I am unsure if I need to create a new dataset or column in order to analyze both reference 'between' dates. Maybe there is a better analytic solution...

 

E.g. of very faulty logic:

 

Proc freq data= intervention_dataset

Tables var1 var2 var3 completed between '06MAR2015'd and '16MAR2020'd by var1 var2 var 3 completed between '17MAR2020'd and '06MAR2023'd/chisq;

run;

 

Any advice or solutions to creating a new column or coding for 'between' reference dates, or is there a simpler process to above? I am using SAS 9.4.

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

No data so can't test anything.

You need to have all of the values in a single data set, and I am not sure that is the case, as your description sort of rambles.

 

If you want to treat a group of values as a single level in something like a chi-square test, then create a custom format:

 

Proc format;
value mydategroup
'06Mar2015'd - '16Mar2020'd ='2015-2020'
'17Mar2020'd -'06Mar2023'd ='2020-2023'
;
run;

proc freq data=intervention_dataset;
   tables (var1 var2 var3) *completed / chisq;
   format completed mydategroup.;
run;

Formats are quite often the easiest, most flexible way to create any grouping of values based on a single variable.

You could create a different format such as 2015-2018, 2018-2021 and 2021-2023 (just as an example) and change the analysis by using the different format.
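
As a rough sketch of that idea, a second format with three bands could sit alongside the first one. The format name dategrpb and the exact cut points below are assumptions made up for illustration; only the band labels come from the example above.

```sas
/* Hypothetical alternative grouping: three bands instead of two. */
/* The name dategrpb and the boundary dates are illustrative only. */
proc format;
   value dategrpb
      '06Mar2015'd - '31Dec2017'd = '2015-2018'
      '01Jan2018'd - '31Dec2020'd = '2018-2021'
      '01Jan2021'd - '06Mar2023'd = '2021-2023'
   ;
run;

/* Same PROC FREQ as before; only the FORMAT statement changes. */
proc freq data=intervention_dataset;
   tables (var1 var2 var3) * completed / chisq;
   format completed dategrpb.;
run;
```

The data set itself is untouched; switching the analysis is just a matter of naming a different format.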

 

Formats allow you to avoid adding multiple variables (i.e. "columns") to your data, which with large data sets sometimes takes a noticeable amount of time. Also, the logic can be a bit simpler.

 

My SAS setup typically has 10 or so age group formats, such as specific ages based on topic-specific boundaries, and 3-, 5- and 10-year age bands. Then I just apply the specific age group format for the specific topic. In this case I often have the format define a group of values as "age not of interest" or similar, so there doesn't need to be any fancy "where" clauses.
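
Applied to the dates in this thread, a catch-all group might look something like the sketch below. The format name periodgrp and the label text are hypothetical; the OTHER keyword is what collects every value not covered by the listed ranges.

```sas
/* Hypothetical format with a catch-all level for dates outside the two periods. */
proc format;
   value periodgrp
      '06Mar2015'd - '16Mar2020'd = '2015-2020'
      '17Mar2020'd - '06Mar2023'd = '2020-2023'
      other                       = 'Date not of interest'
   ;
run;

proc freq data=intervention_dataset;
   tables (var1 var2 var3) * completed / chisq;
   format completed periodgrp.;
run;
```

Note that the catch-all level shows up as its own category in the table, so it is worth checking whether any records actually land there before reading the chi-square result.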


jiantos
Fluorite | Level 6

Thank you for your support, I truly appreciate it. I now understand formats much better!

 

That said, when I use the code that you've provided, I receive a contingency table with one row per MM/DD/YYYY date, as opposed to a composite of all outcomes within the restricted date ranges.

 

Attached is a screenshot of what I am referring to. Any ideas on how to resolve this?

 

ballardw
Super User

Data.

Actual data.

When output doesn't match what you expected, the likely place to look is the LOG. Copy the submitted code, including the Proc format if you used it, the proc freq, and all of the associated messages. Open a text box on the forum using the </> icon and paste.

 

I strongly suspect that you have a warning about the format as your values are character and not date values as the format expects.

Or run proc contents on your data set and share the result.
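
A minimal sketch of that check follows. The second step only applies if PROC CONTENTS reports the date variable as Char, and it assumes the text is stored as MM/DD/YYYY as described above. The names intervention_dates and completed_dt are hypothetical.

```sas
/* Step 1: list variable names and types (Num vs Char). */
proc contents data=intervention_dataset;
run;

/* Step 2 (only if the date variable turns out to be character): */
/* read the MM/DD/YYYY text into a numeric SAS date.             */
data intervention_dates;              /* hypothetical output data set */
   set intervention_dataset;
   completed_dt = input(completed, mmddyy10.);  /* hypothetical new variable */
   format completed_dt date9.;
run;
```

If the variable is already numeric with a date format attached, the second step is unnecessary and the date-range format should apply directly.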

jiantos
Fluorite | Level 6
Data intervention_dataset
Proc format;
value mydategroup
'06Mar2015'd - 16Mar2020'd='2015-2020'
'17mar2020'd - '07Mar2023'd='2020-2023'
;
run;

Proc freq data=intervention_dataset;
tables (Var1 Var2 Var3) *complete_d / chisq;
format completed mydategroup.;
run;

The warning received from the LOG was:

 

Proc freq data=intervention_dataset;

tables (Var1 Var2 Var3) *complete_d / chisq;

format completed mydategroup.;

WARNING: Variable COMPLETED not found in data set work.intervention_dataset.

run;

 

FYI, complete_d is the dataset's date variable. Hopefully this information can help explain the issue...
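
Going by the warning in the log, the FORMAT statement names COMPLETED while the TABLES statement uses complete_d, so the format is never attached to the variable being tabulated. A minimal sketch of the step with matching names, assuming complete_d is a numeric SAS date (the thread does not confirm this):

```sas
proc format;
   value mydategroup
      '06Mar2015'd - '16Mar2020'd = '2015-2020'
      '17Mar2020'd - '07Mar2023'd = '2020-2023'
   ;
run;

proc freq data=intervention_dataset;
   tables (Var1 Var2 Var3) * complete_d / chisq;
   /* The FORMAT statement must name the same variable as TABLES. */
   format complete_d mydategroup.;
run;
```

Two other details in the posted code may also be worth checking: the stray "Data intervention_dataset" line before Proc format has no semicolon, and the second date literal in the first range is missing its opening quote.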

 
