BookmarkSubscribeRSS Feed
cleokatt
Calcite | Level 5

Hi, 
I have a dataset where the year goes from [2000,2015] where I have these countries: 
Spain, France, Denmark and European Union and let's say two colomns; company='A' and 'B' and value='dollar' 

 

Q1: if Spain for company A have average vaule 0.288

        Denmark also

        France also 

for aaaalll of the 15 years.

 

BUT! We have EU (recall, France, Spain, Denmark are all in European Union (EU)) and for one (1) year EU have the avarage for the same company A, have tha average vaule at 1,000

 

 

Q2:  For company B;, France, Spain, Denmark and  European Union (EU)) have data from [2000,2015] and have ~the same avarage all the four, how to solve that issue besT?

7 REPLIES 7
Tom
Super User Tom
Super User

I cannot figure out what you are asking.  I cannot figure out what data you have.

Are you asking how to make a dataset to record the information?

Or do you already have a dataset and are asking how to analyze it?  If so then what is the dataset you have?  What are the variable names, their types and how many observations do you have?

 

Here a pure guess about what you might have based on snippets of information pull from different parts of your question:

data have;
  length compary country $20 year value 8 currency $10 ;
  input company -- currency ;
cards;
A Spain 2000 100 Euro
A France 2000 200 Euro
B US 2002 300 Dollar
;

Am I close to guessing what your data look like?

cleokatt
Calcite | Level 5

Hi,


My dataset have #obs=3000~ and it looks like this: this is just me and my fiction data that I just trying to be better so I can become an data analyst, and let's say that we have a data set like this?

countrycountryavg(value)how many we have data years
DEGermany0.1114
CACanada0.215
NONorway2.016
FRFrance0.1115
TRTurkey0.1115
UAUkraine0.113
WorldWorld1.0016
IDIndonesia0.59
INIndia0.515
GBUnited Kingdom0.815
Non-EU EuropeNon-EU Europe1.015
CHSchweiz0.89
EUEuropean Union1.015

 

See? Germany and France are in EU but we have "in total" a row that have the name EU, isn't that a bias? How to solve that?

And for Ukraine example, it only have data for 3years, how to solve that?

 

And how about the row "World" all of the countires above is in the world, how to solve that?

Tom
Super User Tom
Super User

Still not communicating clearly.

First let's convert your LISTING into an actual dataset.  A dataset cannot have two variables with the same name. And you don't want to use variable names that have parentheses and other strange characters in them.  You could put such text in to the variable's LABEL if you want so that it can be used to make other pretty listings.

data have;
  infile cards dsd dlm='|' truncover;
  input country1 :$20. country2 :$20. value years;
cards;
DE|Germany|0.11|14
CA|Canada|0.2|15
NO|Norway|2.0|16
FR|France|0.11|15
TR|Turkey|0.11|15
UA|Ukraine|0.11|3
World|World|1.00|16
ID|Indonesia|0.5|9
IN|India|0.5|15
GB|United Kingdom|0.8|15
Non-EU Europe|Non-EU Europe|1.0|15
CH|Schweiz|0.8|9
EU|European Union|1.0|15
;

Now what is the question.

  Is the goal to create such data?  If so what does the SOURCE data look like?

  Is the goal to use such data? If so what is it you want to create? Do you just want to make a REPORT (textual or graphical)?  Or do you need to generate some aggregated data?

 

I suspect that perhaps you are trying to ask how to summarize data across multiple dimensions.  So perhaps you want the average for each country but also an average for all countries in one region (geographical regions such as Europe or North America.  or perhaps political regions such as the EU.)

 

SAS has lots of features to help with that question.  Which to use depends on how you have the source data and what type of report you want to produce. 

cleokatt
Calcite | Level 5
Just want to learn how to solve this kind of dataset 😕

but say just for generate some aggregated data
Quentin
Super User

We can't provide a solution, without more of a question.  We can't solve a dataset, but we can solve a question about how to get some specific information from a dataset.

 

At a high level, I agree with your thought that having rows in the data for Country='World' and Country='EU' is  a bad idea.  They're not countries.  If you want to calculate and average value for all countries using PROC MEANS, you would not want to have rows for World and EU be included in that calculation.  

 

If I were given that dataset, my first thought would be to delete the rows that are not actually countries.  But whether I did that or not would depend on my goal.

 

In a single dataset, I only want one level of aggregation.  I might have a dataset for country information, and another dataset for city information, and another for region information.  But I wouldn't want a dataset where some rows are for countries, some are for cities, and some are for regions.  

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
cleokatt
Calcite | Level 5

@Quentin wrote:

In a single dataset, I only want one level of aggregation.  I might have a dataset for country information, and another dataset for city information, and another for region information.  But I wouldn't want a dataset where some rows are for countries, some are for cities, and some are for regions.  


So you would split the dataset into three different sets?

Tom
Super User Tom
Super User

@cleokatt wrote:


So you would split the dataset into three different sets?


That is possible.  But it might be easier to just have one dataset that uses two ID variables to identify the observations.  One could indicate the TYPE of region and other the NAME of the region. 

 

data have ;
  infile card dsd truncover;
  input region_type :$10. region_name :$30.  value;
cards;
country,USA,100
country,ENGLAND,200
continent,EUROPE,300
;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 744 views
  • 1 like
  • 3 in conversation