The best way to handle a dataset where where we have countries that a...

cleokatt · Posted 05-04-2024 10:15 AM

Hi,
I have a dataset where the year goes from [2000,2015] where I have these countries:
Spain, France, Denmark and European Union and let's say two colomns; company='A' and 'B' and value='dollar'

Q1: if Spain for company A have average vaule 0.288

Denmark also

France also

for aaaalll of the 15 years.

BUT! We have EU (recall, France, Spain, Denmark are all in European Union (EU)) and for one (1) year EU have the avarage for the same company A, have tha average vaule at 1,000

Q2: For company B;, France, Spain, Denmark and European Union (EU)) have data from [2000,2015] and have ~the same avarage all the four, how to solve that issue besT?

Tom · Posted 05-04-2024 10:44 AM

I cannot figure out what you are asking. I cannot figure out what data you have.

Are you asking how to make a dataset to record the information?

Or do you already have a dataset and are asking how to analyze it? If so then what is the dataset you have? What are the variable names, their types and how many observations do you have?

Here a pure guess about what you might have based on snippets of information pull from different parts of your question:

data have;
  length compary country $20 year value 8 currency $10 ;
  input company -- currency ;
cards;
A Spain 2000 100 Euro
A France 2000 200 Euro
B US 2002 300 Dollar
;

Am I close to guessing what your data look like?

cleokatt · Posted 05-04-2024 10:58 AM

Hi,

My dataset have #obs=3000~ and it looks like this: this is just me and my fiction data that I just trying to be better so I can become an data analyst, and let's say that we have a data set like this?

country	country	avg(value)	how many we have data years
DE	Germany	0.11	14
CA	Canada	0.2	15
NO	Norway	2.0	16
FR	France	0.11	15
TR	Turkey	0.11	15
UA	Ukraine	0.11	3
World	World	1.00	16
ID	Indonesia	0.5	9
IN	India	0.5	15
GB	United Kingdom	0.8	15
Non-EU Europe	Non-EU Europe	1.0	15
CH	Schweiz	0.8	9
EU	European Union	1.0	15

See? Germany and France are in EU but we have "in total" a row that have the name EU, isn't that a bias? How to solve that?

And for Ukraine example, it only have data for 3years, how to solve that?

And how about the row "World" all of the countires above is in the world, how to solve that?

Tom · Posted 05-04-2024 01:11 PM

Still not communicating clearly.

First let's convert your LISTING into an actual dataset. A dataset cannot have two variables with the same name. And you don't want to use variable names that have parentheses and other strange characters in them. You could put such text in to the variable's LABEL if you want so that it can be used to make other pretty listings.

data have;
  infile cards dsd dlm='|' truncover;
  input country1 :$20. country2 :$20. value years;
cards;
DE|Germany|0.11|14
CA|Canada|0.2|15
NO|Norway|2.0|16
FR|France|0.11|15
TR|Turkey|0.11|15
UA|Ukraine|0.11|3
World|World|1.00|16
ID|Indonesia|0.5|9
IN|India|0.5|15
GB|United Kingdom|0.8|15
Non-EU Europe|Non-EU Europe|1.0|15
CH|Schweiz|0.8|9
EU|European Union|1.0|15
;

Now what is the question.

Is the goal to create such data? If so what does the SOURCE data look like?

Is the goal to use such data? If so what is it you want to create? Do you just want to make a REPORT (textual or graphical)? Or do you need to generate some aggregated data?

I suspect that perhaps you are trying to ask how to summarize data across multiple dimensions. So perhaps you want the average for each country but also an average for all countries in one region (geographical regions such as Europe or North America. or perhaps political regions such as the EU.)

SAS has lots of features to help with that question. Which to use depends on how you have the source data and what type of report you want to produce.

cleokatt · Posted 05-04-2024 02:18 PM

Just want to learn how to solve this kind of dataset 😕

but say just for generate some aggregated data

Quentin · Posted 05-04-2024 03:05 PM

We can't provide a solution, without more of a question. We can't solve a dataset, but we can solve a question about how to get some specific information from a dataset.

At a high level, I agree with your thought that having rows in the data for Country='World' and Country='EU' is a bad idea. They're not countries. If you want to calculate and average value for all countries using PROC MEANS, you would not want to have rows for World and EU be included in that calculation.

If I were given that dataset, my first thought would be to delete the rows that are not actually countries. But whether I did that or not would depend on my goal.

In a single dataset, I only want one level of aggregation. I might have a dataset for country information, and another dataset for city information, and another for region information. But I wouldn't want a dataset where some rows are for countries, some are for cities, and some are for regions.

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.

cleokatt · Posted 05-05-2024 06:03 AM

@Quentin wrote:
In a single dataset, I only want one level of aggregation. I might have a dataset for country information, and another dataset for city information, and another for region information. But I wouldn't want a dataset where some rows are for countries, some are for cities, and some are for regions.

So you would split the dataset into three different sets?

Tom · Posted 05-05-2024 10:48 AM

@cleokatt wrote:

So you would split the dataset into three different sets?

That is possible. But it might be easier to just have one dataset that uses two ID variables to identify the observations. One could indicate the TYPE of region and other the NAME of the region.

data have ;
  infile card dsd truncover;
  input region_type :$10. region_name :$30.  value;
cards;
country,USA,100
country,ENGLAND,200
continent,EUROPE,300
;

The best way to handle a dataset where where we have countries that are also involved in an other?

Re: The best way to handle a dataset where where we have countries that are also involved in an oth

Re: The best way to handle a dataset where where we have countries that are also involved in an oth

Re: The best way to handle a dataset where where we have countries that are also involved in an oth

Re: The best way to handle a dataset where where we have countries that are also involved in an oth

Re: The best way to handle a dataset where where we have countries that are also involved in an oth

Re: The best way to handle a dataset where where we have countries that are also involved in an oth

Re: The best way to handle a dataset where where we have countries that are also involved in an oth

SAS Innovate 2025: Call for Content