Which AWS services allow users to run SQL queries against data stored in Amazon S3?

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL. With a few clicks in the AWS Management Console, customers can point Athena at their data stored in S3 and begin using standard SQL to run interactive queries and get results in seconds. Athena is serverless, so there is no infrastructure to set up or manage, and customers pay only for the queries they run. You can use Athena to process logs, perform data analytics, and run interactive queries. Athena scales automatically, executing queries in parallel, so results are fast, even with large datasets and complex queries.

Serverless. Zero infrastructure. Zero administration.

Amazon Athena is serverless, so there is no infrastructure to manage. You don’t need to worry about configuration, software updates, failures or scaling your infrastructure as your datasets and number of users grow. Athena automatically takes care of all of this for you, so you can focus on the data, not the infrastructure.

Easy to get started

To get started, log in to the Athena console, define your schema using the console wizard or by entering DDL statements, and immediately start querying using the built-in query editor. You can also use AWS Glue to automatically crawl data sources to discover data and populate your Data Catalog with new and modified table and partition definitions. Results are displayed in the console within seconds, and automatically written to a location of your choice in S3. You can also download them to your desktop. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.
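As a rough illustration of what "entering DDL statements" can look like, the sketch below assembles a Hive-style CREATE EXTERNAL TABLE statement for comma-separated files with a header row. The helper name build_create_table_ddl, the database, table, and column names, and the bucket path are all hypothetical; the sketch assumes plain CSV data and does no identifier escaping.

```python
def build_create_table_ddl(database, table, columns, location, partitions=None):
    """Build a CREATE EXTERNAL TABLE statement for CSV files stored under an S3 prefix.

    columns and partitions are lists of (name, type) pairs. This hypothetical
    helper assumes comma-delimited text with one header line.
    """
    column_sql = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    ddl = f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n  {column_sql}\n)\n"
    if partitions:
        partition_sql = ", ".join(f"{name} {dtype}" for name, dtype in partitions)
        ddl += f"PARTITIONED BY ({partition_sql})\n"
    ddl += (
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )
    return ddl


if __name__ == "__main__":
    # Example values are placeholders, not real resources.
    print(build_create_table_ddl(
        "weblogs",
        "requests",
        [("ip", "string"), ("status", "int")],
        "s3://example-bucket/logs/",
        partitions=[("dt", "string")],
    ))
```

The printed statement could then be pasted into the query editor; the table only describes the files, so no data is copied or moved.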

Easy to query, just use standard SQL

Amazon Athena uses Presto, an open source, distributed SQL query engine optimized for low latency, interactive data analysis. This means you can run queries against large datasets in Amazon S3 using ANSI SQL, with full support for large joins, window functions, and arrays. Athena supports a wide variety of data formats such as CSV, JSON, ORC, Avro, or Parquet. With Athena’s federated data source connectors, you can query additional data stores and join the data with data stored in Amazon S3. You can access Athena and run queries from the Athena console, API, CLI, AWS SDK, and supported business intelligence and SQL development applications through Athena's JDBC and ODBC drivers.
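For access through the AWS SDK, a minimal sketch of the start, poll, and fetch cycle might look like the following. It assumes the athena argument is a boto3 Athena client (for example boto3.client("athena"), which requires the third-party boto3 package and AWS credentials); the function name run_query and all example values are hypothetical. Only the first page of results is read.

```python
import time


def run_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run one query and return the first page of results as lists of strings.

    athena is assumed to be a boto3 Athena client. For SELECT statements the
    first returned row normally holds the column names.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {query_id} ended in state {status['State']}: {reason}")

    result = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

Passing the client in as an argument keeps the sketch free of credentials and region settings, and makes it easy to exercise with a stand-in object before pointing it at a real account.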

Pay per query

With Amazon Athena, you pay only for the queries that you run. You are charged based on the amount of data scanned by each query. You can get significant cost savings and performance gains by compressing, partitioning, or converting your data to a columnar format, because each of those operations reduces the amount of data that Athena needs to scan to execute a query.
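To make the scan-based pricing concrete, here is a small back-of-the-envelope estimator. The rate of 5.00 per terabyte and the 10 MB per-query minimum are assumptions for illustration and should be checked against the current pricing page for your Region; treating a terabyte as 2^40 bytes and ignoring per-megabyte rounding are also simplifications. The function name estimate_query_cost is hypothetical.

```python
def estimate_query_cost(bytes_scanned, price_per_tb=5.00, min_bytes=10 * 1024**2):
    """Estimate the charge for one query from the bytes it scans.

    price_per_tb and min_bytes are assumed values, not authoritative pricing.
    """
    billed_bytes = max(bytes_scanned, min_bytes)
    return billed_bytes / 1024**4 * price_per_tb


if __name__ == "__main__":
    full_scan = estimate_query_cost(1024**4)            # scan a full 1 TiB of raw data
    pruned_scan = estimate_query_cost(100 * 1024**3)    # scan 100 GiB after partitioning
    print(f"full scan:   {full_scan:.2f}")
    print(f"pruned scan: {pruned_scan:.2f}")
```

Because the charge is proportional to bytes scanned, anything that lets a query skip data (partition pruning, compression, or reading only the needed columns of a columnar file) lowers the estimate by the same proportion.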

Fast performance

With Amazon Athena, you don’t have to worry about managing or tuning clusters to get fast performance. Athena is optimized for fast performance with Amazon S3. Athena automatically executes queries in parallel, so that you get query results in seconds, even on large datasets.  

Highly available & durable

Amazon Athena is highly available and executes queries using compute resources across multiple facilities, automatically routing queries appropriately if a particular facility is unreachable. Athena uses Amazon S3 as its underlying data store, making your data highly available and durable. Amazon S3 provides durable infrastructure to store important data and is designed for durability of 99.999999999% of objects. Your data is redundantly stored across multiple facilities and multiple devices in each facility.

Secure

Amazon Athena allows you to control access to your data by using AWS Identity and Access Management (IAM) policies, access control lists (ACLs), and Amazon S3 bucket policies. With IAM policies, you can grant IAM users fine-grained control over your S3 buckets. By controlling access to data in S3, you can restrict users from querying it using Athena. Athena also allows you to easily query encrypted data stored in Amazon S3 and write encrypted results back to your S3 bucket. Both server-side encryption and client-side encryption are supported.
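When queries are started through the API, encrypted results are requested through the result configuration. The sketch below builds that structure as a plain dictionary in the shape the Athena StartQueryExecution API expects; the helper name result_configuration, the bucket path, and the key alias are hypothetical.

```python
def result_configuration(output_location, encryption="SSE_S3", kms_key=None):
    """Build a ResultConfiguration dict that asks Athena to encrypt query results.

    encryption is one of "SSE_S3", "SSE_KMS", or "CSE_KMS". The two KMS-based
    options need a KMS key ARN or ID.
    """
    if encryption not in ("SSE_S3", "SSE_KMS", "CSE_KMS"):
        raise ValueError(f"Unsupported encryption option: {encryption}")
    encryption_config = {"EncryptionOption": encryption}
    if encryption in ("SSE_KMS", "CSE_KMS"):
        if not kms_key:
            raise ValueError(f"{encryption} requires a KMS key")
        encryption_config["KmsKey"] = kms_key
    return {
        "OutputLocation": output_location,
        "EncryptionConfiguration": encryption_config,
    }


if __name__ == "__main__":
    # Placeholder bucket and key alias, for illustration only.
    print(result_configuration("s3://example-bucket/results/", "SSE_KMS", "alias/example-key"))
```

The returned dictionary would be passed as the ResultConfiguration argument when starting a query.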

Integrated

Amazon Athena integrates out-of-the-box with AWS Glue. With Glue Data Catalog, you can create a unified metadata repository across various services, crawl data sources to discover data and populate your Data Catalog with new and modified table and partition definitions, and maintain schema versioning. You can also use Glue’s fully-managed ETL capabilities to transform data or convert it into columnar formats to optimize query performance and reduce costs.

Federated query

Athena provides connectors for enterprise data sources including Amazon DynamoDB, Amazon Redshift, Amazon OpenSearch, MySQL, PostgreSQL, Redis, and other popular third-party data stores. Athena’s data connectors allow you to generate insights from multiple data sources using Athena’s easy-to-use SQL syntax, without the need to move your data with ETL scripts. Data connectors run as AWS Lambda functions and can be enabled for cross-account access, which allows you to scale SQL queries to hundreds of end users. For a list of supported sources, see Using Athena Data Source Connectors. To learn how to build a custom data source connector, see Athena’s connector SDK.

Machine learning

You can invoke your SageMaker Machine Learning models in an Athena SQL query to run inference. The ability to use ML models in SQL queries makes complex tasks such as anomaly detection, customer cohort analysis, and sales predictions as simple as writing a SQL query. Athena makes it easy for anyone with SQL experience to run ML models deployed on Amazon SageMaker.

Which service can be used to run SQL queries on data stored in S3?

S3 Select supports CSV and JSON objects compressed with GZIP or BZIP2, as well as server-side encrypted objects. You can perform SQL queries using AWS SDKs, the SELECT Object Content REST API, the AWS Command Line Interface (AWS CLI), or the AWS Management Console.
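A minimal sketch of the SDK route, assuming the s3 argument is a boto3 S3 client (for example boto3.client("s3")) and the object is a CSV file with a header row. The function name select_from_csv and the bucket, key, and column names are hypothetical.

```python
def select_from_csv(s3, bucket, key, expression, compression="NONE"):
    """Run a SQL expression against one CSV object and return the matching rows as text.

    s3 is assumed to be a boto3 S3 client. compression may be "NONE", "GZIP",
    or "BZIP2" and must match how the object was stored.
    """
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={
            "CSV": {"FileHeaderInfo": "USE"},  # first line holds column names
            "CompressionType": compression,
        },
        OutputSerialization={"CSV": {}},
    )
    # The response payload is a stream of events; only Records events carry data.
    chunks = []
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")


# Example call (placeholder names, needs boto3 and AWS credentials):
#   select_from_csv(s3, "example-bucket", "people.csv.gz",
#                   "SELECT s.name FROM S3Object s WHERE s.city = 'Paris'",
#                   compression="GZIP")
```

Unlike Athena, this works on a single object at a time and needs no table definition, which suits quick lookups inside one file.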

Which AWS service can be used to load data from Amazon S3?

A web-based interface for accessing and managing Amazon S3 resources is available via the AWS Management Console.

Can you use SQL on S3?

S3 Select is an AWS S3 feature that allows developers to run SQL queries on objects in S3 buckets.

Does Athena only work with S3?

Athena is similar to other SQL query engines, such as Apache Drill. Unlike Apache Drill, however, Athena natively queries only data held in Amazon's own S3 storage service, although its federated data source connectors can reach other data stores. Athena can query a variety of file formats, including CSV, Parquet, and JSON.