Crawler PNG and SVG Icon
AWS Glue Crawler is a component that automatically scans data sources, infers schemas, and creates metadata tables in the AWS Glue Data Catalog.
Last Modified: August 29, 2025

Available sizes: 16px, 32px, 48px, 64px
Details
Key Features
- Automatically scans data sources to detect schema and metadata.
- Populates AWS Glue Data Catalog entries.
- Supports incremental crawls for efficiency.
- Integrates with Amazon S3, Amazon RDS, Amazon Redshift, and other JDBC sources.
Common Use Cases
- Automatically discovering and cataloging new datasets in S3
- Propagating schema changes to the AWS Glue Data Catalog
- Classifying data by file type and structure for ETL jobs
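As a rough illustration of the S3 cataloging and incremental-crawl points above, the sketch below builds the keyword arguments for the Glue CreateCrawler API as a plain dictionary. The helper name build_crawler_request, the crawler name, the role ARN, the database name, and the bucket path are all hypothetical placeholders. The commented-out boto3 calls assume boto3 is installed and AWS credentials are configured. This is one possible setup under those assumptions, not a recommended configuration.

```python
from typing import Any, Dict, List


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_paths: List[str],
    incremental: bool = True,
) -> Dict[str, Any]:
    """Build keyword arguments for the Glue CreateCrawler API (hypothetical helper)."""
    request: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # Each S3 prefix to scan becomes one S3 target.
        "Targets": {"S3Targets": [{"Path": path} for path in s3_paths]},
    }
    if incremental:
        # Incremental crawl: only folders added since the last run are scanned.
        request["RecrawlPolicy"] = {"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"}
        # Crawling new folders only requires the LOG schema change behaviors.
        request["SchemaChangePolicy"] = {
            "UpdateBehavior": "LOG",
            "DeleteBehavior": "LOG",
        }
    return request


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    request = build_crawler_request(
        name="example-crawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        database="example_db",
        s3_paths=["s3://example-bucket/datasets/"],
    )
    print(request)

    # With boto3 installed and credentials configured, the request could be sent as:
    # import boto3
    # glue = boto3.client("glue")
    # glue.create_crawler(**request)
    # glue.start_crawler(Name=request["Name"])
```

Keeping the request construction separate from the API call makes the configuration easy to inspect or unit-test without touching an AWS account.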
Explore More Icons
Resource Explorer
AWS Resource Explorer enables you to search and discover AWS resources across regions and accounts from a single location.
Outposts rack
AWS Outposts rack is a part of the Outposts family that delivers AWS compute and storage racks to on-premises locations for low-latency applications.
Cloud Control API
AWS Cloud Control API is a set of common APIs for creating, reading, updating, deleting, and listing cloud resources across AWS and third-party services.
Infrastructure Composer
AWS Infrastructure Composer is a visual tool that helps developers create and deploy infrastructure using AWS CloudFormation templates more easily.
Kinesis
Amazon Kinesis is a platform on AWS to collect, process, and analyze real-time streaming data at scale for insights and operational responses.
Elastic Load Balancing
Elastic Load Balancing automatically distributes incoming traffic across multiple targets to ensure application scalability and fault tolerance.
Command Line Interface
AWS Command Line Interface (CLI) is a tool that enables you to manage AWS services and resources through commands in your terminal.
App Mesh
AWS App Mesh is a service mesh that provides application-level networking to make it easy to monitor and control microservices running on AWS.
IQ
AWS IQ is a marketplace that connects AWS customers with certified freelancers and consulting partners for on-demand project help and expert support.
Bottlerocket
Bottlerocket is a Linux-based open-source operating system purpose-built by AWS for running containers securely and efficiently.
Elastic Inference
Amazon Elastic Inference allows you to attach low-cost GPU-powered inference acceleration to Amazon EC2 and SageMaker instances.
Pinpoint
Amazon Pinpoint is a flexible and scalable outbound and inbound marketing communications service for sending targeted messages to customers across multiple channels.
FSx for Lustre
Amazon FSx for Lustre provides a high-performance file system optimized for fast processing of workloads like machine learning, HPC, and analytics.
Clean Rooms
AWS Clean Rooms is a privacy-enhancing collaboration service that enables multiple parties to analyze their collective data without sharing raw data.
QuickSight
Amazon QuickSight is a cloud-powered business intelligence (BI) service that enables you to visualize and share insights from your data with interactive dashboards.
Service Management Connector
AWS Service Management Connector integrates AWS services like Service Catalog with third-party ITSM tools such as ServiceNow or Jira Service Management.
ECS Task
Amazon ECS Task is the smallest deployable unit in ECS, representing a single running container or group of containers defined by a task definition.
Cost and Usage Report
AWS Cost and Usage Report (CUR) provides the most detailed information available about your AWS costs and usage, exported to Amazon S3 for advanced analysis.
DataZone
Amazon DataZone is a data management service that helps you catalog, share, govern, and access data across organizational boundaries in a secure and scalable way.
Console Mobile Application
The AWS Console Mobile Application allows you to view and manage a select set of AWS resources from your mobile device.
Wavelength
AWS Wavelength brings AWS services to the edge of the 5G network, minimizing latency for mobile and edge applications by deploying compute closer to users.
IoT ExpressLink
AWS IoT ExpressLink provides easy and secure connectivity to AWS IoT Core through hardware modules preloaded with AWS firmware.
NAT Gateway
A NAT Gateway enables instances in a private subnet to connect to the internet while preventing unsolicited inbound traffic.
Polly
Amazon Polly is a text-to-speech (TTS) service that uses deep learning to synthesize lifelike human speech in multiple languages.