Crawler PNG and SVG Icon
AWS Glue Crawler is a component that automatically scans data sources, infers schemas, and creates metadata tables in the AWS Glue Data Catalog.
Last Modified: August 29, 2025
Available sizes: 16px, 32px, 48px, 64px
Details
Key Features
- Automatically scans data sources to detect schema and metadata.
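- Creates and updates table definitions in the AWS Glue Data Catalog.
- Supports incremental crawls that re-scan only the Amazon S3 folders added since the previous run.
- Integrates with Amazon S3, Amazon DynamoDB, and JDBC-accessible databases such as Amazon RDS and Amazon Redshift.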
Common Use Cases
- Automatically discovering and cataloging new datasets in S3
- Detecting schema changes and updating table definitions in the AWS Glue Data Catalog
- Classifying data by file type and structure for ETL jobs
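The crawler features above can be sketched in Python with the boto3 Glue client. This is a minimal sketch under stated assumptions: the helper names `build_crawler_request` and `create_and_start_crawler` are hypothetical, and so are the crawler name, IAM role, database, and S3 path used below. The incremental branch pairs `CRAWL_NEW_FOLDERS_ONLY` with `LOG` for both schema-change behaviors. We understand Glue to require this pairing for incremental crawls, but check the CreateCrawler API reference before relying on it.

```python
from typing import Any, Dict, List


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_paths: List[str],
    incremental: bool = True,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's Glue create_crawler call (hypothetical helper)."""
    if not s3_paths:
        raise ValueError("at least one S3 path is required")
    request: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database that receives the inferred tables
        "Targets": {"S3Targets": [{"Path": p} for p in s3_paths]},
    }
    if incremental:
        # Incremental crawl: only S3 folders added since the last run are scanned.
        # Assumption: Glue expects LOG for both behaviors in this mode.
        request["RecrawlPolicy"] = {"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"}
        request["SchemaChangePolicy"] = {"UpdateBehavior": "LOG", "DeleteBehavior": "LOG"}
    else:
        # Full crawl: re-scan everything and write schema changes back to the catalog.
        request["RecrawlPolicy"] = {"RecrawlBehavior": "CRAWL_EVERYTHING"}
        request["SchemaChangePolicy"] = {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "DEPRECATE_IN_DATABASE",
        }
    return request


def create_and_start_crawler(request: Dict[str, Any]) -> None:
    """Register and run the crawler; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so build_crawler_request works without boto3 installed

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
```

For example, `create_and_start_crawler(build_crawler_request("sales-crawler", "GlueCrawlerRole", "sales_db", ["s3://example-bucket/sales/"]))` would register a crawler for that (hypothetical) bucket prefix and start its first run. Building the request separately from the API call keeps the configuration testable without AWS access.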
Explore More Icons
IoT Device Defender
AWS IoT Device Defender is a fully managed service that helps secure your fleet of IoT devices by continuously auditing and monitoring security policies.
Elemental MediaStore
AWS Elemental MediaStore is a storage service optimized for media that offers the performance, consistency, and low latency required for video workloads.
QuickSight
Amazon QuickSight is a cloud-powered business intelligence (BI) service that enables you to visualize and share insights from your data with interactive dashboards.
Managed Streaming for Apache Kafka
Amazon MSK (Managed Streaming for Apache Kafka) is a fully managed service for building and running applications using Apache Kafka on AWS.
Cognito
Amazon Cognito provides user authentication, authorization, and user management for web and mobile apps, with social and enterprise identity federation support.
Volume
Volume refers to block storage resources like EBS volumes that can be attached to EC2 instances for durable, low-latency storage.
Managed Services
AWS Managed Services (AMS) helps enterprises operate their AWS infrastructure by providing ongoing management, monitoring, patching, and operational support.
MemoryDB
Amazon MemoryDB for Redis is a Redis-compatible, in-memory database service designed for ultra-fast performance and durability.
Amazon DynamoDB Accelerator (DAX)
Amazon DynamoDB Accelerator (DAX) is a fully managed, in-memory cache for DynamoDB that delivers up to a 10x performance improvement for read-heavy workloads.
Distro for OpenTelemetry
AWS Distro for OpenTelemetry is a secure, production-ready distribution of the OpenTelemetry project for collecting observability data.
Resource Explorer
AWS Resource Explorer enables you to search and discover AWS resources across regions and accounts from a single location.
ECS Anywhere
Amazon ECS Anywhere extends Amazon Elastic Container Service (ECS) to manage and run container workloads on customer-managed infrastructure, including on-premises servers.
AWS
Amazon Web Services (AWS) is a comprehensive cloud computing platform offering over 200 fully featured services including computing, storage, databases, machine learning, analytics, and more to help businesses scale and innovate faster.
HealthImaging
AWS HealthImaging is a service that stores, transforms, and analyzes medical imaging data at scale using cloud-native tools and standards.
Managed Service for Prometheus
Amazon Managed Service for Prometheus is a fully managed, scalable, and secure monitoring service for container metrics using Prometheus.
Managed Service for Apache Flink
Amazon Managed Service for Apache Flink is a fully managed service for building and running real-time stream processing applications using Apache Flink.
FSx for NetApp ONTAP
Amazon FSx for NetApp ONTAP offers fully managed NetApp file systems on AWS with familiar features like snapshots, clones, and data tiering.
Amplify
AWS Amplify is a set of tools and services that helps developers build scalable, full-stack web and mobile applications on AWS.
Firewall Manager
AWS Firewall Manager is a security management service that makes it easier to centrally configure and manage firewall rules across multiple AWS accounts and resources.
Route 53
Amazon Route 53 is a scalable and highly available Domain Name System (DNS) web service for domain registration and traffic routing.
Serverless Application Repository
AWS Serverless Application Repository is a managed repository for discovering, deploying, and publishing serverless applications built with Lambda and other AWS services.
Simple Queue Service
Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables decoupling and scaling of microservices and distributed systems.
Proton
AWS Proton is a fully managed application delivery service that helps platform teams standardize and automate infrastructure and deployment for microservices.
Resilience Hub
AWS Resilience Hub helps you assess and improve the resilience of your applications using AWS best practices.