Crawler PNG and SVG Icon
AWS Glue Crawler is a component that automatically scans data sources, infers schemas, and creates metadata tables in the AWS Glue Data Catalog.
Last Modified: August 29, 2025
Details
Key Features
- Automatically scans data sources to detect schema and metadata.
- Populates AWS Glue Data Catalog entries.
- Supports incremental crawls that process only newly added data instead of rescanning the whole source.
- Crawls Amazon S3 as well as JDBC-accessible data stores such as Amazon RDS and Amazon Redshift.
Common Use Cases
- Automatically discovering and cataloging new datasets in S3.
- Keeping AWS Glue Data Catalog tables up to date as source schemas change.
- Classifying data by file type and structure for ETL jobs.
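Example
The features above can be illustrated with a minimal Python sketch that defines an incremental S3 crawler through boto3. The crawler name, role ARN, database name, and bucket path are hypothetical placeholders, and the helper functions are illustrative, not part of any AWS SDK. The sketch assumes that incremental crawls require both schema change behaviors to be set to LOG.

```python
from typing import Any


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_path: str,
    incremental: bool = True,
) -> dict[str, Any]:
    """Build keyword arguments for the Glue CreateCrawler API call."""
    request: dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if incremental:
        # Crawl only folders added since the last run.
        # Assumption: this mode requires LOG for both schema change behaviors.
        request["RecrawlPolicy"] = {"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"}
        request["SchemaChangePolicy"] = {
            "UpdateBehavior": "LOG",
            "DeleteBehavior": "LOG",
        }
    else:
        # Full recrawl: write detected schema changes back to the Data Catalog.
        request["SchemaChangePolicy"] = {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        }
    return request


def create_and_start_crawler(request: dict[str, Any]) -> None:
    """Create the crawler and start it (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    req = build_crawler_request(
        name="sales-data-crawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
        database="sales_catalog",
        s3_path="s3://example-bucket/sales/",
    )
    print(req)
```

Calling create_and_start_crawler(req) would submit the request. Keeping the request builder separate from the API call makes the configuration easy to inspect before anything is created in an account.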
Explore More Icons
EKS Distro
Amazon EKS Distro (EKS-D) is the open-source distribution of the same Kubernetes components used by Amazon EKS, enabling consistent cluster operations on any infrastructure.
Data Lake
A data lake on AWS is a centralized, scalable, and secure repository for storing and analyzing structured and unstructured data.
Secrets Manager
AWS Secrets Manager helps you securely store, retrieve, rotate, and manage access to sensitive information such as database credentials and API keys.
Outposts family
AWS Outposts family consists of fully managed solutions that extend AWS infrastructure, services, and tools to on-premises locations for a hybrid cloud experience.
Hosted Zone
A Hosted Zone in Amazon Route 53 is a container for records that define how traffic is routed for a domain and its subdomains.
Rekognition
Amazon Rekognition is a computer vision service that enables image and video analysis for face detection, object recognition, and more.
DocumentDB
Amazon DocumentDB is a scalable, fully managed document database service that supports MongoDB workloads.
Professional Services
AWS Professional Services is a global team of experts that helps customers realize their desired business outcomes using the AWS Cloud through specialized engagements.
CloudSearch
Amazon CloudSearch is a managed service that makes it simple to set up, manage, and scale a search solution for your website or application.
Thinkbox XMesh
Thinkbox XMesh is a geometry caching system that optimizes complex animated geometry workflows in 3D applications.
Systems Manager
AWS Systems Manager gives you visibility and control of your AWS infrastructure by unifying resource management under one interface.
S3 on Outposts
Amazon S3 on Outposts brings object storage to on-premises environments using AWS Outposts, enabling data residency and low-latency workloads.
Elemental Server
AWS Elemental Server is an on-premises video processing system that converts input video for distribution to TVs, PCs, and mobile devices.
Elastic Inference
Amazon Elastic Inference allows you to attach low-cost GPU-powered inference acceleration to Amazon EC2 and SageMaker instances.
Pinpoint APIs
Amazon Pinpoint APIs provide programmatic access to campaigns, user segments, message templates, and analytics for engaging customers through push, email, and SMS.
App Studio
AWS App Studio is a generative AI-powered service for building business applications quickly from natural-language descriptions, using visual tools and built-in integrations.
Nova
Amazon Nova is a family of foundation models, available through Amazon Bedrock, for understanding and generating text, images, and video.
DataSync
AWS DataSync is an online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage and AWS.
Simple Queue Service
Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables decoupling and scaling of microservices and distributed systems.
Red Hat OpenShift Service on AWS
Red Hat OpenShift Service on AWS (ROSA) is a fully managed service that enables you to run Red Hat OpenShift, a Kubernetes-based container platform, directly on AWS.
HealthOmics
AWS HealthOmics (formerly Amazon Omics) is a purpose-built service for storing, querying, and analyzing genomic, transcriptomic, and other omics data at scale.
Entity Resolution
AWS Entity Resolution is a machine learning-powered service that helps match, link, and deduplicate records across datasets for accurate data consolidation.
Managed Services
AWS Managed Services (AMS) helps enterprises operate their AWS infrastructure by providing ongoing management, monitoring, patching, and operational support.
HealthLake
Amazon HealthLake is a HIPAA-eligible service that stores, transforms, and analyzes health data in the FHIR format for advanced analytics and ML.