Saturday, December 30, 2023

AWSCertCP: AWS Analytics Services, Amazon Athena, AWS Data Exchange, Amazon EMR, AWS Glue, Amazon Kinesis

- AWS Analytics Services

- Amazon Athena

- AWS Data Exchange 

- Amazon EMR

- AWS Glue 

- Amazon Kinesis 



Amazon Athena 

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. With a few actions in the AWS Management Console, you can point Athena at your data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries and get results in seconds.

Amazon Athena also makes it easy to run data analytics interactively using Apache Spark without having to plan for, configure, or manage resources. When running an Apache Spark application on Athena, you submit the Spark code for processing and receive the results directly. Apache Spark applications can be developed using a simple notebook experience in the Amazon Athena console.


Both Athena SQL and Apache Spark on Athena are serverless, so there is no infrastructure to set up or manage, and we pay only for the queries we run. Athena also scales automatically, running queries in parallel, so results are fast even with large datasets or complex queries.
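As a rough illustration of the ad-hoc SQL workflow described above, the sketch below builds the parameters for the `start_query_execution` call of the boto3 Athena client. The database name, table name, and S3 output location are hypothetical placeholders, and nothing is sent to AWS unless `run_query` is called with valid credentials.

```python
def build_athena_request(sql, database, output_location):
    """Build keyword arguments for boto3's Athena start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query to Athena; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names, for illustration only.
request = build_athena_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="example_db",
    output_location="s3://example-bucket/athena-results/",
)
```

Keeping the request builder separate from the network call makes it possible to inspect the query parameters locally before running anything billable.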

A sample Athena console looks like this




AWS Data Exchange:

AWS Data Exchange is a service that allows users to easily share and manage data entitlements from other organizations at scale. It also allows users to quickly identify, subscribe to, and use third-party data.


Amazon EMR

Amazon EMR is the industry-leading cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto.




Use cases are:

Perform big data analytics: Run large-scale data processing and what-if analysis using statistical algorithms and predictive models to uncover hidden patterns, correlations, market trends, and customer preferences.

Build scalable data pipelines: Extract data from a variety of sources, process it at scale, and make it available for applications and users.

Process real-time data streams: Analyze events from streaming data sources in real-time to create long-running, highly available, and fault-tolerant streaming data pipelines.

Accelerate data science and ML adoption: Analyze data using open-source ML frameworks such as Apache Spark MLlib, TensorFlow, and Apache MXNet. Connect to Amazon SageMaker Studio for large-scale model training, analysis, and reporting.

AWS Glue 

Preparing your data to obtain quality results is the first step in an analytics or ML project. AWS Glue is a serverless data integration service that makes data preparation simpler, faster, and cheaper. You can discover and connect to over 70 diverse data sources, manage your data in a centralized data catalog, and visually create, run, and monitor ETL pipelines to load data into your data lakes.

Features are: 

- Flexible support for ETL, ELT, batch, streaming and more, with no lock-in
- Petabyte scale, pay-as-you-go billing, any data size
- Support all data users from developers to business users
- Complete data integration capabilities in one serverless service

AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.




Glue provides various tools that data scientists and analysts can use for ETL and ELT tasks.

Amazon Kinesis 

Collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information.
Ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data for machine learning, analytics, and other applications.
Process and analyze data as it arrives and respond instantly instead of having to wait until all your data is collected before the processing can begin.
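To make the ingestion step more concrete, here is a minimal sketch that prepares a record for the `put_record` call of the boto3 Kinesis client. The stream name and the event fields are hypothetical, and the record is only sent if `send` is called with valid AWS credentials.

```python
import json


def build_put_record(stream_name, event, partition_key_field):
    """Build keyword arguments for boto3's Kinesis put_record call."""
    return {
        "StreamName": stream_name,
        # Kinesis expects the payload as bytes.
        "Data": json.dumps(event).encode("utf-8"),
        # Records with the same partition key are routed to the same shard.
        "PartitionKey": str(event[partition_key_field]),
    }


def send(record):
    """Send the record to Kinesis; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    return boto3.client("kinesis").put_record(**record)


# Hypothetical clickstream event, for illustration only.
record = build_put_record(
    "example-clickstream", {"user_id": 42, "page": "/home"}, "user_id"
)
```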

Use Cases
Build video analytics applications: Securely stream video from camera-equipped devices in homes and public places to AWS and use these video streams for security monitoring, face detection, and other analytics.
Evolve from batch to real-time analytics: Perform real-time analytics on data that has been traditionally analyzed using batch processing. For example, sharing data between different applications and streaming extract-transform-load.
Build real-time applications: Use Kinesis for real-time applications such as application monitoring, fraud detection, and live leader-boards to learn about what your customers and applications are doing right now and react promptly.
Analyze IoT device data: Process streaming data from IoT devices and use the data to send real-time alerts or take other actions programmatically when a sensor exceeds certain operating thresholds.
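The IoT use case above boils down to checking each reading as it arrives. The sketch below simulates that locally in plain Python, with no AWS calls; the sensor names, readings, and threshold are made up for illustration.

```python
def alerts(readings, threshold):
    """Yield an alert for each reading whose value exceeds the threshold."""
    for reading in readings:
        if reading["value"] > threshold:
            yield {
                "sensor": reading["sensor"],
                "value": reading["value"],
                "threshold": threshold,
            }


# Hypothetical sensor readings standing in for a real-time stream.
stream = [
    {"sensor": "pump-1", "value": 71.5},
    {"sensor": "pump-2", "value": 98.2},
    {"sensor": "pump-1", "value": 101.0},
]

# The generator handles readings one at a time, as a stream consumer would.
triggered = list(alerts(iter(stream), threshold=95.0))
```

Because `alerts` is a generator, each reading is handled as soon as it is seen rather than after the whole batch has been collected.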

Basically, almost any time-encoded data can be ingested and analyzed.
It also provides APIs to retrieve the analysis results along with their timestamps, and it supports video recognition.
 

Amazon Kinesis Data Firehose 
Useful for extracting fields from the incoming data with JQ expressions and delivering the data to various storage destinations such as S3.



In the image above, Firehose is configured to create partitions in S3 so that the data is stored according to a custom data field, the ventilator Id.
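The partitioning idea can be simulated locally. In Firehose this is done through configuration (a JQ expression plus an S3 prefix) rather than Python, so the sketch below is only an approximation of the behaviour; the field name `ventilator_id`, the sample records, and the prefix layout are assumptions.

```python
import json


def s3_prefix_for(record_json, key_field, base_prefix="data/"):
    """Return the S3 prefix a record would land under, based on one field."""
    record = json.loads(record_json)
    # Equivalent in spirit to a JQ expression such as `.ventilator_id`.
    key = record[key_field]
    return f"{base_prefix}{key_field}={key}/"


# Hypothetical incoming records, for illustration only.
records = [
    '{"ventilator_id": "v-100", "pressure": 12.1}',
    '{"ventilator_id": "v-200", "pressure": 11.7}',
    '{"ventilator_id": "v-100", "pressure": 12.4}',
]

prefixes = [s3_prefix_for(r, "ventilator_id") for r in records]
```

Records that share the same ventilator Id end up under the same prefix, which is what makes later queries on a single ventilator cheaper to run.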


references:

https://d1.awsstatic.com/training-and-certification/docs-cloud-practitioner/AWS-Certified-Cloud-Practitioner_Exam-Guide.pdf


