My AWS certification notes part 3 from A Cloud Guru material
CloudFront
- CDN
- Can be used to deliver static, dynamic, streaming and interactive content
- Geographically dispersed data centers
- Edge location – location where content is cached. Different from Region / AZ
- Edge locations are not read only – you can also write (PUT) objects to them, and they will sync back to the origin
- The edge location caches the content on first use and serves subsequent requests from the cache. Those requests do not go to the origin server, which improves latency
- TTL – time to live – objects are cached for the life of the TTL, 24 hours by default
- Manually clearing the cache is allowed, but it is chargeable
- Origin – the origin of the files the CDN will distribute. This can be S3, EC2, an Elastic Load Balancer, or Route 53, and it can work with non-AWS origins as well
- Distribution – the name given to the CDN, which consists of a collection of edge locations
- Web distribution – for websites, HTTP/HTTPS
- RTMP distribution – for media streaming (audio/video)
- S3 Transfer Acceleration makes use of CloudFront edge locations to accelerate transfers between end users and S3
- For example, say geographically distributed users are uploading multiple files to an S3 bucket located in the UK region
- Instead of uploading the files to S3 directly, they upload to their nearest edge location, and the edge locations accelerate the transfer to S3 through the optimized AWS network
- E.g. bucketname.s3-accelerate.amazonaws.com
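- A minimal boto3 sketch of an accelerated upload, assuming Transfer Acceleration is already enabled on a hypothetical bucket; the client config routes requests through the s3-accelerate endpoint:

```python
import boto3
from botocore.config import Config

# Assumes Transfer Acceleration is already enabled on this (hypothetical) bucket.
s3 = boto3.client(
    "s3",
    config=Config(s3={"use_accelerate_endpoint": True}),  # routes via <bucket>.s3-accelerate.amazonaws.com
)

# The upload goes to the nearest edge location, then over the AWS backbone to the bucket's region.
s3.upload_file("report.pdf", "my-example-bucket", "uploads/report.pdf")
```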
- Restrict Bucket Access – bucket can be accessed only via CloudFront
- Origin access identity
- Grant Read Permissions on Bucket – defaults to No; change it to Yes so the bucket policy is updated (see the policy sketch below). Important step
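- A hedged sketch of the kind of bucket policy this step puts in place, granting the origin access identity read-only access; the bucket name and OAI ID are placeholders:

```python
import json
import boto3

s3 = boto3.client("s3")

# Placeholder bucket name and OAI ID -- substitute your own.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCloudFrontOAIReadOnly",
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E1EXAMPLE12345"
        },
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-example-bucket/*",
    }],
}

s3.put_bucket_policy(Bucket="my-example-bucket", Policy=json.dumps(policy))
```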
- Viewer protocol policy – HTTP and HTTPS, HTTPS only, or redirect HTTP to HTTPS
- Restrict Viewer Access – Signed URL or Signed cookies – for specific paid content
- Price class – use all edge locations or specific
- Alternate domain names
- SSL certificate – default or custom (upload custom cert to aws cert manager)
- Enable IPv6
- Blocklist or whitelist specific geo locations
- Invalidate – manually invalidating content removes it from the cache (chargeable)
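- A minimal boto3 sketch of a manual invalidation request; the distribution ID and path are placeholders:

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")

# Each invalidation path is chargeable beyond the monthly free allotment.
cloudfront.create_invalidation(
    DistributionId="E2EXAMPLE12345",
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/index.html"]},
        "CallerReference": str(time.time()),  # any unique string to de-duplicate requests
    },
)
```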
- S3 performance optimization
- GET-intensive workload
- If most of the workload is GET requests, use CloudFront to optimize performance
- Mixed workload
- Means mix of GET, PUT, DELETE etc
- Avoid sequential key names for S3 objects
- S3 uses the key name to determine which partition an object is stored in
- For heavy workloads, sequential key names can cause I/O hot spots
- So add a random prefix to key names to prevent many objects being stored in the same partition (see the sketch below)
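- A minimal illustration of the idea – hashing the key to build a short random prefix so otherwise sequential names spread across partitions:

```python
import hashlib

def randomized_key(original_key: str) -> str:
    """Prepend a short hash so objects spread across S3 partitions."""
    prefix = hashlib.md5(original_key.encode()).hexdigest()[:4]
    return f"{prefix}-{original_key}"

# Sequential names like these would otherwise land in the same partition:
for name in ["2019-01-01-log.txt", "2019-01-02-log.txt", "2019-01-03-log.txt"]:
    print(randomized_key(name))   # e.g. '<4-hex-chars>-2019-01-01-log.txt'
```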
- Bucket level properties
- Versioning
- Server access logging
- Object level logging
- Static website hosting
- Tags
- Transfer acceleration
- Events
- Object level properties
- Storage class
- Meta Data
- Virtual hosted S3 URL
- http://bucket.s3.amazonaws.com
- http://bucket.s3-aws-region.amazonaws.com
- Path style S3 URL
- http://s3.amazonaws.com/bucket –> for US East – N.Virginia
- http://s3-aws-region.amazonaws.com/bucket –> for other regions
- http://s3.aws-region.amazonaws.com/bucket –> This also works
- S3 – https://s3-eu-west-1.amazonaws.com/test
- https
- Website – http://test.s3-website-eu-west-1.amazonaws.com
- http
- can be secured through cloudfront
- Versioning
- Pricing includes all the versions
- The Delete API does not delete the object permanently; it just adds a delete marker, so the object and its versions are still billed
- To delete permanently, you must delete the specific object version by its version ID
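- A short boto3 sketch of the difference, using a hypothetical bucket and key – a plain delete only adds a delete marker, while passing a VersionId removes that version permanently:

```python
import boto3

s3 = boto3.client("s3")

# A plain delete on a versioned bucket only adds a delete marker.
s3.delete_object(Bucket="my-example-bucket", Key="photo.jpg")

# To remove versions permanently, the specific VersionId must be supplied.
versions = s3.list_object_versions(Bucket="my-example-bucket", Prefix="photo.jpg")
for v in versions.get("Versions", []):
    s3.delete_object(Bucket="my-example-bucket", Key="photo.jpg", VersionId=v["VersionId"])
```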
- Cross region replication requires versioning enabled on both source and destination buckets
- If you run PCI or HIPAA-compliant workloads, AWS recommends logging your CloudFront usage data for the last 365 days by:
- Enabling cloudfront access logs
- Capturing requests that are sent to the CloudFront API
- https://aws.amazon.com/s3/faqs/
Storage Gateway
- Used to back up data
- Available for download as a virtual machine (VM) image
- You install it on a host in your data center
- Supports either VMware ESXi or Microsoft Hyper-V
- Once installed, associate it with your AWS account through the activation process
- Replicates data from your own data center (on premises) to AWS
- 4 types
- File Gateway (NFS)
- Stores flat files directly in S3
- Word, pdf, image, video
- Volume Gateway (iSCSI)
- Block based
- Disk volumes
- E.g. an OS or SQL Server data stored in a virtual hard disk
- Stored in S3 in the form of EBS snapshots
- It's block based, so it can't be stored in S3 as flat files
- snapshots are incremental
- Stored Volumes
- The entire data set is stored on site (on premises) and the gateway backs this data up asynchronously to Amazon S3 (as EBS snapshots)
- Stored volumes provide durable and inexpensive off-site backups that you can recover locally or on Amazon EC2
- 1 GB to 16 TB
- Cached Volumes
- Only your most frequently accessed data is stored on premises; your entire data set is stored in S3 (as EBS snapshots)
- You don’t have to go out and buy large SAN arrays for your office/data center, so you can get significant cost savings.
- If you lose internet connectivity however, you will not be able to access all of your data.
- 1 GB to 32 TB
- Tape Gateway (VTL)
- Used for backup
- Limitless collection of virtual tapes (VTs). Each VT is stored in a Virtual Tape Library (VTL); if archived to Glacier, it sits in a Virtual Tape Shelf (VTS)
- If you use products like NetBackup etc., you can do away with physical tapes and just use the VTL – it replaces your physical tapes with virtual ones
- supported by Netbackup, Backup Exec, Veeam
Snowball
- Previously AWS Import/Export Disk (legacy)
- Can transfer your data (both import and export) into and out of AWS using Amazon's high-speed internal network, avoiding the internet
- Snowball
- onBoard storage
- Petabyte-scale data transport solution
- 80 TB snowball in all regions
- tamper resistant enclosures
- 256 bit encryption
- Trusted platform module (TPM)
- Once the data transfer is complete, AWS performs a software erasure of the Snowball appliance
- Snowball Edge
- onBoard storage and compute capabilities
- Like having a small AWS AZ on premises
- Ensures your applications can continue to run even when they cannot access the cloud; the collected data can be transferred to AWS later
- 100 TB data transfer device
- SnowMobile
- Exabyte-scale data transport solution
- Can transfer up to 100 PB per Snowmobile
- 45 foot long shipping container truck
- secure, fast and cost effective
- Snowball can import to S3 and export from S3
- If your data is in Glacier, first restore it to S3, then export it using Snowball
DynamoDB
- NoSQL DB
- You can’t specify AZ when you create DynamoDB table
- Supports both document and Key-Value data models
- Always stored on SSD storage
- Spread across 3 geographically distinct data centers
- Choice of 2 consistency models
- Eventually consistent reads (default) – consistency across all copies of data is usually reached within a second; repeating a read after a short time should return the updated data – best read performance
- Strongly consistent reads
- It uses conditional writes for consistency – for PutItem, DeleteItem and UpdateItem operations, the operation succeeds only if the attributes meet one or more conditions, otherwise it returns an error
- It uses optimistic concurrency control (optimistic locking) – a strategy that protects your DB writes from being overwritten by the writes of others, and vice versa
- Supports atomic counters – all write requests are applied in the order they are received
- Tables
- Items – row
- Attributes – key-value
- JSON, HTML or XML
- Stores and retrieves data based on primary key
- 2 types of primary key
- Partition key – determines which partition the data is stored in; must be unique, e.g. userId, productId
- Composite key – partition key + sort key, for situations where the partition key cannot be unique. E.g. the same user adds multiple entries in an online forum; here the composite key is the user ID plus the timestamp when the post was made (see the sketch below)
- Partition Key – Hash
- Sort Key – Range
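- A minimal boto3 sketch of creating a table with a composite primary key, using the forum example above; the table and attribute names are illustrative:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Hypothetical forum posts table: composite primary key = userId (partition) + timestamp (sort).
dynamodb.create_table(
    TableName="ForumPosts",
    AttributeDefinitions=[
        {"AttributeName": "userId", "AttributeType": "S"},
        {"AttributeName": "timestamp", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "userId", "KeyType": "HASH"},      # partition key
        {"AttributeName": "timestamp", "KeyType": "RANGE"},  # sort key
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```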
- Access control is managed via AWS IAM. Create a user or role to manage DynamoDB. You can also use a special IAM condition to restrict a user's access to only their own records (partition key = userId) – IAM policy condition key: dynamodb:LeadingKeys (see the policy sketch below)
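- A hedged sketch of such a policy document using the dynamodb:LeadingKeys condition key; the table ARN, account ID and the Cognito identity variable are placeholder assumptions:

```python
# Hypothetical IAM policy document restricting users to items whose partition key
# matches their own (Cognito) identity via dynamodb:LeadingKeys.
leading_keys_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
        "Resource": "arn:aws:dynamodb:eu-west-1:111122223333:table/ForumPosts",
        "Condition": {
            "ForAllValues:StringEquals": {
                "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
            }
        },
    }],
}
```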
- Has index features, 2 types of index
- Local secondary index
- Can be created only at the time of table creation. Cannot modify later
- It has same partition key as your original table
- But different sort key
- Global secondary index
- can be created at anytime
- can have different partition key and different sort key
- scan vs query
- Query –
- Query by partition key or partition key + sort key
- By default returns all the attributes
- You can use the ProjectionExpression parameter to return only specific attributes
- By default sorted by sort key – numeric ascending – 1 2 3 4
- Sorting order can be reversed by setting ScanIndexForward parameter to false. This is applicable only to query not for scan
- By default – eventually consistent
- Query is more efficient than scan
- Scan
- Returns entire table.
- By default returns all the attributes.
- You can use the ProjectionExpression parameter to return only specific attributes
- FilterExpression – to refine the results
- Scan dumps the entire table and then applies the filter – the filter is an extra step that removes data after the read (see the sketch after this list)
- To improve performance
- Avoid scan
- Use page size
- Parallel scan – by default a scan processes sequentially, retrieving 1 MB of data at a time. You can configure a parallel scan by logically dividing the table or index into segments and scanning each segment in parallel. Avoid parallel scans if your table is already busy
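- A minimal boto3 sketch contrasting query and scan, including ProjectionExpression, ScanIndexForward and a parallel-scan segment; table and attribute names are the hypothetical ones used above:

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("ForumPosts")  # hypothetical table from the sketch above

# Query: partition key (+ optional sort key condition), newest first, selected attributes only.
posts = table.query(
    KeyConditionExpression=Key("userId").eq("user-123"),
    ScanIndexForward=False,                 # reverse the default ascending sort-key order
    ProjectionExpression="title, #ts",
    ExpressionAttributeNames={"#ts": "timestamp"},  # alias because 'timestamp' is a reserved word
)

# Scan: reads the whole table, filter applied after the read (still consumes full read capacity).
flagged = table.scan(FilterExpression=Attr("flagged").eq(True))

# Parallel scan: this call handles segment 0 of 4; other workers would scan segments 1-3.
segment0 = table.scan(Segment=0, TotalSegments=4)
```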
- Provisioned throughput is measured in capacity units
- Pricing based on the capacity units
- When you create a table, you specify your requirements in terms of read capacity units and write capacity units
- If your application reads or writes larger items it will consume more capacity units and it will cost you more as well
- 1 read capacity unit represents 1 strongly consistent read or 2 eventually consistent reads per second for items up to 4 KB
- Write capacity unit –
- 1 * write capacity unit = 1 * 1 KB write per second
- Read capacity unit
- 1 * read capacity unit = 2 * eventually consistent reads of 4kb per second (by default)
- 1 * read capacity unit = 1 * strongly consistent reads of 4kb per second (by default)
- If table with 5 read capacity units and 5 write capacity units then
- 5 * 4kb strongly consistent reads = 20 kb per second
- 5 * 2 * 4kb eventually consistent reads = 40 kb per second (twice as strongly consistent)
- 5 * 1kb writes = 5 kb per second
- To calculate capacity units
- If your application needs to read 80 items (table rows) per second and each item is 3 KB in size
- For strongly consistent reads
- Calculate how many read capacity required for each read –
- size of each item /4 kb
- 3kb / 4kb = 0.75 – round to 1
- For 80 items = 80 read capacity units required
- For eventual consistent reads
- Eventually consistent reads give double the throughput of strongly consistent reads
- Divide 80/ 2
- 40 read capacity units required
- If your application wants to write 100 items per second and each item is 512 bytes in size
- Calculate how many capacity units for each write
- Size of each item / 1 kb
- 512 bytes / 1 kb = 0.5 = round to 1
- For 100 items = 100 capacity units required
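- The same arithmetic expressed as a small Python sketch, reproducing the worked examples above:

```python
import math

def read_capacity_units(items_per_second: int, item_size_kb: float, strongly_consistent: bool) -> int:
    units_per_item = math.ceil(item_size_kb / 4)          # each RCU covers a 4 KB read
    total = items_per_second * units_per_item
    return total if strongly_consistent else math.ceil(total / 2)  # eventual consistency halves the cost

def write_capacity_units(items_per_second: int, item_size_kb: float) -> int:
    return items_per_second * math.ceil(item_size_kb / 1)  # each WCU covers a 1 KB write

print(read_capacity_units(80, 3, strongly_consistent=True))    # 80
print(read_capacity_units(80, 3, strongly_consistent=False))   # 40
print(write_capacity_units(100, 0.5))                          # 100
```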
- Dynamo DB Accelerator – DAX
- Fully managed, clustered, in-memory cache for DynamoDb
- More for read only purpose
- Delivers upto 10x read performance improvement
- Ideal for read-heavy and bursty workloads
- E.g. auction apps, gaming, retail sites during Black Friday promotions
- It's a write-through caching service – data is written to the cache and the backend store at the same time
- Allows you to point your DynamoDB API calls at the DAX cluster
- If the item available in cache – cache hit
- If the item not available in cache – cache miss
- On a cache miss, DAX performs an eventually consistent GetItem against DynamoDB, updates the cache and returns the item
- DAX reduces read load on dynamoDB
- May be able to reduce provisioned read capacity and save money
- Not suitable for
- Strongly consistent reads
- Write intensive applications
- Apps that do not perform many reads
- Apps that do not require microsecond response times
- GetItem API
- Returns the set of attributes for the item with the given primary key
- Eventually consistent read by default
- Set ConsistentRead to true for a strongly consistent read
- BatchGetItem API
- Reads multiple items
- Up to 100 items or 1 MB of data
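- A minimal boto3 sketch of GetItem with ConsistentRead and of BatchGetItem; the table, keys and values are hypothetical:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# GetItem is eventually consistent unless ConsistentRead=True is passed.
item = dynamodb.get_item(
    TableName="ForumPosts",
    Key={"userId": {"S": "user-123"}, "timestamp": {"N": "1546300800"}},
    ConsistentRead=True,
)

# BatchGetItem fetches up to 100 items / 1 MB across one or more tables in a single call.
batch = dynamodb.batch_get_item(
    RequestItems={
        "ForumPosts": {
            "Keys": [
                {"userId": {"S": "user-123"}, "timestamp": {"N": "1546300800"}},
                {"userId": {"S": "user-456"}, "timestamp": {"N": "1546387200"}},
            ]
        }
    }
)
```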
- for any aws account, limit is 256 tables per region – contact aws for more
- cumulative (concurrent) number of tables and indexes in the CREATING, UPDATING, DELETING state cannot exceed 10 – if so you will receive “LimitExceededException”
- If you want to create more than one table with a secondary index, you must do so sequentially: create one, wait until it becomes ACTIVE, then proceed to the next
- Global secondary index is index with hash and range key that can be different from those on the table
- You can create up to 5 global secondary indexes when you create a table
- Each table can have up to 5 local secondary indexes
- So 10 secondary indexes in total; the limit cannot be increased beyond 10
- Number, String and Binary data types can be indexed (index keys must be scalar types)
- Set, list, map types cannot be Indexed
- Max limit of item collection is 10 GB
- The smallest amount of capacity units that can be purchased is 100 (for both reads and writes)
- Max size of item in dynamoDB = 400 kb
- Number of attributes an item can have = no limit, but the total size, including attribute names and values, should not exceed 400 KB
- Doesn't support cross-table joins (core relational/RDS features are not supported)
- ItemCollectionSizeLimitExceededException – for a table with local secondary index, exceeded the maximum size limit of 10 GB
- Scan is always eventually consistent
- Secondary index are optional
- If you do more reads/writes than your provisioned capacity units, requests will be throttled and you will receive a 400 error code – ProvisionedThroughputExceededException
- Push-button scaling – you can scale your DB on the fly with no downtime
- No need to explicitly create Multi-AZ; DynamoDB is highly available
- It automatically replicates data across multiple AZs
- Good for stateless web/app tiers (RDS works too, but DynamoDB is the better option)
- Global Tables – for low latency when data is accessed from different geographic locations
- DynamoDB Auto Scaling – dynamically adjusts provisioned throughput capacity on your behalf; increases read and write capacity units in case of sudden traffic spikes without throttling, and reduces capacity when the load drops
Redshift
- Data warehousing service
- for Business intelligence
- OLAP
- Configuration
- Single node – 160 GB
- Multi node
- Leader node – manages client connections and receives queries
- Compute node – stores data and performs queries and computations; up to 128 compute nodes
- Columnar data storage
- Redshift organizes the data by column
- Row-based systems are for transaction processing
- Column-based systems are for data warehousing and analytics
- 10 times faster
- Advanced compression
- Doesn't require indexes or materialized views
- Uses less space
- When loading data into an empty table, Redshift automatically samples your data and selects the most appropriate compression scheme
- Massively parallel processing (MPP)
- Easy to add nodes
- Pricing
- You will not be charged for the leader node
- Charged for compute nodes
- Charged for backups
- Charged for data transfer (only within a VPC)
- Security
- In transit – SSL
- At rest – AES-256
- by default RedShift takes care of key management
- can manage your own keys by HSM
- or AWS KMS
- Availability
- currently available in 1 AZ
- Can restore snapshots to new AZs during an outage
- For DR – configure cross region snapshots
- When you create a Redshift cluster, it is locked down by default so no one can access it. Add inbound rules to the security group to provide access
KMS
- Key management service
- Keys are region based.
- Customer Master Key – CMK
- Alias
- Creation date
- Description
- Key state
- Key Material (customer provided or aws provided)
- Can never be exported (keys in CloudHSM can be exported)
- aws kms encrypt
- aws kms decrypt
- aws kms re-encrypt
- aws kms enable-key-rotation
- Envelope encryption – the envelope (data) key is encrypted by the customer master key (CMK)
- To encrypt data – generate the envelope key from the CMK and encrypt the data using the envelope key
- To decrypt data – decrypt the envelope key using the CMK, then decrypt the data using the decrypted envelope key
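- A hedged boto3 sketch of the envelope-encryption flow using GenerateDataKey; the CMK alias is a placeholder, and Fernet is used purely as an illustrative local cipher (KMS does not mandate one):

```python
import base64
import boto3
from cryptography.fernet import Fernet  # local symmetric cipher, used purely for illustration

kms = boto3.client("kms")

# 1. Ask KMS for an envelope (data) key: a plaintext copy plus a copy encrypted under the CMK.
data_key = kms.generate_data_key(KeyId="alias/my-example-cmk", KeySpec="AES_256")

# 2. Encrypt the data locally with the plaintext data key, then discard the plaintext key.
cipher = Fernet(base64.urlsafe_b64encode(data_key["Plaintext"]))
ciphertext = cipher.encrypt(b"secret payload")
encrypted_key = data_key["CiphertextBlob"]  # store this alongside the ciphertext

# 3. To decrypt later: ask KMS to decrypt the envelope key, then decrypt the data locally.
plaintext_key = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
print(Fernet(base64.urlsafe_b64encode(plaintext_key)).decrypt(ciphertext))
```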
- A key cannot be deleted immediately; you need to disable it and schedule it for deletion (7 to 30 day waiting period)
- CloudHSM – the customer has complete control of the keys and their lifecycle
SQS
- simple queue service
- The first (oldest) AWS service
- Pull based system
- Messages can contain up to 256 KB of text in any format
- Billed in 64 KB chunks
- One request can have 1 to 10 messages, but the maximum total size is 256 KB
- For messages larger than 256 KB, use the Amazon SQS Extended Client Library for Java – this library lets you send an SQS message that contains a reference to an object in S3, which can be as large as 2 GB
- Types
- Standard queues (default) – order is not guaranteed, message delivered at least once
- FIFO queues (First-In-First-Out) – order guaranteed, limited to 300 TPS, messages delivered exactly once, no duplicates (queue name ends with the .fifo suffix)
- Messages can be kept in queue from 1 min to 14 days
- Default retention period is 4 days
- Visibility timeout – the amount of time a message is invisible in the SQS queue after a reader picks it up. If the message is processed within the visibility timeout, it is deleted from the queue; otherwise it becomes visible again for the next consumer to process
- ChangeMessageVisibility – api to extend visibility timeout
- Default visibility timeout is 30 seconds, max is 12 hours
- Short polling – polls for messages frequently and returns immediately with an empty response if the queue is empty
- Long polling – doesn't return a response until a message arrives or the long-poll timeout expires; can save you money
- To enable long polling, set ReceiveMessageWaitTimeSeconds to a value greater than 0 and up to 20
- Max long poll time out – 20 seconds
- Can use JMS
- No limit on the number of Queues
- Can configure access IAM policy that allows anonymous users to access the queue
- Free tier provides 1 million requests per month at no charge
- SQS is PCI DSS level 1 certified
- When a consumer receives and processes a message from the queue, the message remains in the queue and reappears after the visibility timeout. Amazon doesn't automatically delete it – since SQS is a distributed system, there is no guarantee the consumer actually processed the message. Otherwise it is deleted only after the retention period expires
- So the consumer must delete the message from the queue – DeleteMessage API (see the sketch below)
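- A minimal boto3 sketch tying together long polling, the visibility timeout and ReceiptHandle-based deletion; the queue name and the process() function are hypothetical:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="my-example-queue")["QueueUrl"]

# Long poll (up to 20 s) and hide received messages for 60 s while they are processed.
resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,      # long polling
    VisibilityTimeout=60,
)

for msg in resp.get("Messages", []):
    process(msg["Body"])     # hypothetical processing function
    # Deletion must be explicit and uses the ReceiptHandle, not the MessageId.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```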
- Dead-letter queues – a target for messages that can't be processed successfully
- You can delete a queue at any time, whether it is empty or not
- Queue Name
- Limited to 80 characters
- Alphanumeric, -, _ allowed
- Unique within aws account
- SQS can trigger Lambda functions
- To select a message to delete, use the ReceiptHandle of the message (not the MessageId, which you receive when you send the message)
- SQS can delete a message from the queue even if a visibility timeout setting has the message locked by another consumer
- SQS doesn't encrypt messages by default; there is an option to enable encryption
SNS
- Simple notification service
- Topic
- Subscribers: SMS, email, email-JSON, SQS, HTTP/HTTPS, application, Lambda
- Push based system
- To send notifications from cloud
- Push notifications
- Also can deliver notifications by text, email, sqs or http endpoint
- SNS can trigger Lambda functions – when a message is published to an SNS topic that has a Lambda function subscribed, the Lambda function is invoked with the payload of the published message
- All the messages published to SNS topic will be stored redundantly across multiple availability zones
- Pay as you go model
- Pub – sub model
- $0.50 per 1 million Amazon SNS requests
- Fan-out pattern – a message published to an SNS topic is distributed to a number of SQS queues in parallel; using this pattern you can build applications that do parallel, asynchronous processing (see the sketch below)
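- A minimal boto3 sketch of the fan-out pattern – subscribe an SQS queue to a topic, then publish once; topic and queue names are placeholders, and the SQS access policy that allows SNS to deliver is omitted for brevity:

```python
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

topic_arn = sns.create_topic(Name="order-events")["TopicArn"]
queue_url = sqs.create_queue(QueueName="order-processor")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Subscribe the queue to the topic (the queue's access policy must also allow SNS
# to send messages to it -- omitted here for brevity).
sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

# One publish fans out to every subscribed queue/endpoint in parallel.
sns.publish(TopicArn=topic_arn, Subject="OrderPlaced", Message='{"orderId": 42}')
```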
- SNS message body would have
- Type
- MessageId
- Subject
- Message
- Signature Version
- Signature
- SigningCertURL
- UnSubscribeURL
- To use SNS to send notifications to mobile endpoints (whether direct or subscription based), you first need to register the app with AWS. To register a mobile app with AWS:
- Enter name to represent your app
- Select platform
- Provide your credentials for the notification service to platform
- After registration, create an endpoint for the app and mobile device
- Then endpoint will be used in SNS to send the notifications.
- If SNS distributes messages to SQS, the attributes below should not be empty or null
- Name
- Value
- Type
- MessageBody
- To receive messages published to topic, you have to subscribe to the endpoint of the topic
- Topic name
- Should be unique within aws account
- Limited to 256 characters
- Alphanumeric, -, _ are allowed
- Subscription requests are valid for 3 days pending confirmation
- Once message published cannot be recalled
- CreatePlatformEndpoint API – to register multiple device tokens
- Platform supported
- GCM – Google Cloud Messaging
- APNS – Apple push notification service
- ADM – Amazon device messaging
- WNS – Windows Push notification service
- MPNS – Microsoft Push notification service
- Baidu cloud push – android in china
SWF
- Workers
- Deciders
- They can run independently and scale quickly
- SWF ensures task is assigned only once and never duplicated
- Guarantees the delivery order
- SWF domains – isolate a set of types, executions and task lists from others within the same AWS account
- Can register a domain via the console or the RegisterDomain action in the SWF API
- Json format
- Maximum workflow execution time is one year; the value is always measured in seconds
- Decision task occur when the state of the workflow changes
- EC2 instances can perform worker tasks
- Servers residing outside AWS can also perform worker tasks
- Humans can perform activity tasks, not decision tasks
- Max 100 SWF domains
- Max 10000 workflow and activity types (in total)
- Use case – Video encoding
- Actors
- workflow starters
- deciders
- activity workers – could be humans
SES
- Simple Email Service
- Email only
- To send marketing, notification, and transactional emails
- Can also be used to receive emails; incoming mail can be delivered automatically to an S3 bucket and can trigger Lambda and SNS
- Use cases –
- Automated emails
- purchase confirmations, shipping notifications, order status updates
- Marketing communications, advertisements, newsletters, special offers
- Not subscription based; all you need to know is the recipient's email address
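- A minimal boto3 sketch of sending a transactional email; the addresses are placeholders and must be verified in SES (or the account moved out of the sandbox) before this would succeed:

```python
import boto3

ses = boto3.client("ses")

# Sender and recipient addresses are placeholders.
ses.send_email(
    Source="orders@example.com",
    Destination={"ToAddresses": ["customer@example.com"]},
    Message={
        "Subject": {"Data": "Your order has shipped"},
        "Body": {"Text": {"Data": "Order 42 is on its way."}},
    },
)
```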
Elastic Transcoder
- Media transcoder in the cloud
- Converts media files into different formats that will play on smartphones, tablets, PCs, etc.
- Supports all the popular output formats
- You don't need to guess which settings to use; AWS provides the formats and configurations that work best on particular devices
- Pay by the minutes that you transcode and the resolution at which you transcode
Kinesis
- To send your streaming data
- used to consume big data
- E.g. purchases from online stores, stock prices, game data, social network data, IoT data, geo data (e.g. Uber)
- Kinesis streams –
- Video streams – securely stream video from connected devices to AWS for analytics and machine learning
- Data Streams – build custom applications to process data in real time
- Producers send data to the stream (producers can be EC2, mobile devices, laptops or dedicated servers)
- Stream will have shards
- In shards, data is retained for 24 hours (default)
- Can change upto 7 days
- Consumers (e.g. EC2) then read the data from the shards to analyse it
- And send it to S3, DynamoDB, Redshift or EMR
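- A minimal boto3 sketch of a producer putting a record into a stream and a consumer reading from a shard; the stream name and payload are hypothetical:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Producer side: the partition key determines which shard the record lands in.
kinesis.put_record(
    StreamName="purchases",
    Data=json.dumps({"item": "book", "price": 12.99}).encode("utf-8"),
    PartitionKey="customer-42",
)

# Consumer side: read records from one shard via a shard iterator.
shard_id = kinesis.describe_stream(StreamName="purchases")["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName="purchases", ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]
records = kinesis.get_records(ShardIterator=iterator)["Records"]
```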
- Kinesis Firehose
- Capture, transform, load data streams into aws data stores for near real time analytics with BI tools
- No shards, no consumers, no retention period – don’t have manage. Automatic
- Can analyze/transform the data within Firehose, automatically using Lambda
- The analysis step is optional – it happens within Firehose
- Producers will send the data to Firehose
- Firehose then sends the data to S3 or Redshift
- Kinesis Analytics
- Allows you to run SQL queries on the streaming data (sent to Firehose or Streams), and the resulting data can be stored in S3 or Redshift
Elastic Beanstalk
- Java, .NET, PHP, Node.js, Python, Ruby, Go and Docker
- Apache Tomcat, Nginx, Passenger, Puma and IIS
- Tomcat for Java
- Apache HTTP Server for PHP, Python and Node.js
- Nginx also for Node.js
- Passenger or Puma for Ruby apps
- Microsoft IIS for .NET
- Will handle deployment, capacity provisioning, load balancing, auto scaling and application health
- You still retain full control of the underlying Amazon resources
- You pay only for the resources required to store and run your application
- Integrated cloud watch and X-Ray
- Deployment policies
- All at once
- Deploys the new versions to all instances simultaneously
- All of your instances will be out of service during the deployment. Outage
- If the update fails, you need to roll back by redeploying the previous version to all the instances
- Not suitable for critical prod environments
- Rolling
- Deploys new version in batches
- Each batch of instances will be out of service during the deployment
- Your environment capacity will be reduced by the number of instances in a batch during the deployment
- If the update fails, you need to perform additional rolling update to revert the changes
- Not suitable for performance sensitive systems
- Rolling with additional batch
- Launches an additional batch of instances
- Deploys the new version in batches
- Maintains full capacity during the deployment
- No downtime
- If the update fails, you need to perform additional rolling update to revert the changes
- Immutable
- Deploys the newer version to fresh group of instances in their own auto scaling group
- When the new instances pass the health checks, they are moved to the existing Auto Scaling group and the old instances are terminated
- Full capacity during deployment and no down time
- If the update fails, just terminate the new instances
- Suitable for mission critical production systems
- Elastic beanstalk configuration
- You can define packages to install
- Create linux users and groups
- Run shell commands
- Enable service
- Configure load balancer
- Files are written in yaml or json
- Extension – .config
- Saved inside the folder .ebextensions
- Name can be anything
- .ebextensions folder must be included in the top level directory of your app source code bundle
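- A hedged sample .ebextensions file showing the kinds of directives listed above (package installation, users, commands, services); the file name and all values are illustrative:

```yaml
# .ebextensions/01-setup.config  (hypothetical file; any name ending in .config works)
packages:
  yum:
    htop: []                     # packages to install
users:
  appuser:
    groups:
      - webapps                  # Linux users and groups
commands:
  01_say_hello:
    command: echo "environment bootstrapped" >> /var/log/eb-setup.log   # shell command
services:
  sysvinit:
    httpd:
      enabled: true              # enable a service
      ensureRunning: true
```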
- RDS can be integrated with Elastic Beanstalk in two ways
- Launch the RDS instance within Elastic Beanstalk; the RDS instance is created within the Beanstalk environment
- Good option for dev and test
- If you terminate the Beanstalk environment, the RDS instance is terminated too, so this is not suitable for prod
- Launch RDS outside Elastic Beanstalk and integrate it with the environment
- Decouples RDS and Beanstalk
- Additional security group must be added
- Need to provide connection string
- EC2, RDS, ELB, S3, SNS and Auto Scaling groups can be deployed by Beanstalk (SQS can't)
- AWS Toolkit for Eclipse – to update a running app in Beanstalk
- Can be used to create web server environments and worker environments
- supports the deployment of web applications from docker containers