My AWS certification notes part 3 from A Cloud Guru material
CloudFront
- CDN
- Can be used to deliver static, dynamic, streaming and interactive content
- Geographically dispersed data centers
- Edge location – location where content is cached. Different from Region / AZ
- Edge locations are not read only – you can also write (PUT) objects to them, and they will sync back to the origin
- The edge location caches the content on first use and serves subsequent requests from the cache. Those requests do not go to the origin server, which improves latency
- TTL – time to live – objects are cached for the life of the TTL, 24 hours by default
- Manually clearing the cache is allowed, but it is chargeable
- Origin – the origin of the files the CDN will distribute. This can be S3, EC2, an Elastic Load Balancer, or Route 53, and it can work with non-AWS origins as well
- Distribution – the name given to the CDN, which consists of a collection of edge locations
- Web distribution – for websites, HTTP/HTTPS
- RTMP distribution – for media streaming (audio/video)
- S3 Transfer Acceleration makes use of CloudFront edge locations to accelerate transfers between end users and S3
- For example, say geographically distributed users are uploading multiple files to an S3 bucket located in the UK region
- Instead of uploading the files to S3 directly, they upload to their nearest edge location, and the edge locations accelerate the transfer to S3 through the optimized AWS network
- E.g. bucketname.s3-accelerate.amazonaws.com
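- A minimal boto3 sketch of an accelerated upload, assuming Transfer Acceleration is already enabled on a hypothetical bucket; the client config routes requests through the s3-accelerate endpoint:

```python
import boto3
from botocore.config import Config

# Assumes Transfer Acceleration is already enabled on this (hypothetical) bucket.
s3 = boto3.client(
    "s3",
    config=Config(s3={"use_accelerate_endpoint": True}),  # routes via <bucket>.s3-accelerate.amazonaws.com
)

# The upload goes to the nearest edge location, then over the AWS backbone to the bucket's region.
s3.upload_file("report.pdf", "my-example-bucket", "uploads/report.pdf")
```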
- Restrict Bucket Access – bucket can be accessed only via CloudFront
- Origin access identity
- Grant Read Permissions on Bucket – defaults to No; change it to Yes so the bucket policy is updated (see the policy sketch below). Important step
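- A hedged sketch of the kind of bucket policy this step puts in place, granting the origin access identity read-only access; the bucket name and OAI ID are placeholders:

```python
import json
import boto3

s3 = boto3.client("s3")

# Placeholder bucket name and OAI ID -- substitute your own.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCloudFrontOAIReadOnly",
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E1EXAMPLE12345"
        },
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-example-bucket/*",
    }],
}

s3.put_bucket_policy(Bucket="my-example-bucket", Policy=json.dumps(policy))
```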
- Viewer protocol policy – HTTP and HTTPS, HTTPS only, or redirect HTTP to HTTPS
- Restrict Viewer Access – Signed URL or Signed cookies – for specific paid content
- Price class – use all edge locations or specific
- Alternate domain names
- SSL certificate – default or custom (upload custom cert to aws cert manager)
- Enable IPv6
- Blocklist or whitelist specific geo locations
- Invalidate – manually invalidating content removes it from the cache (chargeable)
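- A minimal boto3 sketch of a manual invalidation request; the distribution ID and path are placeholders:

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")

# Each invalidation path is chargeable beyond the monthly free allotment.
cloudfront.create_invalidation(
    DistributionId="E2EXAMPLE12345",
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/index.html"]},
        "CallerReference": str(time.time()),  # any unique string to de-duplicate requests
    },
)
```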
- S3 performance optimization
- GET-intensive workload
- If most of the workload is GET requests, use CloudFront to optimize performance
- Mixed workload
- Means mix of GET, PUT, DELETE etc
- Avoid sequential key names for S3 objects
- S3 uses the key name to determine which partition an object is stored in
- For heavy workloads, sequential key names can cause I/O hot spots
- So add a random prefix to key names to prevent many objects being stored in the same partition (see the sketch below)
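- A minimal illustration of the idea – hashing the key to build a short random prefix so otherwise sequential names spread across partitions:

```python
import hashlib

def randomized_key(original_key: str) -> str:
    """Prepend a short hash so objects spread across S3 partitions."""
    prefix = hashlib.md5(original_key.encode()).hexdigest()[:4]
    return f"{prefix}-{original_key}"

# Sequential names like these would otherwise land in the same partition:
for name in ["2019-01-01-log.txt", "2019-01-02-log.txt", "2019-01-03-log.txt"]:
    print(randomized_key(name))   # e.g. '<4-hex-chars>-2019-01-01-log.txt'
```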
- Bucket level properties
- Versioning
- Server access logging
- Object level logging
- Static website hosting
- Tags
- Transfer acceleration
- Events
- Object level properties
- Storage class
- Meta Data
- Virtual hosted S3 URL
- http://bucket.s3.amazonaws.com
- http://bucket.s3-aws-region.amazonaws.com
- Path style S3 URL
- http://s3.amazonaws.com/bucket –> for US East – N.Virginia
- http://s3-aws-region.amazonaws.com/bucket –> for other regions
- http://s3.aws-region.amazonaws.com/bucket –> This also works
- S3 – https://s3-eu-west-1.amazonaws.com/test
- https
- Website – http://test.s3-website-eu-west-1.amazonaws.com
- http
- can be secured through cloudfront
- Versioning
- Pricing includes all the versions
- The Delete API does not delete the object permanently; it just adds a delete marker, so the object and its versions are still billed
- To delete permanently, you must delete the specific object version by its version ID
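- A short boto3 sketch of the difference, using a hypothetical bucket and key – a plain delete only adds a delete marker, while passing a VersionId removes that version permanently:

```python
import boto3

s3 = boto3.client("s3")

# A plain delete on a versioned bucket only adds a delete marker.
s3.delete_object(Bucket="my-example-bucket", Key="photo.jpg")

# To remove versions permanently, the specific VersionId must be supplied.
versions = s3.list_object_versions(Bucket="my-example-bucket", Prefix="photo.jpg")
for v in versions.get("Versions", []):
    s3.delete_object(Bucket="my-example-bucket", Key="photo.jpg", VersionId=v["VersionId"])
```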
- Cross region replication requires versioning enabled on both source and destination buckets
- If you run PCI or HIPAA-compliant workloads, AWS recommends logging your CloudFront usage data for the last 365 days by:
- Enabling cloudfront access logs
- Capturing requests that are sent to the CloudFront API
- https://aws.amazon.com/s3/faqs/
Storage Gateway
- Used to back up data
- Available for download as a virtual machine (VM) image
- You install it on a host in your data center
- Supports either VMware ESXi or Microsoft Hyper-V
- Once installed, associate it with your AWS account through the activation process
- Replicates data from your own data center (on premises) to AWS
- 4 types
- File Gateway (NFS)
- Stores flat files directly in S3
- Word, pdf, image, video
- Volume Gateway (iSCSI)
- Block based
- Disk volumes
- E.g. an OS or SQL Server data stored in a virtual hard disk
- Stored in S3 in the form of EBS snapshots
- It's block based, so it can't be stored in S3 as flat files
- snapshots are incremental
- Stored Volumes
- The entire data set is stored on site (on premises) and the gateway backs this data up asynchronously to Amazon S3 (as EBS snapshots)
- Stored volumes provide durable and inexpensive off-site backups that you can recover locally or on Amazon EC2
- 1 GB to 16 TB
- Cached Volumes
- Only your most frequently accessed data is stored on premises; your entire data set is stored in S3 (as EBS snapshots)
- You don’t have to go out and buy large SAN arrays for your office/data center, so you can get significant cost savings.
- If you lose internet connectivity however, you will not be able to access all of your data.
- 1 GB to 32 TB
- Tape Gateway (VTL)
- Used for backup
- Limitless collection of virtual tapes (VTs). Each VT is stored in a Virtual Tape Library (VTL); if archived to Glacier, it sits in a Virtual Tape Shelf (VTS)
- If you use products like NetBackup etc., you can do away with physical tapes and just use the VTL – it replaces your physical tapes with virtual ones
- supported by Netbackup, Backup Exec, Veeam
Snowball
- Previously AWS Import/Export Disk (legacy)
- Can transfer your data (both import and export) into and out of AWS using Amazon's high-speed internal network, avoiding the internet
- Snowball
- onBoard storage
- Petabyte-scale data transport solution
- 80 TB snowball in all regions
- tamper resistant enclosures
- 256 bit encryption
- Trusted platform module (TPM)
- Once the data transfer is complete, AWS performs a software erasure of the Snowball appliance
- Snowball Edge
- onBoard storage and compute capabilities
- Like having a small AWS AZ on premises
- Ensures your applications can continue to run even when they cannot access the cloud; the collected data can be transferred to AWS later
- 100 TB data transfer device
- SnowMobile
- Exabyte-scale data transport solution
- Can transfer up to 100 PB per Snowmobile
- 45 foot long shipping container truck
- secure, fast and cost effective
- Snowball can import to S3 and export from S3
- If your data is in Glacier, first restore it to S3, then export it using Snowball
DynamoDB
- NoSQL DB
- You can’t specify AZ when you create DynamoDB table
- Supports both document and Key-Value data models
- Always stored on SSD storage
- Spread across 3 geographically distinct data centers
- Choice of 2 consistency models
- Eventually consistent reads (default) – consistency across all copies of data is usually reached within a second; repeating a read after a short time should return the updated data – best read performance
- Strongly consistent reads
- It uses conditional writes for consistency – for PutItem, DeleteItem and UpdateItem operations, the operation succeeds only if the attributes meet one or more conditions, otherwise it returns an error
- It uses optimistic concurrency control (optimistic locking) – a strategy that protects your DB writes from being overwritten by the writes of others, and vice versa
- Supports atomic counters – all write requests are applied in the order they are received
- Tables
- Items – row
- Attributes – key-value
- JSON, HTML or XML
- Stores and retrieves data based on primary key
- 2 types of primary key
- Partition key – determines which partition the data is stored in; must be unique, e.g. userId, productId
- Composite key – partition key + sort key, for situations where the partition key cannot be unique. E.g. the same user adds multiple entries in an online forum; here the composite key is the user ID plus the timestamp when the post was made (see the sketch below)
- Partition Key – Hash
- Sort Key – Range
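- A minimal boto3 sketch of creating a table with a composite primary key, using the forum example above; the table and attribute names are illustrative:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Hypothetical forum posts table: composite primary key = userId (partition) + timestamp (sort).
dynamodb.create_table(
    TableName="ForumPosts",
    AttributeDefinitions=[
        {"AttributeName": "userId", "AttributeType": "S"},
        {"AttributeName": "timestamp", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "userId", "KeyType": "HASH"},      # partition key
        {"AttributeName": "timestamp", "KeyType": "RANGE"},  # sort key
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```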
- Access control is managed via AWS IAM. Create a user or role to manage DynamoDB. You can also use a special IAM condition to restrict a user's access to only their own records (partition key = userId) – IAM policy condition key: dynamodb:LeadingKeys (see the policy sketch below)
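- A hedged sketch of such a policy document using the dynamodb:LeadingKeys condition key; the table ARN, account ID and the Cognito identity variable are placeholder assumptions:

```python
# Hypothetical IAM policy document restricting users to items whose partition key
# matches their own (Cognito) identity via dynamodb:LeadingKeys.
leading_keys_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
        "Resource": "arn:aws:dynamodb:eu-west-1:111122223333:table/ForumPosts",
        "Condition": {
            "ForAllValues:StringEquals": {
                "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
            }
        },
    }],
}
```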
- Has index features, 2 types of index
- Local secondary index
- Can be created only at the time of table creation. Cannot modify later
- It has same partition key as your original table
- But different sort key
- Global secondary index
- can be created at anytime
- can have different partition key and different sort key
- scan vs query
- Query –
- Query by partition key or partition key + sort key
- By default returns all the attributes
- You can use the ProjectionExpression parameter to return only specific attributes
- By default sorted by sort key – numeric ascending – 1 2 3 4
- Sorting order can be reversed by setting ScanIndexForward parameter to false. This is applicable only to query not for scan
- By default – eventually consistent
- Query is more efficient than scan
- Scan
- Returns entire table.
- By default returns all the attributes.
- You can use the ProjectionExpression parameter to return only specific attributes
- FilterExpression – to refine the results
- Scan dumps the entire table and then applies the filter – the filter is an extra step that removes data after the read (see the sketch after this list)
- To improve performance
- Avoid scan
- Use page size
- Parallel scan – by default a scan processes sequentially, retrieving 1 MB of data at a time. You can configure a parallel scan by logically dividing the table or index into segments and scanning each segment in parallel. Avoid parallel scans if your table is already busy
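- A minimal boto3 sketch contrasting query and scan, including ProjectionExpression, ScanIndexForward and a parallel-scan segment; table and attribute names are the hypothetical ones used above:

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("ForumPosts")  # hypothetical table from the sketch above

# Query: partition key (+ optional sort key condition), newest first, selected attributes only.
posts = table.query(
    KeyConditionExpression=Key("userId").eq("user-123"),
    ScanIndexForward=False,                 # reverse the default ascending sort-key order
    ProjectionExpression="title, #ts",
    ExpressionAttributeNames={"#ts": "timestamp"},  # alias because 'timestamp' is a reserved word
)

# Scan: reads the whole table, filter applied after the read (still consumes full read capacity).
flagged = table.scan(FilterExpression=Attr("flagged").eq(True))

# Parallel scan: this call handles segment 0 of 4; other workers would scan segments 1-3.
segment0 = table.scan(Segment=0, TotalSegments=4)
```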
- Provisioned throughput is measured in capacity units
- Pricing based on the capacity units
- When you create a table, you specify your requirements in terms of read capacity units and write capacity units
- If your application reads or writes larger items it will consume more capacity units and it will cost you more as well
- 1 read capacity unit represents 1 strongly consistent read or 2 eventually consistent reads per second for items up to 4 KB
- Write capacity unit –
- 1 * write capacity unit = 1 * 1 KB write per second
- Read capacity unit
- 1 * read capacity unit = 2 * eventually consistent reads of 4kb per second (by default)
- 1 * read capacity unit = 1 * strongly consistent reads of 4kb per second (by default)
- If table with 5 read capacity units and 5 write capacity units then
- 5 * 4kb strongly consistent reads = 20 kb per second
- 5 * 2 * 4kb eventually consistent reads = 40 kb per second (twice as strongly consistent)
- 5 * 1kb writes = 5 kb per second
- To calculate capacity units
- If your application needs to read 80 items (table rows) per second and each item is 3 KB in size
- For strongly consistent reads
- Calculate how many read capacity required for each read –
- size of each item /4 kb
- 3kb / 4kb = 0.75 – round to 1
- For 80 items = 80 read capacity units required
- For eventual consistent reads
- Eventually consistent reads give double the throughput of strongly consistent reads
- Divide 80/ 2
- 40 read capacity units required
- If your application wants to write 100 items per second and each item is 512 bytes in size
- Calculate how many capacity units for each write
- Size of each item / 1 kb
- 512 bytes / 1 kb = 0.5 = round to 1
- For 100 items = 100 capacity units required
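- The same arithmetic expressed as a small Python sketch, reproducing the worked examples above:

```python
import math

def read_capacity_units(items_per_second: int, item_size_kb: float, strongly_consistent: bool) -> int:
    units_per_item = math.ceil(item_size_kb / 4)          # each RCU covers a 4 KB read
    total = items_per_second * units_per_item
    return total if strongly_consistent else math.ceil(total / 2)  # eventual consistency halves the cost

def write_capacity_units(items_per_second: int, item_size_kb: float) -> int:
    return items_per_second * math.ceil(item_size_kb / 1)  # each WCU covers a 1 KB write

print(read_capacity_units(80, 3, strongly_consistent=True))    # 80
print(read_capacity_units(80, 3, strongly_consistent=False))   # 40
print(write_capacity_units(100, 0.5))                          # 100
```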
- Dynamo DB Accelerator – DAX
- Fully managed, clustered, in-memory cache for DynamoDb
- More for read only purpose
- Delivers upto 10x read performance improvement
- Ideal for read-heavy and bursty workloads
- E.g. auction apps, gaming, retail sites during Black Friday promotions
- It's a write-through caching service – data is written to the cache and the backend store at the same time
- Allows you to point your DynamoDB API calls at the DAX cluster
- If the item available in cache – cache hit
- If the item not available in cache – cache miss
- On a cache miss, DAX performs an eventually consistent GetItem against DynamoDB, updates the cache and returns the item
- DAX reduces read load on dynamoDB
- May be able to reduce provisioned read capacity and save money
- Not suitable for
- Strongly consistent reads
- Write intensive applications
- Apps that do not perform many reads
- Apps that do not require microsecond response times
- GetItem API
- Returns the set of attributes for the item with the given primary key
- Eventually consistent read by default
- Set ConsistentRead to true for a strongly consistent read
- BatchGetItem API
- Reads multiple items
- Up to 100 items or 1 MB of data
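- A minimal boto3 sketch of GetItem with ConsistentRead and of BatchGetItem; the table, keys and values are hypothetical:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# GetItem is eventually consistent unless ConsistentRead=True is passed.
item = dynamodb.get_item(
    TableName="ForumPosts",
    Key={"userId": {"S": "user-123"}, "timestamp": {"N": "1546300800"}},
    ConsistentRead=True,
)

# BatchGetItem fetches up to 100 items / 1 MB across one or more tables in a single call.
batch = dynamodb.batch_get_item(
    RequestItems={
        "ForumPosts": {
            "Keys": [
                {"userId": {"S": "user-123"}, "timestamp": {"N": "1546300800"}},
                {"userId": {"S": "user-456"}, "timestamp": {"N": "1546387200"}},
            ]
        }
    }
)
```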
- for any aws account, limit is 256 tables per region – contact aws for more
- cumulative (concurrent) number of tables and indexes in the CREATING, UPDATING, DELETING state cannot exceed 10 – if so you will receive “LimitExceededException”
- If you want to create more than one table with a secondary index, you must do so sequentially: create one, wait until it becomes ACTIVE, then proceed to the next
- Global secondary index is index with hash and range key that can be different from those on the table
- You can create up to 5 global secondary indexes when you create a table
- Each table can have up to 5 local secondary indexes
- So 10 secondary indexes in total; the limit cannot be increased beyond 10
- Number, String and Binary data types can be indexed (index keys must be scalar types)
- Set, list, map types cannot be Indexed
- Max limit of item collection is 10 GB
- The smallest amount of capacity units that can be purchased is 100 (for both reads and writes)
- Max size of item in dynamoDB = 400 kb
- Number of attributes an item can have = no limit, but the total size, including attribute names and values, should not exceed 400 KB
- Doesn't support cross-table joins (core relational/RDS features are not supported)
- ItemCollectionSizeLimitExceededException – for a table with local secondary index, exceeded the maximum size limit of 10 GB
- Scan is always eventually consistent
- Secondary index are optional
- If you do more reads/writes than your provisioned capacity units, requests will be throttled and you will receive a 400 error code – ProvisionedThroughputExceededException
- Push-button scaling – you can scale your DB on the fly with no downtime
- No need to explicitly create Multi-AZ; DynamoDB is highly available
- It automatically replicates data across multiple AZs
- Good for stateless web/app tiers (RDS works too, but DynamoDB is the better option)
- Global Tables – for low latency when data is accessed from different geographic locations
- DynamoDB Auto Scaling – dynamically adjusts provisioned throughput capacity on your behalf; increases read and write capacity units in case of sudden traffic spikes without throttling, and reduces capacity when the load drops
Redshift
- Data warehousing service
- for Business intelligence
- OLAP
- Configuration
- Single node – 160 GB
- Multi node
- Leader node – manages client connections and receives queries
- Compute node – stores data and performs queries and computations; up to 128 compute nodes
- Columnar data storage
- Redshift organizes the data by column
- Row-based systems are for transaction processing
- Column-based systems are for data warehousing and analytics
- 10 times faster
- Advanced compression
- Doesn't require indexes or materialized views
- Uses less space
- When loading data into an empty table, Redshift automatically samples your data and selects the most appropriate compression scheme
- Massively parallel processing (MPP)
- Easy to add nodes
- Pricing
- You will not be charged for the leader node
- Charged for compute nodes
- Charged for backups
- Charged for data transfer (only within a VPC)
- Security
- In transit – SSL
- At rest – AES-256
- by default RedShift takes care of key management
- can manage your own keys by HSM
- or AWS KMS
- Availability
- currently available in 1 AZ
- Can restore snapshots to new AZs during an outage
- For DR – configure cross region snapshots
- When you create a Redshift cluster, it is locked down by default so no one can access it. Add inbound rules to the security group to provide access
KMS
- Key management service
- Keys are region based.
- Customer Master Key – CMK
- Alias
- Creation date
- Description
- Key state
- Key Material (customer provided or aws provided)
- Can never be exported (keys in CloudHSM can be exported)
- aws kms encrypt
- aws kms decrypt
- aws kms re-encrypt
- aws kms enable-key-rotation
- Envelope encryption – the envelope (data) key is encrypted by the customer master key (CMK)
- To encrypt data – generate the envelope key from the CMK and encrypt the data using the envelope key
- To decrypt data – decrypt the envelope key using the CMK, then decrypt the data using the decrypted envelope key
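- A hedged boto3 sketch of the envelope-encryption flow using GenerateDataKey; the CMK alias is a placeholder, and Fernet is used purely as an illustrative local cipher (KMS does not mandate one):

```python
import base64
import boto3
from cryptography.fernet import Fernet  # local symmetric cipher, used purely for illustration

kms = boto3.client("kms")

# 1. Ask KMS for an envelope (data) key: a plaintext copy plus a copy encrypted under the CMK.
data_key = kms.generate_data_key(KeyId="alias/my-example-cmk", KeySpec="AES_256")

# 2. Encrypt the data locally with the plaintext data key, then discard the plaintext key.
cipher = Fernet(base64.urlsafe_b64encode(data_key["Plaintext"]))
ciphertext = cipher.encrypt(b"secret payload")
encrypted_key = data_key["CiphertextBlob"]  # store this alongside the ciphertext

# 3. To decrypt later: ask KMS to decrypt the envelope key, then decrypt the data locally.
plaintext_key = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
print(Fernet(base64.urlsafe_b64encode(plaintext_key)).decrypt(ciphertext))
```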
- A key cannot be deleted immediately; you need to disable it and schedule it for deletion (7 to 30 day waiting period)
- CloudHSM – the customer has complete control of the keys and their lifecycle
SQS
- simple queue service
- The first (oldest) AWS service
- Pull based system
- Messages can contain up to 256 KB of text in any format
- Billed in 64 KB chunks
- One request can have 1 to 10 messages, but the maximum total size is 256 KB
- For messages larger than 256 KB, use the Amazon SQS Extended Client Library for Java – this library lets you send an SQS message that contains a reference to an object in S3, which can be as large as 2 GB
- Types
- Standard queues (default) – order is not guaranteed, message delivered at least once
- FIFO queues (First-In-First-Out) – order guaranteed, limited to 300 TPS, messages delivered exactly once, no duplicates (queue name ends with the .fifo suffix)
- Messages can be kept in queue from 1 min to 14 days
- Default retention period is 4 days
- Visibility timeout – the amount of time a message is invisible in the SQS queue after a reader picks it up. If the message is processed within the visibility timeout, it is deleted from the queue; otherwise it becomes visible again for the next consumer to process
- ChangeMessageVisibility – api to extend visibility timeout
- Default visibility timeout is 30 seconds, max is 12 hours
- Short polling – polls for messages frequently and returns immediately with an empty response if the queue is empty
- Long polling – doesn't return a response until a message arrives or the long-poll timeout expires; can save you money
- To enable long polling, set ReceiveMessageWaitTimeSeconds to a value greater than 0 and up to 20
- Max long poll time out – 20 seconds
- Can use JMS
- No limit on the number of Queues
- Can configure access IAM policy that allows anonymous users to access the queue
- Free tier provides 1 million requests per month at no charge
- SQS is PCI DSS level 1 certified
- When a consumer receives and processes a message from the queue, the message remains in the queue and reappears after the visibility timeout. Amazon doesn't automatically delete it – since SQS is a distributed system, there is no guarantee the consumer actually processed the message. Otherwise it is deleted only after the retention period expires
- So the consumer must delete the message from the queue – DeleteMessage API (see the sketch below)
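- A minimal boto3 sketch tying together long polling, the visibility timeout and ReceiptHandle-based deletion; the queue name and the process() function are hypothetical:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="my-example-queue")["QueueUrl"]

# Long poll (up to 20 s) and hide received messages for 60 s while they are processed.
resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,      # long polling
    VisibilityTimeout=60,
)

for msg in resp.get("Messages", []):
    process(msg["Body"])     # hypothetical processing function
    # Deletion must be explicit and uses the ReceiptHandle, not the MessageId.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```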
- Dead-letter queues – a target for messages that can't be processed successfully
- You can delete a queue at any time, whether it is empty or not
- Queue Name
- Limited to 80 characters
- Alphanumeric, -, _ allowed
- Unique within aws account
- SQS can trigger Lambda functions
- To select a message to delete, use the ReceiptHandle of the message (not the MessageId, which you receive when you send the message)
- SQS can delete a message from the queue even if a visibility timeout setting has the message locked by another consumer
- SQS doesn't encrypt messages by default; there is an option to enable encryption
SNS
- Simple notification service
- Topic
- Subscribers: SMS, email, email-JSON, SQS, HTTP/HTTPS, application, Lambda
- Push based system
- To send notifications from cloud
- Push notifications
- Also can deliver notifications by text, email, sqs or http endpoint
- SNS can trigger Lambda functions – when a message is published to an SNS topic that has a Lambda function subscribed, the Lambda function is invoked with the payload of the published message
- All the messages published to SNS topic will be stored redundantly across multiple availability zones
- Pay as you go model
- Pub – sub model
- $0.50 per 1 million Amazon SNS requests
- Fan-out pattern – a message published to an SNS topic is distributed to a number of SQS queues in parallel; using this pattern you can build applications that do parallel, asynchronous processing (see the sketch below)
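- A minimal boto3 sketch of the fan-out pattern – subscribe an SQS queue to a topic, then publish once; topic and queue names are placeholders, and the SQS access policy that allows SNS to deliver is omitted for brevity:

```python
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

topic_arn = sns.create_topic(Name="order-events")["TopicArn"]
queue_url = sqs.create_queue(QueueName="order-processor")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Subscribe the queue to the topic (the queue's access policy must also allow SNS
# to send messages to it -- omitted here for brevity).
sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

# One publish fans out to every subscribed queue/endpoint in parallel.
sns.publish(TopicArn=topic_arn, Subject="OrderPlaced", Message='{"orderId": 42}')
```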
- SNS message body would have
- Type
- MessageId
- Subject
- Message
- Signature Version
- Signature
- SigningCertURL
- UnSubscribeURL
- To use SNS to send notifications to mobile endpoints (whether direct or subscription based), you first need to register the app with AWS. To register a mobile app with AWS:
- Enter name to represent your app
- Select platform
- Provide your credentials for the notification service to platform
- After registration, create an endpoint for the app and mobile device
- Then endpoint will be used in SNS to send the notifications.
- If SNS distributes messages to SQS, the attributes below should not be empty or null
- Name
- Value
- Type
- MessageBody
- To receive messages published to topic, you have to subscribe to the endpoint of the topic
- Topic name
- Should be unique within aws account
- Limited to 256 characters
- Alphanumeric, -, _ are allowed
- Subscription requests are valid for 3 days pending confirmation
- Once message published cannot be recalled
- CreatePlatformEndpoint API – to register multiple device tokens
- Platform supported
- GCM – Google Cloud Messaging
- APNS – Apple push notification service
- ADM – Amazon device messaging
- WNS – Windows Push notification service
- MPNS – Microsoft Push notification service
- Baidu cloud push – android in china
SWF
- Workers
- Deciders
- They can run independently and scale quickly
- SWF ensures task is assigned only once and never duplicated
- Guarantees the delivery order
- SWF domains – isolate a set of types, executions and task lists from others within the same AWS account
- Can register a domain via the console or the RegisterDomain action in the SWF API
- Json format
- Maximum workflow execution time is one year; the value is always measured in seconds
- Decision task occur when the state of the workflow changes
- EC2 instances can perform worker tasks
- Servers residing outside AWS can also perform worker tasks
- Humans can perform activity tasks, not decision tasks
- Max 100 SWF domains
- Max 10000 workflow and activity types (in total)
- Use case – Video encoding
- Actors
- workflow starters
- deciders
- activity workers – could be humans
SES
- Simple Email Service
- Email only
- To send marketing, notification, and transactional emails
- Can also be used to receive emails; incoming mail can be delivered automatically to an S3 bucket and can trigger Lambda and SNS
- Use cases –
- Automated emails
- purchase confirmations, shipping notifications, order status updates
- Marketing communications, advertisements, newsletters, special offers
- Not subscription based; all you need to know is the recipient's email address
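- A minimal boto3 sketch of sending a transactional email; the addresses are placeholders and must be verified in SES (or the account moved out of the sandbox) before this would succeed:

```python
import boto3

ses = boto3.client("ses")

# Sender and recipient addresses are placeholders.
ses.send_email(
    Source="orders@example.com",
    Destination={"ToAddresses": ["customer@example.com"]},
    Message={
        "Subject": {"Data": "Your order has shipped"},
        "Body": {"Text": {"Data": "Order 42 is on its way."}},
    },
)
```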
Elastic Transcoder
- Media transcoder in the cloud
- Converts media files into different formats that will play on smartphones, tablets, PCs, etc.
- Supports all the popular output formats
- You don't need to guess which settings to use; AWS provides the formats and configurations that work best on particular devices
- Pay by the minutes that you transcode and the resolution at which you transcode
Kinesis
- To send your streaming data
- used to consume big data
- E.g. purchases from online stores, stock prices, game data, social network data, IoT data, geo data (e.g. Uber)
- Kinesis streams –
- Video streams – securely stream video from connected devices to AWS for analytics and machine learning
- Data Streams – build custom applications to process data in real time
- Producers send data to the stream (producers can be EC2, mobile devices, laptops or dedicated servers)
- Stream will have shards
- In shards, data is retained for 24 hours (default)
- Can change upto 7 days
- Consumers (e.g. EC2) then read the data from the shards to analyse it
- And send it to S3, DynamoDB, Redshift or EMR
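- A minimal boto3 sketch of a producer putting a record into a stream and a consumer reading from a shard; the stream name and payload are hypothetical:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Producer side: the partition key determines which shard the record lands in.
kinesis.put_record(
    StreamName="purchases",
    Data=json.dumps({"item": "book", "price": 12.99}).encode("utf-8"),
    PartitionKey="customer-42",
)

# Consumer side: read records from one shard via a shard iterator.
shard_id = kinesis.describe_stream(StreamName="purchases")["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName="purchases", ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]
records = kinesis.get_records(ShardIterator=iterator)["Records"]
```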
- Kinesis Firehose
- Capture, transform, load data streams into aws data stores for near real time analytics with BI tools
- No shards, no consumers, no retention period – don’t have manage. Automatic
- Can analyze/transform the data within Firehose, automatically using Lambda
- The analysis step is optional – it happens within Firehose
- Producers will send the data to Firehose
- Firehose then sends the data to S3 or Redshift
- Kinesis Analytics
- Allows you to run SQL queries on the streaming data (sent to Firehose or Streams), and the resulting data can be stored in S3 or Redshift
Elastic Beanstalk
- Java, .NET, PHP, Node.js, Python, Ruby, Go and Docker
- Apache Tomcat, Nginx, Passenger, Puma and IIS
- Tomcat for Java
- Apache HTTP Server for PHP, Python and Node.js
- Nginx also for Node.js
- Passenger or Puma for Ruby apps
- Microsoft IIS for .NET
- Will handle deployment, capacity provisioning, load balancing, auto scaling and application health
- You still retain full control of the underlying Amazon resources
- You pay only for the resources required to store and run your application
- Integrated cloud watch and X-Ray
- Deployment policies
- All at once
- Deploys the new versions to all instances simultaneously
- All of your instances will be out of service during the deployment. Outage
- If the update fails, you need to roll back by redeploying the previous version to all the instances
- Not suitable for critical prod environments
- Rolling
- Deploys new version in batches
- Each batch of instances will be out of service during the deployment
- Your environment capacity will be reduced by the number of instances in a batch during the deployment
- If the update fails, you need to perform additional rolling update to revert the changes
- Not suitable for performance sensitive systems
- Rolling with additional batch
- Launches an additional batch of instances
- Deploys the new version in batches
- Maintains full capacity during the deployment
- No downtime
- If the update fails, you need to perform additional rolling update to revert the changes
- Immutable
- Deploys the newer version to fresh group of instances in their own auto scaling group
- When the new instances pass the health checks, they are moved to the existing Auto Scaling group and the old instances are terminated
- Full capacity during deployment and no down time
- If the update fails, just terminate the new instances
- Suitable for mission critical production systems
- Elastic beanstalk configuration
- You can define packages to install
- Create linux users and groups
- Run shell commands
- Enable service
- Configure load balancer
- Files are written in yaml or json
- Extension – .config
- Saved inside the folder .ebextensions
- Name can be anything
- .ebextensions folder must be included in the top level directory of your app source code bundle
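- A hedged sample .ebextensions file showing the kinds of directives listed above (package installation, users, commands, services); the file name and all values are illustrative:

```yaml
# .ebextensions/01-setup.config  (hypothetical file; any name ending in .config works)
packages:
  yum:
    htop: []                     # packages to install
users:
  appuser:
    groups:
      - webapps                  # Linux users and groups
commands:
  01_say_hello:
    command: echo "environment bootstrapped" >> /var/log/eb-setup.log   # shell command
services:
  sysvinit:
    httpd:
      enabled: true              # enable a service
      ensureRunning: true
```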
- RDS can be integrated with Elastic Beanstalk in two ways
- Launch the RDS instance within Elastic Beanstalk; the RDS instance is created within the Beanstalk environment
- Good option for dev and test
- If you terminate the Beanstalk environment, the RDS instance is terminated too, so this is not suitable for prod
- Launch RDS outside Elastic Beanstalk and integrate it with the environment
- Decouples RDS and Beanstalk
- Additional security group must be added
- Need to provide connection string
- EC2, RDS, ELB, S3, SNS and Auto Scaling groups can be deployed by Beanstalk (SQS can't)
- AWS Toolkit for Eclipse – to update a running app in Beanstalk
- Can be used to create web server environments and worker environments
- supports the deployment of web applications from docker containers