Data - logs, metrics, activity, messages, notifications etc
DBs, systems - store data, KV stores, search indexes, caches
How to move - flow of data - publisher/subscriber
messaging systems - ActiveMQ, RabbitMQ, MQSeries - pub & sub
big data - hadoop - real time, store, process periodically, continuous low-latency processing, data warehousing
log aggregation
ETL/transformation tools - not system - stream centric
continuously evolving and ever growing stream
LinkedIn - internal infra - streaming platform - pub & sub to streams of data
store, process
modern distri sys - cluster, scale elastically
storage - guaranteed delivery, replicate persisted data
stream processing - compute derived streams, dynamic datasets - less code
-------------------------------------------------------
pub - classify
broker
sub
kafka - distri commit log/streaming
unit of data - message ~ row/record
optional metadata - key - hash of key modulo num of partitions in topic picks the partition
written in batches - compressed
batch - same topic, partition
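A minimal sketch of the key-to-partition idea above, in plain Python with no Kafka dependency. The function name `choose_partition` is hypothetical, and `zlib.crc32` is only a stand-in hash for illustration (Kafka's default Java partitioner uses murmur2).

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition: hash(key) modulo partition count."""
    # crc32 is an illustrative stand-in, not the hash Kafka itself uses.
    return zlib.crc32(key) % num_partitions


# The same key maps to the same partition as long as the
# number of partitions in the topic does not change.
p1 = choose_partition(b"user-42", 6)
p2 = choose_partition(b"user-42", 6)
```

Because the partition count is part of the calculation, adding partitions to a topic can move a key to a different partition.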
--
schema - understand msg
json, xml - type handling, compat btn schema versions
Avro - serialization framework - hadoop - compact, schema separate from payload, strong typing, schema evolution
consistent data format - decouples read and write
--
categorize - topics ~ table/folder
partition ~ single commit log
append only - order guaranteed within partition
redundancy, scalability - diff servers - horiz
stream ~ topic
stream procsg - kafka streams, apache samza, storm
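To make "partition ~ single commit log" concrete, here is a toy in-memory sketch, assuming a partition can be modelled as a Python list. `PartitionLog` and its methods are hypothetical names for illustration, not part of any Kafka client.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartitionLog:
    """Toy stand-in for one partition: records are only ever appended."""
    records: List[str] = field(default_factory=list)

    def append(self, value: str) -> int:
        """Append at the end and return the record's position (its offset)."""
        self.records.append(value)
        return len(self.records) - 1

    def read_from(self, offset: int) -> List[str]:
        """Return records at and after `offset`, in the order they were written."""
        return self.records[offset:]


log = PartitionLog()
first = log.append("a")
second = log.append("b")
third = log.append("c")
```

Ordering holds only inside one such log; a topic with several partitions gives no ordering across them.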
--
kafka clients - producer, consumer
client APIs - Connect for integration, Streams for procsg
custom partition based on biz rules
offset - metadata - int, always increasing - unique within partition
cons group - topic - cons:partition ownership
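The cons:partition ownership rule can be sketched as a simple round-robin assignment. This is an illustrative simplification, not Kafka's actual assignor; `assign_partitions` is a hypothetical helper.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over group members so each partition has exactly one owner."""
    ownership: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        ownership[owner].append(partition)
    return ownership


# 4 partitions, 3 consumers in the group: one consumer owns two partitions.
owners = assign_partitions([0, 1, 2, 3], ["c1", "c2", "c3"])

# More consumers than partitions: the extra consumer owns nothing and sits idle.
idle = assign_partitions([0], ["c1", "c2"])
```

Each consumer would then track its own offset per owned partition, so it can resume where it stopped.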
--
broker - 1 server
cluster of brokers - controller - admin: assigns partitions to brokers, monitors for broker failures
partitions - replicated across multiple brokers - leader and followers
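A rough sketch of leader/follower placement, assuming replicas are simply laid out on consecutive brokers. Real Kafka placement is more involved; `place_replicas` and the broker ids are made up for illustration.

```python
from typing import Dict, List


def place_replicas(partition: int, brokers: List[int], replication_factor: int) -> Dict[str, object]:
    """Pick `replication_factor` distinct brokers; the first one acts as leader."""
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("cannot place more replicas than there are brokers")
    replicas = [brokers[(partition + i) % n] for i in range(replication_factor)]
    return {"leader": replicas[0], "followers": replicas[1:]}


# 3 partitions over brokers 1..3 with 2 copies each:
# leadership is spread out, and every partition survives one broker failure.
layout = {p: place_replicas(p, [1, 2, 3], 2) for p in range(3)}
```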
--
retention - period/topic size
expire - delete
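The two retention limits can be sketched as below, assuming records are kept oldest-first as (timestamp, payload) pairs. This is a simplification: Kafka expires whole log segments rather than single records, and `expire` is a hypothetical helper.

```python
from typing import List, Tuple

Record = Tuple[int, bytes]  # (timestamp in seconds, payload)


def expire(records: List[Record], now: int, max_age_s: int, max_bytes: int) -> List[Record]:
    """Drop records older than max_age_s, then drop oldest until size fits max_bytes."""
    kept = [(ts, data) for ts, data in records if now - ts <= max_age_s]
    total = sum(len(data) for _, data in kept)
    while kept and total > max_bytes:
        _, data = kept.pop(0)  # oldest record goes first
        total -= len(data)
    return kept


records = [(0, b"aaaa"), (50, b"bbbb"), (90, b"cccc")]
# At t=100 with a 60 s period and a 6-byte cap: the first record is too old,
# and the second is removed to get back under the size cap.
remaining = expire(records, now=100, max_age_s=60, max_bytes=6)
```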
--
multi clusters - segregate data, isolation - security, multiple DCs (disaster recovery)
replication - within cluster
mirror maker - between clusters
--
multi prods - aggr
multi cons - group
disk based retention
scalable - huge data, without going offline
high perf
data ecosys - any i/p, o/p
--