Protobuf Rampup

Protobuf Learning

1. What are Protocol Buffers?

  • Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler.
  • You define how you want your data to be structured once, then use generated source code to easily write and read your structured data.

2. Why use Protocol Buffers?

  • XML is human-readable and widely supported across languages
    • but it is notoriously space-intensive
    • encoding and decoding it can impose a large performance penalty on applications
  • With protocol buffers
    • you write a .proto description of the data structure
    • the protocol buffer compiler then creates a class that implements automatic encoding and parsing of the protocol buffer data in an efficient binary format
    • the generated class provides getters and setters for the fields
    • it also takes care of the details of reading and writing the protocol buffer as a unit
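
To make "efficient binary format" concrete, here is a minimal sketch that hand-encodes a single integer field the way the wire format does: a tag (field number and wire type) followed by a base-128 varint. The class and method names (WireSketch, encodeVarintField, hex) are made up for illustration and are not part of the protobuf library; generated classes do this work for you.

```java
import java.io.ByteArrayOutputStream;

// Hand-rolled sketch of the protobuf wire format for one varint field.
// Nothing here is protobuf library API; the names are hypothetical.
final class WireSketch {
    static final int WIRETYPE_VARINT = 0;

    // Base-128 varint: 7 payload bits per byte, high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // A field is written as a tag (field number and wire type) followed by its value.
    static byte[] encodeVarintField(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, ((long) fieldNumber << 3) | WIRETYPE_VARINT);
        writeVarint(out, value);
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] encoded = encodeVarintField(1, 150);
        System.out.println(hex(encoded) + " (" + encoded.length + " bytes)"); // prints "089601 (3 bytes)"
        System.out.println("<id>150</id>".length() + " bytes as XML text");   // prints "12 bytes as XML text"
    }
}
```

The field name never appears in the encoded bytes, only its number, which is a large part of why the binary form is smaller than the XML equivalent.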

3. Java Tutorial (In Proto2)

3.1 Define Protocol Format

syntax = "proto2";

// starts with a package declaration
// define it to prevent naming conflicts between different projects
package tutorial;

// enable generating a separate .java file for each generated class 
option java_multiple_files = true;

// specify in what java package name your generated classes should live
// if not set here, it will simply match the pkg name given by the package declaration 
option java_package = "com.example.tutorial.protos";

// define the class name of the wrapper class which will represent this file 
// if not given, it will be auto generated by converting the file name to upper camel case 
option java_outer_classname = "AddressBookProtos";

Message definition: an aggregate containing a set of typed fields
Fields can use standard simple types
    + bool
    + int32
    + float
    + double
    + string
We can also add further structure to messages by using other message types as field types

+ "= 1", "= 2" markers
    + identify the unique tag (field number) that the field uses in the binary encoding
    + prefer 1 - 15 for commonly used or repeated fields, since those tags need one less byte to encode than higher numbers

+ modifier
    + optional
        + the field may or may not be set
        + if it is not set, a default value is used
            + we can specify our own default value
            + otherwise a system default is used
                + numeric types -- zero
                + strings -- empty string
                + bools -- false
                + embedded messages -- the default instance (prototype) of the message, which has none of its fields set
    + repeated
        + the field may be repeated any number of times (including zero)
        + order is preserved in the protocol buffer
        + acts like a dynamically sized array
    + required
        + a value for the field must be provided
        + trying to build an uninitialized message throws a RuntimeException
        + parsing an uninitialized message throws an IOException
        + required is discouraged: if you later change the field to optional, old readers will treat messages without it as incomplete and may reject them
message Person {
  // the "= 1" marker identifies the unique tag that the field uses in the binary encoding
  optional string name = 1;
  optional int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    optional string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}
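
The advice to prefer field numbers 1 - 15 follows from how the tag is stored: the field number is shifted left by three bits (to make room for the wire type) and written as a varint, so only numbers up to 15 fit in a single byte. The sketch below just does that arithmetic; TagSizeSketch and tagSize are hypothetical names, not library API.

```java
// Sketch: how many bytes the tag of a field occupies on the wire.
final class TagSizeSketch {
    // The tag is (fieldNumber << 3) | wireType, stored as a base-128 varint.
    static int tagSize(int fieldNumber) {
        long tag = (long) fieldNumber << 3; // the 3 wire-type bits do not change the size
        int size = 1;
        while ((tag >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 15, 16, 2047, 2048}) {
            System.out.println("field " + n + " -> " + tagSize(n) + " byte(s)");
        }
    }
}
```

With this arithmetic, fields 1 through 15 need a one-byte tag, 16 through 2047 need two bytes, and 2048 is the first number that needs three.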

3.2 Compiling Protocol Buffers

  • To generate the classes, we need to run the protocol buffer compiler
  • specify the source directory, the destination directory and the path to our .proto
protoc -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/addressbook.proto 

3.3 Protocol Buffer API

  • the compiler generates the source files for us
    • getters on the message class, and both getters and setters on its builder
    • each field also has a clear method that sets the field back to its empty state
  • Builders vs Messages
    • message classes are immutable
    • to construct a message, first create a builder, set the fields on it, then call the builder’s build() method
  • standard message methods
    • isInitialized() checks whether all the required fields have been set
    • toString() returns a human-readable representation of the message
    • mergeFrom(Message other) (builder only) merges the contents of other into this message, overwriting singular scalar fields, merging composite fields, and concatenating repeated fields
    • clear() (builder only) clears all the fields back to the empty state
  • Parsing and Serialization (xxx below stands for the generated message class, e.g. Person)
    • byte[] toByteArray();
      • serializes the message and returns a byte array containing its raw bytes
    • static xxx parseFrom(byte[] data);
      • parses a message from the given byte array
    • void writeTo(OutputStream output);
      • serializes the message and writes it to an OutputStream
    • static xxx parseFrom(InputStream input);
      • reads and parses a message from an InputStream
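
The builder and message split is easier to see in code. The class below is a hand-written stand-in, not the output of protoc: PersonSketch and its members are hypothetical and only mimic the naming style of generated code so the example runs without the protobuf library. It illustrates an immutable message, a mutable builder, and the merge rule for singular versus repeated fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hand-written stand-in that mimics the shape of generated code; NOT produced by protoc.
final class PersonSketch {
    private final String name;          // null means "not set"
    private final List<String> phones;  // plays the role of a repeated field

    private PersonSketch(Builder b) {
        this.name = b.name;
        this.phones = Collections.unmodifiableList(new ArrayList<>(b.phones));
    }

    static Builder newBuilder() { return new Builder(); }

    // The message exposes getters only, so it cannot change after build().
    boolean hasName() { return name != null; }
    String getName() { return name == null ? "" : name; } // an unset string reads as ""
    List<String> getPhonesList() { return phones; }

    static final class Builder {
        private String name;
        private final List<String> phones = new ArrayList<>();

        Builder setName(String value) { this.name = value; return this; }
        Builder clearName() { this.name = null; return this; }
        Builder addPhones(String value) { phones.add(value); return this; }

        // Singular fields that are set in `other` overwrite ours; repeated fields are appended.
        Builder mergeFrom(PersonSketch other) {
            if (other.hasName()) {
                this.name = other.name;
            }
            phones.addAll(other.phones);
            return this;
        }

        PersonSketch build() { return new PersonSketch(this); }
    }

    public static void main(String[] args) {
        PersonSketch ann = newBuilder().setName("Ann").addPhones("111").build();
        PersonSketch merged = newBuilder().setName("Bob").addPhones("222").mergeFrom(ann).build();
        System.out.println(merged.getName() + " " + merged.getPhonesList()); // prints "Ann [222, 111]"
    }
}
```

With the real generated classes the calls look much the same (newBuilder(), setters on the builder, build()), plus toByteArray() and parseFrom() for serialization.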

3.4 How to extend a Protocol Buffer

  • To keep a new version of the protocol buffer compatible in both directions (old code reads new messages, new code reads old ones), you
    • must not change the tag numbers of any existing fields
    • must not add or delete any required fields
    • may delete optional or repeated fields
    • may add new optional or repeated fields but must use fresh tag numbers
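
These rules work because every field on the wire carries its own tag, so a reader built from an older .proto can skip fields it does not recognize. The following is a minimal sketch of such an "old" reader that only knows field 1 (a varint) and skips everything else by wire type. OldReaderSketch is a hypothetical name and the parsing is deliberately simplified (no bounds checks, no group wire types); the real library does this inside parseFrom.

```java
// Sketch of an "old" reader that only knows field 1 (a varint) and skips unknown fields.
final class OldReaderSketch {
    private final byte[] buf;
    private int pos = 0;

    OldReaderSketch(byte[] buf) { this.buf = buf; }

    private long readVarint() {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Returns the value of field 1, or 0 (the default) if the field is absent.
    long readKnownField() {
        long id = 0;
        while (pos < buf.length) {
            long tag = readVarint();
            int fieldNumber = (int) (tag >>> 3);
            int wireType = (int) (tag & 0x7);
            if (fieldNumber == 1 && wireType == 0) {
                id = readVarint();
            } else {
                skip(wireType); // a field added by a newer schema
            }
        }
        return id;
    }

    private void skip(int wireType) {
        switch (wireType) {
            case 0: readVarint(); break;                               // varint
            case 1: pos += 8; break;                                   // fixed 64-bit
            case 2: int len = (int) readVarint(); pos += len; break;   // length-delimited
            case 5: pos += 4; break;                                   // fixed 32-bit
            default: throw new IllegalArgumentException("unsupported wire type " + wireType);
        }
    }

    public static void main(String[] args) {
        // field 1 = 150 (varint), then field 2 = "testing" (a string the old reader does not know)
        byte[] newer = {0x08, (byte) 0x96, 0x01, 0x12, 0x07, 't', 'e', 's', 't', 'i', 'n', 'g'};
        System.out.println(new OldReaderSketch(newer).readKnownField()); // prints 150
    }
}
```

If field 2 had reused the number 1 instead of a fresh tag number, the old reader would have tried to interpret it as its own field, which is why existing tag numbers must never change.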

4. Overall Guide (In Proto3)

4.1 Define message type

  • Each field in the message definition needs a unique number
    • those numbers are used to identify fields in the message binary format
    • the number should never be changed
  • specify field rules
    • singular
      • default field rule for proto3 syntax
      • can have zero or one of this field
    • repeated
      • can be repeated any number of times (including zero)
  • reserved fields
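    • if you update a message type by entirely removing a field or commenting it out, future users can reuse the field number, which causes severe problems (such as data corruption) if they later load old versions of the same .proto
    • to prevent this, mark the field numbers (and names) of deleted fields as reserved; the compiler then complains if anyone tries to reuse them
message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}
  • After running the compiler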
    • Compiler generates a .java file with a class for each message type, as well as Builder classes for creating message class instances
  • For enum values
    • every enum definition must contain a constant that maps to zero as its first element
    • setting option allow_alias = true lets us assign the same value to different enum constants (aliases)
  • import
    • we can import other .proto files to use the definitions they contain

4.2 Scalar Value Types

See the scalar value types table in: Language Guide (proto3) | Protocol Buffers | Google Developers

4.3 Nested Types

  • we can define and use message types inside other message types
message SearchResponse {
  message Result {
    string url = 1;
    string title = 2;
    repeated string snippets = 3;
  }
  repeated Result results = 1;
}

// to reuse the nested message type outside its parent, refer to it as Parent.Type
message SomeOtherMessage {
  SearchResponse.Result result = 1;
}

4.4 Updating a Message Type

  • don’t change the field numbers for any existing fields
  • if you add new fields, any msg serialized by code using your old msg format can still be parsed by your new generated code
    • keep in mind the default values for these elements so that new code can properly interact with msgs generated by old code
  • to remove a field
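    • rename the field with a prefix like OBSOLETE_
    • or make the field number reserved, so that future users of the .proto cannot accidentally reuse it
  • int32, uint32, int64, uint64 and bool are all compatible: a field can be changed from one of these types to another without breaking forward or backward compatibility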
  • string and bytes are compatible as long as the bytes are valid UTF-8
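
The integer and bool types are interchangeable because they share the same varint wire encoding; only the interpretation after decoding differs. The sketch below (VarintCompatSketch is a hypothetical name, stdlib only) encodes a value as a writer with an int64 field would, then reads it back as a reader with an int32 field would. A value that does not fit is truncated as if by a cast, so compatible does not mean lossless.

```java
import java.io.ByteArrayOutputStream;

// Sketch: int32, int64 and bool values all travel as the same varint bytes.
final class VarintCompatSketch {
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        long big = (1L << 32) + 5;          // written by code that declares the field as int64
        long asInt64 = decode(encode(big));
        int asInt32 = (int) asInt64;        // a reader that declares it as int32 keeps the low 32 bits
        System.out.println(asInt64 + " -> " + asInt32); // prints "4294967301 -> 5"

        boolean asBool = decode(encode(1)) != 0; // the byte 0x01 reads as true for a bool field
        System.out.println(asBool);              // prints "true"
    }
}
```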

4.5 Special Keywords

4.5.1 Any

  • lets you use messages as embedded types without having their .proto definition
  • an Any contains an arbitrary serialized message as bytes, along with a URL that acts as a globally unique identifier for that message's type

4.5.2 Oneof

  • if a message has many fields and at most one of them will be set at the same time, we can enforce this behavior and save memory by using the oneof feature
  • setting any member of the oneof automatically clears all the other members
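
The "setting one member clears the others" behavior can be pictured as a single storage slot plus a marker that records which member currently occupies it. The class below is a hand-written illustration of that idea, not generated code: ContactSketch, its email and phone members, and the ContactCase enum are all hypothetical.

```java
// Hand-written illustration of oneof semantics; NOT generated by protoc.
final class ContactSketch {
    enum ContactCase { EMAIL, PHONE, CONTACT_NOT_SET }

    private ContactCase contactCase = ContactCase.CONTACT_NOT_SET;
    private Object contact; // one slot shared by all members, which is where the memory saving comes from

    void setEmail(String value) { contactCase = ContactCase.EMAIL; contact = value; }
    void setPhone(String value) { contactCase = ContactCase.PHONE; contact = value; }

    ContactCase getContactCase() { return contactCase; }
    boolean hasEmail() { return contactCase == ContactCase.EMAIL; }
    boolean hasPhone() { return contactCase == ContactCase.PHONE; }

    // A member that is not the active one reads as its default value.
    String getEmail() { return hasEmail() ? (String) contact : ""; }
    String getPhone() { return hasPhone() ? (String) contact : ""; }

    public static void main(String[] args) {
        ContactSketch c = new ContactSketch();
        c.setEmail("ann@example.com");
        c.setPhone("555-0100"); // overwrites the shared slot, so email is no longer set
        System.out.println(c.getContactCase() + " hasEmail=" + c.hasEmail()); // prints "PHONE hasEmail=false"
    }
}
```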

4.6 Maps

  • map<key_type, value_type> map_field = N;

4.7 Define Service

  • If you want to use your message types with an RPC (Remote Procedure Call) system, you can define an RPC service interface in a .proto file
  • the protocol buffer compiler will then generate service interface code and stubs in your chosen language

4.8 Options

  • Options do not change the overall meaning of a declaration, but may affect the way it is handled in a particular context.

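  • java_package

    • the package you want to use for your generated Java classes
  • java_outer_classname

    • the class name for the wrapper Java class you want to generate
  • java_multiple_files

    • if true, a separate .java file is generated for each generated class instead of nesting them all in the wrapper class
  • optimize_for

    • SPEED (default)
      • the compiler generates code for serializing, parsing, and performing other common operations on your message types
      • the generated code is highly optimized
    • CODE_SIZE
      • the compiler generates minimal classes and relies on shared, reflection-based code for serialization and parsing
      • the generated code is much smaller than with SPEED, but operations are slower
    • LITE_RUNTIME
      • the generated classes depend only on the lite runtime library
      • useful for apps running on constrained platforms like mobile phones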


  1. Overview
  2. Language Guide
  3. Java Tutorial
  4. Java Generated Code
  5. Java Encoding


Title: Protobuf Rampup


Author: Leilei Chen

Published: 2021-12-25, 10:09:49

Last updated: 2021-12-25, 10:11:09


License: "Attribution-NonCommercial-ShareAlike 4.0". Please keep the original link and author when reposting.