Multi-machine

  • parameter server / trainer on every machine (full-bandwidth I/O)
  • data parallelism (see the sketch after this list)
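
A minimal sketch of data parallelism with a parameter server, simulated in one process. The `ParamServer`/`Trainer` classes, the single `"w"` parameter, and the plain SGD update are illustrative assumptions, not any specific system's API:

```python
import numpy as np

class ParamServer:
    """Holds the shared parameters; applies gradients pushed by trainers."""
    def __init__(self, params, lr=0.1):
        self.params = params              # {name: np.ndarray}
        self.lr = lr

    def pull(self, names):
        return {n: self.params[n] for n in names}

    def push(self, grads):
        for n, g in grads.items():
            self.params[n] -= self.lr * g  # simple SGD step

class Trainer:
    """Computes gradients on its own data shard (data parallelism)."""
    def __init__(self, server, data):
        self.server, self.data = server, data

    def step(self):
        w = self.server.pull(["w"])["w"]
        x, y = self.data
        # gradient of 0.5 * ||x @ w - y||^2 on this machine's shard
        grad = x.T @ (x @ w - y) / len(x)
        self.server.push({"w": grad})

# One trainer per "machine"; in reality these run concurrently over the network.
rng = np.random.default_rng(0)
server = ParamServer({"w": np.zeros(3)})
trainers = [Trainer(server, (rng.normal(size=(8, 3)), rng.normal(size=8)))
            for _ in range(4)]
for _ in range(10):
    for t in trainers:
        t.step()
```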

parameters are partitioned (sharded) across the parameter servers; each trainer routes a parameter key to the server that owns it (hash-based routing sketch below)
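
A sketch of one way to decide which server owns which parameter. Hashing the key is an assumed routing scheme here, not a particular system's design:

```python
import zlib

NUM_SERVERS = 4

def server_for(param_name: str) -> int:
    """Route a parameter key to one server shard (deterministic per key)."""
    return zlib.crc32(param_name.encode()) % NUM_SERVERS

shards = {name: server_for(name)
          for name in ["embedding.row_0", "embedding.row_1", "fc.weight", "fc.bias"]}
print(shards)  # each key maps to a fixed server id
```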

sparse model training:

  • some gradients are ignored (only parameters touched by the batch are updated)
  • prefetch the needed parameters (pull only those rows from the server; sketch below)
  • regularization: ????? (a dense regularizer such as L2 decay would update every parameter, which conflicts with sparse updates)
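
A sketch of one sparse update cycle under the two bullets above: prefetch only the embedding rows the minibatch touches, apply gradients to those rows, and push only them back. The dict-backed table and `train_step` helper are assumptions for illustration:

```python
import numpy as np

lr = 0.1
embedding = {i: np.zeros(4) for i in range(1000)}    # stand-in for a server-side table

def train_step(batch_ids, batch_grads):
    needed = sorted(set(batch_ids))
    rows = {i: embedding[i].copy() for i in needed}  # "prefetch": pull only needed rows
    for i, g in zip(batch_ids, batch_grads):         # gradients for untouched rows never exist
        rows[i] -= lr * g                            # sparse SGD on the pulled rows
    embedding.update(rows)                           # push back only the touched rows

train_step([3, 17, 3], [np.ones(4)] * 3)
```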

Multi-GPU

sequence

???