As far as I know, all 'draw_' functions are extremely slow (the manual disagrees, but they have always been slow in my experience)... Probably the way those functions work on the acknex side has something to do with AMD GPUs...
Edit: take a look at the particles ms/frame... it's around 100 ms... damn