Synthetic benchmarks on SSDs (or flash in general, such as NVMe devices) tend to give misleading results. Because the devices are so fast, you're really measuring the performance of the rest of the stack. In the real world, SSDs sit underneath a file system and get used from the kernel; in silly benchmarks, they are used from userspace. The problem is that the user-to-kernel transition for direct IO is a subsystem that otherwise sees little use and is frequently not performance-tuned. That's particularly true for aio, which is a stepchild of OS implementation, because hardly anybody doing serious work needs it. So when you run such a benchmark, you're testing an irrelevant backwater of the system, and the results are unlikely to be representative. There is an exception: operating systems (such as AIX or HP-UX) that have been carefully tuned for best performance with databases that bypass the file system (such as DB2 or Oracle). On those, you can run benchmarks that mimic the database's IO pattern and IO access method and expect somewhat realistic results.
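To make that concrete, here is a minimal sketch of the kind of userspace direct-IO path such a benchmark exercises, assuming Linux with libaio (build with -laio); the device path and block size are placeholders, not recommendations. A tight loop around this pattern mostly times the submit/reap transitions, not the flash underneath.

    /* Minimal sketch of the userspace direct-IO + aio path a synthetic
     * benchmark exercises. Assumes Linux with libaio; /dev/nvme0n1 and
     * the 4 KiB block size are placeholders. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <libaio.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t blk = 4096;                 /* O_DIRECT wants aligned sizes */
        int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, blk, blk)) { fprintf(stderr, "alloc failed\n"); return 1; }

        io_context_t ctx = 0;
        int rc = io_setup(1, &ctx);
        if (rc < 0) { fprintf(stderr, "io_setup: %s\n", strerror(-rc)); return 1; }

        struct iocb cb, *cbs[1] = { &cb };
        io_prep_pread(&cb, fd, buf, blk, 0);     /* one 4 KiB read at offset 0 */

        /* Each submit/reap pair is a user->kernel transition; on a fast
         * enough device, this path is most of what the loop ends up timing. */
        rc = io_submit(ctx, 1, cbs);
        if (rc != 1) { fprintf(stderr, "io_submit: %s\n", strerror(-rc)); return 1; }

        struct io_event ev;
        rc = io_getevents(ctx, 1, 1, &ev, NULL);
        if (rc != 1) { fprintf(stderr, "io_getevents: %s\n", strerror(-rc)); return 1; }

        io_destroy(ctx);
        close(fd);
        free(buf);
        return 0;
    }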
The other problem is this. In the real world, SSDs are used as part of a storage stack, typically with a file system (or a database) on top, and the performance of the system depends heavily on how that file system uses the SSD. Even more than with spinning rust, SSD performance depends on the IO pattern: the size of the IOs, their spacing (sequential versus random versus interleaved versus strided), and the queue depth or degree of asynchronicity. SSDs have far more software in them than disks; their flash translation layers (FTLs) are very complex (the research literature on FTLs is extensive), and FTLs are explicitly tuned for particular workloads.
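To put "IO pattern" in concrete terms, here is an illustrative sketch of the parameters that distinguish one workload from another; the struct and names are made up for exposition, not taken from any benchmark tool.

    /* Illustrative: the knobs that define an IO pattern. */
    #include <stdint.h>
    #include <stdlib.h>

    enum access_order { SEQUENTIAL, RANDOM, STRIDED };

    struct io_pattern {
        size_t            block_size;   /* e.g. 4 KiB vs 128 KiB requests */
        enum access_order order;        /* sequential, random, or strided */
        uint64_t          stride;       /* gap between IOs when strided */
        unsigned          queue_depth;  /* how many IOs are kept in flight */
    };

    /* Next byte offset to issue, given the pattern and the previous offset. */
    uint64_t next_offset(const struct io_pattern *p,
                         uint64_t prev, uint64_t device_size)
    {
        uint64_t blocks = device_size / p->block_size;

        switch (p->order) {
        case SEQUENTIAL:
            return (prev + p->block_size) % device_size;
        case STRIDED:
            return (prev + p->stride) % device_size;
        case RANDOM:
        default:
            /* rand() is good enough for a sketch; keep offsets block-aligned */
            return ((uint64_t)rand() % blocks) * p->block_size;
        }
    }

Two workloads that differ only in these fields can get very different throughput and latency from the same SSD, because the FTL is tuned for some combinations and not others, which is why a benchmark tells you something only if it reproduces the pattern you actually run.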
The usual benchmarking advice applies: the only real and relevant benchmark is your workload. Configure the system as you would for production, tune it, run the workload you actually need to serve, and measure the performance.