Support the `--workload-list` option for the VCS simulation framework. By adding a run-time argument such as `--workload-list=<listfile_name>`, users can run several workloads in a single launch on VCS or Palladium. This greatly reduces the time needed to run many short workloads. Each line of the list file should have the form `workload-name max-instr-limit`.
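A minimal sketch of reading such a list file is shown below. It is hypothetical C++, not the difftest parser: `WorkloadEntry` and `parse_workload_list` are illustrative names. Skipping blank lines and throwing on a line without a limit are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One line of the list file: "<workload-name> <max-instr-limit>".
struct WorkloadEntry {
  std::string name;
  uint64_t max_instr;
};

// Hypothetical parser. Blank lines are skipped and malformed lines throw;
// both policies are assumptions, not documented behavior.
std::vector<WorkloadEntry> parse_workload_list(std::istream &in) {
  std::vector<WorkloadEntry> list;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    WorkloadEntry entry{};
    if (!(fields >> entry.name)) continue;  // blank line
    if (!(fields >> entry.max_instr))
      throw std::runtime_error("expected 'workload-name max-instr-limit': " + line);
    list.push_back(entry);
  }
  return list;
}
```

The whole list is read up front, so the workload-switch logic only has to step an index through the returned vector.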
To support this feature, we add workload-switch logic. When a workload finishes (or exceeds its max-instr-limit), the workload-switch signal is raised according to simv_result, which drives the reset signal high. All other logic depends only on reset, so each workload runs independently.
Note: DUT memory currently has two implementation schemes. It can be loaded into hardware memory at initialization, or it can be loaded in software and then read and written through DPIC. The first scheme cannot reload memory when reset is asserted again, so DPIC memory must be used with this feature.
The whole process of handling the workload list is as follows:
1. At initialization, the user passes the workload list.
2. When `reset` is set high, `simv_init` is triggered. It fetches the next workload and its max-instr limit from the workload list, just as if only one workload were being run. Software state and DUT memory (DPIC) are initialized here.
3. When `simv_nstep` ends successfully, `workload-switch` sets `reset` high. Software state should be freed by `difftest_finish`. The flow then returns to step 2.
4. When `simv_init` finds no more workloads in the list, the whole simulation ends.
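The four steps above can be modeled as a driver loop. The C++ below is a hypothetical model, not the difftest harness. `simv_init`, `simv_nstep` and `difftest_finish` borrow their names from the text, but their signatures and bodies are invented. A workload is treated as done only when it reaches its max-instr-limit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One list entry: "<workload-name> <max-instr-limit>".
struct WorkloadEntry {
  std::string name;
  uint64_t max_instr;
};

// Hypothetical model of the workload-switch flow.
class WorkloadRunner {
 public:
  explicit WorkloadRunner(std::vector<WorkloadEntry> list)
      : list_(std::move(list)) {}

  // Step 2: called while reset is high. Loads the next workload, or
  // returns false when the list is exhausted (step 4).
  bool simv_init() {
    if (next_ >= list_.size()) return false;
    current_ = list_[next_++];
    instr_done_ = 0;  // real flow: init ref model and DPIC memory here
    return true;
  }

  // Runs n more instructions and returns true once the workload is done,
  // which is when workload-switch would be raised.
  bool simv_nstep(uint64_t n) {
    instr_done_ += n;
    return instr_done_ >= current_.max_instr;
  }

  // Step 3: free per-workload software state (nothing to free in the model).
  void difftest_finish() {}

  // Runs every workload in order and returns their names as they finish.
  std::vector<std::string> run_all(uint64_t step) {
    assert(step > 0);
    std::vector<std::string> finished;
    while (simv_init()) {           // reset high: load the next workload
      while (!simv_nstep(step)) {}  // run until workload-switch fires
      difftest_finish();            // free software state, back to step 2
      finished.push_back(current_.name);
    }
    return finished;                // step 4: the whole simulation ends
  }

 private:
  std::vector<WorkloadEntry> list_;
  std::size_t next_ = 0;
  WorkloadEntry current_{};
  uint64_t instr_done_ = 0;
};
```

The model relies only on the "reset, then `simv_init`" boundary, so nothing from one workload leaks into the next. This mirrors the point that other logic depends only on reset.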
* Batch: pack DPIC to reduce sync times
We pack the DPICs of DifftestBundles in hardware and unpack them in software, so tens of DPIC calls are reduced to one. This reduces the number of synchronizations on platforms such as Palladium.
On Palladium, this speeds up full-difftest of XiangShan from 0.23 MHz to 0.29 MHz (about 1.26x) with GlobalEnable, Squash and Gfifo (non-blocking) enabled. Without Gfifo, it speeds up from 0.025 MHz to 0.11 MHz (4.4x).
Difftest is delayed because valid DifftestBundles are collected over bundleNum cycles before they are sent, so comparison results arrive later.
BatchInterval and BatchFinish are appended to infoVec to separate the DPICs packed from different cycles. The other elements contain BundleType and BundleLen, which are used to unpack DataVec.
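The unpacking side can be sketched as a walk over infoVec. This is a hypothetical C++ model: the byte layout of an info entry, the marker values for BatchInterval and BatchFinish, and the per-type size table are all assumptions, not the real encoding. The model only shows how the markers split DataVec into cycles while BundleType and BundleLen say how many bytes to consume.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical marker values for the two special infoVec elements.
enum : uint8_t { kBatchInterval = 0xFE, kBatchFinish = 0xFF };

// Hypothetical info element: BundleType plus BundleLen (number of bundles).
struct InfoEntry {
  uint8_t type;
  uint8_t len;
};

struct Unpacked {
  std::size_t cycle;           // cycle index within this batch
  uint8_t type;                // BundleType
  std::vector<uint8_t> bytes;  // raw bytes of one bundle
};

// bundle_size maps BundleType to the byte size of one bundle (assumed to be
// known to software, e.g. from generated headers).
std::vector<Unpacked> unpack_batch(const std::vector<uint8_t> &data,
                                   const std::vector<InfoEntry> &info,
                                   const std::map<uint8_t, std::size_t> &bundle_size) {
  std::vector<Unpacked> out;
  std::size_t cycle = 0, offset = 0;
  for (const InfoEntry &e : info) {
    if (e.type == kBatchFinish) break;                  // end of this DPIC pack
    if (e.type == kBatchInterval) { ++cycle; continue; }  // next cycle starts
    const std::size_t sz = bundle_size.at(e.type);
    for (uint8_t i = 0; i < e.len; ++i) {
      assert(offset + sz <= data.size());
      auto first = data.begin() + static_cast<std::ptrdiff_t>(offset);
      out.push_back(Unpacked{cycle, e.type,
                             std::vector<uint8_t>(first, first + static_cast<std::ptrdiff_t>(sz))});
      offset += sz;
    }
  }
  return out;
}
```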
We currently set the maximum packed data size to 3900 bytes and the maximum packed info size to 90 bytes, because Palladium requires the total size of gfifo parameters to be less than 4000 bytes.
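The limits above can be checked at compile time, and a packer must close a batch before either limit is exceeded. The C++ below is a hypothetical sketch of that policy. The greedy "fill until the next cycle would overflow" rule and the `CycleCost` and `plan_batches` names are assumptions, not the hardware packer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Limits quoted in the text. 3900 + 90 = 3990 stays under Palladium's
// 4000-byte limit on total gfifo parameter bytes.
constexpr std::size_t kMaxDataBytes = 3900;
constexpr std::size_t kMaxInfoBytes = 90;
static_assert(kMaxDataBytes + kMaxInfoBytes < 4000, "gfifo parameter limit");

// Hypothetical per-cycle cost: data bytes and info bytes of its valid bundles.
struct CycleCost {
  std::size_t data_bytes;
  std::size_t info_bytes;
};

// Greedily groups consecutive cycles into DPIC calls and returns the number
// of cycles in each call.
std::vector<std::size_t> plan_batches(const std::vector<CycleCost> &cycles) {
  std::vector<std::size_t> batches;
  std::size_t data = 0, info = 0, count = 0;
  for (const CycleCost &c : cycles) {
    assert(c.data_bytes <= kMaxDataBytes && c.info_bytes <= kMaxInfoBytes);
    if (count > 0 && (data + c.data_bytes > kMaxDataBytes ||
                      info + c.info_bytes > kMaxInfoBytes)) {
      batches.push_back(count);  // flush: one DPIC call
      data = info = count = 0;
    }
    data += c.data_bytes;
    info += c.info_bytes;
    ++count;
  }
  if (count > 0) batches.push_back(count);
  return batches;
}
```

Under this policy, a cycle whose bundles nearly fill the limit ends up alone in its batch.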
We add the `--max-num-width` Verilator argument to raise the bit-width limit. See the related issue verilator/verilator#2082.
To avoid misaligned struct members, we add the packed attribute to the struct declarations so that all members are byte-aligned with no padding. This allows memcpy to be used for DPIC.
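The effect of the packed attribute can be seen in a small C++ example. `ExampleBundle` and its fields are hypothetical, not a real DifftestBundle. `__attribute__((packed))` is a GCC/Clang extension. Without it, the compiler inserts padding after `valid`, so the struct layout would not match a contiguous byte stream.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical bundle. Packed: 1 + 8 + 4 = 13 bytes, no padding.
struct __attribute__((packed)) ExampleBundle {
  uint8_t valid;
  uint64_t pc;
  uint32_t instr;
};

// Same fields without packing, for comparison (the compiler adds padding).
struct ExampleBundlePadded {
  uint8_t valid;
  uint64_t pc;
  uint32_t instr;
};

static_assert(sizeof(ExampleBundle) == 13, "packed struct has no padding");

// Copies one bundle out of the byte stream received through DPIC. Because
// the struct has no padding, one memcpy matches the byte layout exactly.
ExampleBundle read_bundle(const uint8_t *bytes) {
  ExampleBundle b;
  std::memcpy(&b, bytes, sizeof(b));
  return b;
}
```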
Note: When running full-difftest of XiangShan in Batch mode, each pack may contain the DPICs of only one cycle, because a single cycle can need a large number of bytes. Therefore diffStateSelect should also be enabled for VCS and Palladium.
We reshape GatewaySink so that it keeps the same control interface while exposing different IO interfaces for DifftestBundle (single pack) and batchIO (Batch). DPICs for different IO interfaces should extend DPICBase for the common methods.
Currently all macros in Gateway are determined by GatewayConfig,
so we move the macros into Config to make them easier to understand.
Difftest-related macros in Verilog can be configured in the generated
DifftestMacros.v, which should be included by all files that use those macros.
Some redundant passing of config, and the redundant return value of collect, are removed.
Some macros are renamed to follow a consistent format.
The Makefile and *.mk files are changed to include generated/DifftestMacros.v.
Setting RANDOMIZE_DELAY to 0 causes some randomly initialized regs to be
initialized to X. RANDOMIZE_DELAY should satisfy:
* round(RANDOMIZE_DELAY) > 0
* RANDOMIZE_DELAY < reset delay
See https://github.com/chipsalliance/firrtl/pull/835
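The two constraints can be written as a small check. This is a hypothetical C++ helper, not part of the build. `reset_delay` stands for however long the testbench holds reset, in the same time units, and `std::lround` stands in for the `round()` in the constraint.

```cpp
#include <cassert>
#include <cmath>

// Returns true if randomize_delay satisfies both constraints listed above:
// round(RANDOMIZE_DELAY) > 0 and RANDOMIZE_DELAY < reset delay.
bool randomize_delay_ok(double randomize_delay, double reset_delay) {
  return std::lround(randomize_delay) > 0 && randomize_delay < reset_delay;
}
```

For example, a delay of 0.4 fails the first constraint because it rounds to 0.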
This commit adds support for using Synopsys VCS to simulate SimTop.
Difftest is also supported.
For now, we use src/test/vsrc/vcs/top.v as the top-level module.
In the future, we may support VCS slave mode for better scalability.