Misc

d3image

Challenge

I must have trained models until I started hallucinating. How else could I be seeing text in this image that "isn't there"?

Solution

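To recover the information hidden in mysterious_invitation.png, we need to implement the your_decode_net function. According to the encoding procedure, d3net is an invertible network: it takes the DWT-transformed cover image concatenated with the DWT-transformed secret payload and transforms them together. your_decode_net is therefore simply the inverse of d3net.

The steps are as follows: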

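Implement INV_block_reverse, the inverse of INV_block: INV_block is the basic building block of d3net. From the equations of its forward pass we derive the inverse pass that recovers the original input.
Implement D3net_reverse, the inverse of D3net: D3net is a chain of INV_block modules. Its inverse chains the corresponding INV_block_reverse modules in the opposite order.
Use D3net_reverse in the decode function:
- Apply the DWT to the image to be decoded.
- Build the input for D3net_reverse. The forward pass of d3net maps (cover_dwt, payload_dwt) -> (stego_dwt, z_channels), so the inverse pass maps (stego_dwt, z_prior) -> (recovered_cover_dwt, recovered_payload_dwt). Here z_prior is taken to be an all-zero tensor, on the assumption that the encoder compresses these latent channels or pushes them toward zero.
- Run D3net_reverse to obtain the DWT of the recovered secret payload.
- Apply the IWT to the recovered payload DWT to restore the original bitmap representation.
- Finally, convert the bitmap into text.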
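The inversion that the steps above rely on can be illustrated with scalars. This is a minimal sketch, not the challenge code: `f`, `r`, and `y` are hypothetical toy stand-ins for the trained sub-networks, chosen to show that they do not need to be invertible themselves for the coupling block to be inverted exactly.

```python
import math

CLAMP = 2.0

# Hypothetical toy stand-ins for the trained sub-networks (assumptions,
# not the real ResidualDenseBlock_out modules). None of them is invertible.
def f(v):
    return math.sin(3.0 * v)

def r(v):
    return v * v - 1.0

def y(v):
    return math.cos(v) + 0.5 * v

def e(s):
    # Same clamped scale as INV_block.e: exp(clamp * 2 * (sigmoid(s) - 0.5))
    sig = 1.0 / (1.0 + math.exp(-s))
    return math.exp(CLAMP * 2.0 * (sig - 0.5))

def forward(x1, x2):
    y1 = x1 + f(x2)
    y2 = e(r(y1)) * x2 + y(y1)
    return y1, y2

def inverse(y1, y2):
    # y1 is known, so r(y1) and y(y1) can be recomputed and undone first.
    x2 = (y2 - y(y1)) / e(r(y1))
    # With x2 recovered, f(x2) can be recomputed and subtracted.
    x1 = y1 - f(x2)
    return x1, x2

x1, x2 = 0.7, -1.3
y1, y2 = forward(x1, x2)
rx1, rx2 = inverse(y1, y2)
```

The order matters: `x2` must be recovered before `x1`, because `f` is evaluated on `x2`.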

The modified files are shown below:

block.py:

```python
import torch
import torch.nn as nn
from utils import initialize_weights

# Dense connection
class ResidualDenseBlock_out(nn.Module):
    def __init__(self, bias=True):
        super(ResidualDenseBlock_out, self).__init__()
        self.channel = 12
        self.hidden_size = 32
        self.conv1 = nn.Conv2d(self.channel, self.hidden_size, 3, 1, 1, bias=bias)
        self.conv2 = nn.Conv2d(self.channel + self.hidden_size, self.hidden_size, 3, 1, 1, bias=bias)
        self.conv3 = nn.Conv2d(self.channel + 2 * self.hidden_size, self.hidden_size, 3, 1, 1, bias=bias)
        self.conv4 = nn.Conv2d(self.channel + 3 * self.hidden_size, self.hidden_size, 3, 1, 1, bias=bias)
        self.conv5 = nn.Conv2d(self.channel + 4 * self.hidden_size, self.channel, 3, 1, 1, bias=bias)
        self.lrelu = nn.LeakyReLU(inplace=True)
        # initialization
        initialize_weights([self.conv5], 0.)

    def forward(self, x):
        x1 = self.lrelu(self.conv1(x))
        x2 = self.lrelu(self.conv2(torch.cat((x, x1), 1)))
        x3 = self.lrelu(self.conv3(torch.cat((x, x1, x2), 1)))
        x4 = self.lrelu(self.conv4(torch.cat((x, x1, x2, x3), 1)))
        x5 = self.conv5(torch.cat((x, x1, x2, x3, x4), 1))
        return x5

class INV_block(nn.Module):
    def __init__(self, clamp=2.0):
        super().__init__()

        self.channels = 3
        self.clamp = clamp
        # ρ
        self.r = ResidualDenseBlock_out()
        # η
        self.y = ResidualDenseBlock_out()
        # φ
        self.f = ResidualDenseBlock_out()

    def e(self, s):
        return torch.exp(self.clamp * 2 * (torch.sigmoid(s) - 0.5))

    def forward(self, x):
        x1, x2 = (x.narrow(1, 0, self.channels*4),
                  x.narrow(1, self.channels*4, self.channels*4))

        t2 = self.f(x2)
        y1 = x1 + t2
        s1, t1 = self.r(y1), self.y(y1)
        y2 = self.e(s1) * x2 + t1

        return torch.cat((y1, y2), 1)

# Added for inverse operation
class INV_block_reverse(nn.Module):
    def __init__(self, inv_block_instance):
        super().__init__()
        # Store references to the original block's sub-modules
        # This is critical to use the SAME trained weights
        self.r = inv_block_instance.r
        self.y = inv_block_instance.y
        self.f = inv_block_instance.f

        self.channels = inv_block_instance.channels
        self.clamp = inv_block_instance.clamp

    def e(self, s):
        return torch.exp(self.clamp * 2 * (torch.sigmoid(s) - 0.5))

    def forward(self, y_cat):
        # y_cat is torch.cat((y1, y2), 1)
        y1, y2 = (y_cat.narrow(1, 0, self.channels*4),
                  y_cat.narrow(1, self.channels*4, self.channels*4))

        # Inverse operations based on INV_block.forward:
        # Original:
        # t2 = self.f(x2)
        # y1 = x1 + t2              => x1 = y1 - t2
        # s1, t1 = self.r(y1), self.y(y1)
        # y2 = self.e(s1) * x2 + t1 => x2 = (y2 - t1) / self.e(s1)

        # Reversing order:
        # 1. Calculate s1 and t1 using y1
        s1 = self.r(y1)
        t1 = self.y(y1)

        # 2. Calculate x2 using y2, t1, and s1
        e_s1 = self.e(s1)
        x2 = (y2 - t1) / e_s1

        # 3. Calculate t2 using x2
        t2 = self.f(x2)

        # 4. Calculate x1 using y1 and t2
        x1 = y1 - t2

        return torch.cat((x1, x2), 1)
```
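The division by `self.e(s1)` in the inverse is safe because the clamped scale can never reach zero. The following stdlib-only sketch checks this for the default `clamp=2.0`; the helper `e` is a scalar re-implementation written for illustration, not part of the challenge files.

```python
import math

def e(s, clamp=2.0):
    # Numerically stable scalar sigmoid, then the same formula as INV_block.e
    if s >= 0:
        sig = 1.0 / (1.0 + math.exp(-s))
    else:
        z = math.exp(s)
        sig = z / (1.0 + z)
    return math.exp(clamp * 2 * (sig - 0.5))

samples = [-1000.0, -5.0, 0.0, 5.0, 1000.0]
scales = [e(s) for s in samples]

# sigmoid(s) - 0.5 lies in [-0.5, 0.5], so the scale lies in [exp(-clamp), exp(clamp)]
lower, upper = math.exp(-2.0), math.exp(2.0)
```

With `clamp=2.0` the scale stays between roughly 0.135 and 7.39, so the inverse amplifies errors in `y2 - t1` by at most a factor of about 7.39 per block.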

utils.py:

```python
import torch.nn as nn
import torch.nn.init as init
import torch
import numpy as np
import math
from reedsolo import RSCodec
import zlib

rs = RSCodec(128)

def initialize_weights(net_l, scale=1):
    if not isinstance(net_l, list):
        net_l = [net_l]
    for net in net_l:
        for m in net.modules():
            if isinstance(m, nn.Conv2d):
                init.kaiming_normal_(m.weight, a=0, mode='fan_in')
                m.weight.data *= scale  # for residual block
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                init.kaiming_normal_(m.weight, a=0, mode='fan_in')
                m.weight.data *= scale
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                init.constant_(m.weight, 1)
                init.constant_(m.bias.data, 0.0)

class IWT(nn.Module):
    def __init__(self):
        super(IWT, self).__init__()
        self.requires_grad = False

    def forward(self, x):
        r = 2
        in_batch, in_channel, in_height, in_width = x.size()
        out_batch, out_channel, out_height, out_width = in_batch, int(
            in_channel / (r ** 2)), r * in_height, r * in_width
        x1 = x[:, 0:out_channel, :, :] / 2
        x2 = x[:, out_channel:out_channel * 2, :, :] / 2
        x3 = x[:, out_channel * 2:out_channel * 3, :, :] / 2
        x4 = x[:, out_channel * 3:out_channel * 4, :, :] / 2

        # Allocate on the input's device instead of hard-coding .cuda()
        h = torch.zeros([out_batch, out_channel, out_height, out_width], device=x.device).float()

        h[:, :, 0::2, 0::2] = x1 - x2 - x3 + x4
        h[:, :, 1::2, 0::2] = x1 - x2 + x3 - x4
        h[:, :, 0::2, 1::2] = x1 + x2 - x3 - x4
        h[:, :, 1::2, 1::2] = x1 + x2 + x3 + x4

        return h

class DWT(nn.Module):
    def __init__(self):
        super(DWT, self).__init__()
        self.requires_grad = False

    def forward(self, x):
        x01 = x[:, :, 0::2, :] / 2
        x02 = x[:, :, 1::2, :] / 2
        x1 = x01[:, :, :, 0::2]
        x2 = x02[:, :, :, 0::2]
        x3 = x01[:, :, :, 1::2]
        x4 = x02[:, :, :, 1::2]
        x_LL = x1 + x2 + x3 + x4
        x_HL = -x1 - x2 + x3 + x4
        x_LH = -x1 + x2 - x3 + x4
        x_HH = x1 - x2 - x3 + x4
        return torch.cat((x_LL, x_HL, x_LH, x_HH), 1)

def random_data(cover,device):
    return torch.zeros(cover.size(), device=device).random_(0, 2)

def auxiliary_variable(shape):
    noise = torch.zeros(shape).cuda()
    for i in range(noise.shape[0]):
        noise[i] = torch.randn(noise[i].shape).cuda()

    return noise

def computePSNR(origin,pred):
    origin = np.array(origin)
    origin = origin.astype(np.float32)
    pred = np.array(pred)
    pred = pred.astype(np.float32)
    mse = np.mean((origin/1.0 - pred/1.0) ** 2 )
    if mse < 1.0e-10:
      return 100
    return 10 * math.log10(255.0**2/mse)

def make_payload(width, height, depth, text, batch = 1):
    # Encoded message followed by a 32-bit zero delimiter, tiled to fill the tensor
    message = text_to_bits(text) + [0] * 32

    payload = message
    while len(payload) < batch * width * height * depth:
        payload += message

    payload = payload[:batch * width * height * depth]
    return torch.FloatTensor(payload).view(batch, depth, height, width)

def text_to_bits(text):
    return bytearray_to_bits(text_to_bytearray(text))

def bytearray_to_bits(x):
    result = []
    for i in x:
        bits = bin(i)[2:]
        bits = '00000000'[len(bits):] + bits
        result.extend([int(b) for b in bits])

    return result

def text_to_bytearray(text):
    assert isinstance(text, str), "expected a string"
    x = zlib.compress(text.encode("utf-8"))
    x = rs.encode(bytearray(x))

    return x

def bits_to_bytearray(bits):
    ints = []
    bits = np.array(bits)
    bits = 0 + bits
    bits = bits.tolist()
    for b in range(len(bits) // 8):
        byte = bits[b * 8:(b + 1) * 8]
        ints.append(int(''.join([str(bit) for bit in byte]), 2))
    return bytearray(ints)

def bytearray_to_text(x):
    try:
        text = rs.decode(x)
        text = zlib.decompress(text[0])

        return text.decode("utf-8")
    except Exception:
        return False
```
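To see that the DWT and IWT classes above really are inverses of each other, the same arithmetic can be applied to a single 2x2 pixel block. This is a pure-Python sketch for illustration; `dwt2x2` and `iwt2x2` are hypothetical helper names that mirror the slicing in `DWT.forward` and `IWT.forward` for one block `[[a, b], [c, d]]`.

```python
from itertools import product

def dwt2x2(a, b, c, d):
    # Mirrors DWT.forward: x1 = top-left, x2 = bottom-left,
    # x3 = top-right, x4 = bottom-right, each halved.
    x1, x2, x3, x4 = a / 2, c / 2, b / 2, d / 2
    ll = x1 + x2 + x3 + x4
    hl = -x1 - x2 + x3 + x4
    lh = -x1 + x2 - x3 + x4
    hh = x1 - x2 - x3 + x4
    return ll, hl, lh, hh

def iwt2x2(ll, hl, lh, hh):
    # Mirrors IWT.forward: each sub-band is halved, then recombined.
    x1, x2, x3, x4 = ll / 2, hl / 2, lh / 2, hh / 2
    a = x1 - x2 - x3 + x4  # h[0::2, 0::2]
    c = x1 - x2 + x3 - x4  # h[1::2, 0::2]
    b = x1 + x2 - x3 - x4  # h[0::2, 1::2]
    d = x1 + x2 + x3 + x4  # h[1::2, 1::2]
    return a, b, c, d

# Payload pixels are 0/1 bits, so check every possible 2x2 bit pattern.
round_trips = {
    block: iwt2x2(*dwt2x2(*block))
    for block in product((0.0, 1.0), repeat=4)
}
```

Every one of the 16 bit patterns survives the round trip exactly, which is why the payload bits can be read back from the recovered DWT channels once the network itself has been inverted.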

d3net.py:

```python
# Import nn directly: the original `from model import *` only works when
# model.py is imported first, because model.py itself imports from this file.
import torch.nn as nn
from block import INV_block, INV_block_reverse # Import INV_block_reverse

class D3net(nn.Module):

    def __init__(self):
        super(D3net, self).__init__()
        self.inv1 = INV_block()
        self.inv2 = INV_block()
        self.inv3 = INV_block()
        self.inv4 = INV_block()
        self.inv5 = INV_block()
        self.inv6 = INV_block()
        self.inv7 = INV_block()
        self.inv8 = INV_block()

    def forward(self, x):

        out = self.inv1(x)
        out = self.inv2(out)
        out = self.inv3(out)
        out = self.inv4(out)
        out = self.inv5(out)
        out = self.inv6(out)
        out = self.inv7(out)
        out = self.inv8(out)
        return out

# Added for inverse operation
class D3net_reverse(nn.Module):
    def __init__(self, original_d3net_instance):
        super().__init__()
        self.inv_blocks_rev = nn.ModuleList()
        # Iterate through original blocks in reverse order
        # The original D3net has inv1 to inv8. So, index from 7 down to 0.
        for i in range(7, -1, -1): # From inv8 down to inv1
            original_inv_block = getattr(original_d3net_instance, f'inv{i+1}')
            self.inv_blocks_rev.append(INV_block_reverse(original_inv_block))

    def forward(self, y_cat):
        # y_cat is the output of the forward pass of original D3net
        # which is (stego_dwt, z_channels)
        out = y_cat
        for inv_block_rev in self.inv_blocks_rev:
            out = inv_block_rev(out)
        # The final 'out' should be (recovered_cover_dwt, recovered_payload_dwt)
        return out
```
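The reason D3net_reverse walks the blocks from inv8 down to inv1 is the general rule that the inverse of a composition applies the individual inverses in reverse order. A small sketch with hypothetical scalar layers (simple arithmetic maps, not the real blocks) shows what goes wrong if the order is kept:

```python
# Each hypothetical layer is a (forward, inverse) pair of invertible scalar maps.
layers = [
    (lambda v: v + 3.0, lambda v: v - 3.0),
    (lambda v: v * 2.0, lambda v: v / 2.0),
    (lambda v: v - 5.0, lambda v: v + 5.0),
]

def run_forward(x):
    for fwd, _ in layers:
        x = fwd(x)
    return x

def run_inverse(y):
    # Undo the last layer first, exactly like D3net_reverse.
    for _, inv in reversed(layers):
        y = inv(y)
    return y

def run_inverse_wrong_order(y):
    # Same inverses, but applied in the forward order.
    for _, inv in layers:
        y = inv(y)
    return y

x = 1.5
y = run_forward(x)                    # ((1.5 + 3) * 2) - 5
restored = run_inverse(y)
not_restored = run_inverse_wrong_order(y)
```

Only the reversed traversal returns the original input; applying the same inverses in forward order yields a different value.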

model.py:

```python
import torch.nn as nn
import torch
from d3net import D3net


class Model(nn.Module):
    def __init__(self,cuda=True):
        super(Model, self).__init__()
        self.model = D3net()
        if cuda:
            self.model.cuda()
        # init_model(self) # This is commented out, so it won't affect loading pretrained weights

    def forward(self, x):
        out = self.model(x)
        return out


def init_model(mod):
    for key, param in mod.named_parameters():
        split = key.split('.')
        if param.requires_grad:
            param.data = 0.01 * torch.randn(param.data.shape).cuda()
            if split[-2] == 'conv5':
                param.data.fill_(0.)
```

test.py:

```python
import torch
from model import Model
from utils import DWT, IWT, make_payload, auxiliary_variable, bits_to_bytearray, bytearray_to_text
import torchvision
from collections import Counter
from PIL import Image
import torchvision.transforms as T

# Import the reverse D3net
from d3net import D3net_reverse

transform_test = T.Compose([
    T.CenterCrop((720,1280)),
    T.ToTensor(),
])

def load(name):
    state_dicts = torch.load(name)
    network_state_dict = {k:v for k,v in state_dicts['net'].items() if 'tmp_var' not in k}
    d3net.load_state_dict(network_state_dict)

def transform2tensor(img):
    img = Image.open(img)
    img = img.convert('RGB')
    return transform_test(img).unsqueeze(0).to(device)

def encode(cover, text):
    cover = transform2tensor(cover)
    B, C, H, W = cover.size()
    payload = make_payload(W, H, C, text, B)
    payload = payload.to(device)
    cover_input = dwt(cover)
    payload_input = dwt(payload)
    input_img = torch.cat([cover_input, payload_input], dim=1)

    output = d3net(input_img)

    output_steg = output.narrow(1, 0, 4 * 3)
    output_img = iwt(output_steg)
    # torchvision.utils.save_image(cover, f'./{text}.png')
    torchvision.utils.save_image(output_img,f'./steg.png')


def decode(steg_path):
    steg_tensor = transform2tensor(steg_path)
    stego_dwt = dwt(steg_tensor) # This is y1, 12 channels (B, 12, H/2, W/2)

    B, C, H, W = stego_dwt.size() # C is 12 (4 sub-bands * 3 colour channels)

    # Create the 'z_prior' part (y2) for the inverse model.
    # In many invertible networks the second half of the output (z_channels)
    # is trained to follow a simple distribution. Here we assume the model
    # pushes these latent channels towards zero and feed a zero tensor.
    z_prior = torch.zeros(B, C, H, W).to(device)

    # Concatenate stego_dwt (y1) and z_prior (y2): 24 channels in total,
    # matching the output of the forward D3net.
    input_to_reverse = torch.cat((stego_dwt, z_prior), 1)

    # `d3net` in `__main__` is an instance of `Model`;
    # `d3net.model` is the actual `D3net` instance that holds the trained weights.
    your_decode_net_instance = D3net_reverse(d3net.model)
    your_decode_net_instance.eval() # Set to evaluation mode
    your_decode_net_instance.to(device) # Move to device

    # Run the inverse model without tracking gradients.
    # The 24-channel output is (recovered_cover_dwt, recovered_payload_dwt).
    with torch.no_grad():
        recovered_channels = your_decode_net_instance(input_to_reverse)

    # The forward input was (cover_input, payload_input), 12 channels each,
    # so channels 12 to 23 hold the recovered payload.
    secret_dwt = recovered_channels.narrow(1, 4 * 3, 4 * 3)

    # Apply IWT to get the raw secret (back to a 3-channel image representation).
    secret_rev = iwt(secret_dwt)

    # The rest of the decode function (from the original problem statement)
    # Flatten and threshold into boolean bits.
    image = secret_rev.view(-1) > 0

    candidates = Counter()
    # Convert boolean tensor to list of integers (0 or 1).
    bits = image.data.int().cpu().numpy().tolist()

    # `make_payload` appends `[0] * 32` after each RS-encoded, compressed copy
    # of the message, i.e. 4 zero bytes (`b'\x00\x00\x00\x00'`) between copies.
    for candidate in bits_to_bytearray(bits).split(b'\x00\x00\x00\x00'):
        candidate = bytearray_to_text(bytearray(candidate))
        if candidate:
            candidates[candidate] += 1
    if len(candidates) == 0:
        raise ValueError('Failed to find message.')
    candidate, count = candidates.most_common(1)[0]
    print(candidate)


if __name__ == '__main__':
    d3net = Model()
    load('magic.potions')
    d3net.eval()

    dwt = DWT()
    iwt = IWT()

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    text = r'd3ctf{Getting that model to converge felt like pure sorcery}'
    steg = r'./steg.png'
    cover = './poster.png'
    # encode(cover, text) # This line is commented out to prevent re-encoding.
    decode(steg) # Call decode with the stego image.
```
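The last stage of `decode` splits the recovered byte stream on the four-zero-byte delimiter and takes a majority vote over the copies that decode cleanly. The sketch below reproduces that idea with the standard library only. It is a simplification under stated assumptions: the Reed-Solomon layer (`reedsolo`) is left out, and `tile`, `try_decode`, and the sample message are hypothetical names and data introduced for illustration.

```python
import zlib
from collections import Counter

DELIM = b"\x00" * 4  # the 32 zero bits appended by make_payload

def tile(message: bytes, capacity: int) -> bytes:
    # Repeat message + delimiter until the carrier is full, then truncate.
    unit = message + DELIM
    reps = capacity // len(unit) + 1
    return (unit * reps)[:capacity]

def try_decode(chunk: bytes):
    # Returns the text, or None if the chunk is damaged or truncated.
    try:
        return zlib.decompress(chunk).decode("utf-8")
    except Exception:
        return None

secret = "hello d3image"
stream = bytearray(tile(zlib.compress(secret.encode("utf-8")), 200))
stream[3] ^= 0xFF  # simulate recovery noise in the first copy

votes = Counter()
for chunk in bytes(stream).split(DELIM):
    text = try_decode(chunk)
    if text:
        votes[text] += 1

best, count = votes.most_common(1)[0]
```

Because the message is tiled many times across the image, a damaged or truncated copy simply fails to decode and is outvoted by the intact ones. In the real pipeline the Reed-Solomon parity additionally corrects a limited number of byte errors inside each copy.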