Automating a Paywalled Adobe Service

The Why

Adobe has some fantastic tooling for PDFs, including their flagship product Acrobat. With their tentative purchase of Figma, I figured now would be a good time to rebuild my sorely outdated résumé in Figma, then export it to PDF. The result turned out great, but the file size was about 2.7MB, which I soon discovered is too big for some job boards.

A quick Google search for how to compress PDFs returned Adobe’s online PDF compression tool. The tool provides an easy-to-use interface for uploading a PDF, selecting a compression level (low, medium, high), then downloading the compressed PDF.

The default compression level yields about a 4x compression from 2.7MB -> 667KB with no discernible loss in quality.

Here’s the problem – the app paywalls you after you use the tool once – an issue easily avoidable by using incognito tabs.

The How

While using incognito tabs in Chrome works, a CLI tool that lets users skip the cumbersome UI would be nice.

Introducing:

pdfren: a CLI tool that uses headless Chrome to compress your PDFs from the terminal.

I drew on my experience at hoodoo.digital, where I wrote something similar: a web scraping tool that downloads AEM artifacts and stores the files in AWS S3 buckets. While that particular tool was written with Node.js and Puppeteer, I had initially started the project in Golang using a headless browser driving library called chromedp. After a bit of tinkering with limited success, I cut my losses and switched back to JavaScript. For pdfren I wanted to use Go, so chromedp would be the library of choice.

Requirements

The idea of the app is simple: a cross-platform CLI tool that offers feature parity with Adobe’s browser-based PDF compressor tool. The CLI needs a single root command that runs the compression by loading the Adobe URL, uploading the specified PDF file on disk, then downloading the result. I would need the following dependencies for the project:

  • chromedp – for headless browser driving
  • cobra – for EZ command line commands and flag parsing
  • zerolog – for enabling a “verbose mode” via a --verbose flag

Bootstrapping the Go project was straightforward:

mkdir pdfren && cd pdfren # make the project directory

go mod init github.com/GradeyCullins/pdfren # init the module

# installs the dependencies
go get -u github.com/chromedp/chromedp
go get -u github.com/spf13/cobra
go get -u github.com/rs/zerolog/log

And we’re off to the races. The important bits of the code were as follows.

Defining the command

rootCmd = &cobra.Command{
	Use:   "pdfren",
	Short: "pdfren compresses your PDF using Adobe's online PDF compressor tool",
	Long:  "pdfren compresses your PDF using Adobe's online PDF compressor tool",
	Args:  cobra.MatchAll(cobra.ExactArgs(1), cobra.OnlyValidArgs),
	Run: func(cmd *cobra.Command, args []string) {
		log.Logger = log.Output(zerolog.ConsoleWriter{Out: os.Stderr})

		if isVerbose {
			zerolog.SetGlobalLevel(zerolog.DebugLevel)
		} else {
			zerolog.SetGlobalLevel(zerolog.InfoLevel)
		}

		pdfPath := args[0]
		f, err := os.Open(pdfPath)
		if err != nil {
			log.Fatal().Msg(err.Error())
		}

		RunCompressor(f, outFile)
	},
}

Here I’m defining the root command’s Run func, configuring the logger based on the verbose flag, validating the PDF file argument by opening it, then handing it to the RunCompressor function that kicks off the browser process.
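
Opening the file only proves that it exists and is readable. As a minimal sketch of a stricter check, the hypothetical helpers below (checkPDF and looksLikePDF are my names, not part of pdfren) also confirm that the path is a regular file and that it starts with the %PDF- marker that PDF files normally begin with. It uses only the standard library and assumes the marker sits at byte zero, which some unusual PDFs may not satisfy.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// looksLikePDF reports whether head starts with the "%PDF-" marker.
// Assumption: the marker is at the very start of the file.
func looksLikePDF(head []byte) bool {
	return bytes.HasPrefix(head, []byte("%PDF-"))
}

// checkPDF is a hypothetical helper (not part of pdfren). It verifies that
// path is a regular file whose first five bytes look like a PDF header.
func checkPDF(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return err
	}
	if !info.Mode().IsRegular() {
		return fmt.Errorf("%s is not a regular file", path)
	}

	head := make([]byte, 5)
	if _, err := io.ReadFull(f, head); err != nil {
		return fmt.Errorf("%s is too short to be a PDF: %w", path, err)
	}
	if !looksLikePDF(head) {
		return fmt.Errorf("%s does not start with a PDF header", path)
	}
	return nil
}

func main() {
	// Check every path given on the command line and report the result.
	for _, path := range os.Args[1:] {
		if err := checkPDF(path); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", path)
	}
}
```

A check like this would fail fast on a wrong path or a non-PDF file before headless Chrome is ever launched.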

Defining RunCompressor

// RunCompressor func
...

// Section 1
compressorURL := "https://www.adobe.com/acrobat/online/compress-pdf.html"
submitBtn := `button[data-test-id="ls-footer-primary-compress-button"]`
compressionBtn := fmt.Sprintf("input[data-test-id=\"compress-radio-option-%s\"]", compressionLevel)
downloadBtn := `button[data-testid="lifecycle-complete-5-download-button"]`

...

// Section 2
done := make(chan string, 1)
chromedp.ListenTarget(ctx, func(v interface{}) {
	if ev, ok := v.(*browser.EventDownloadProgress); ok {
		if ev.State == browser.DownloadProgressStateCompleted {
			done <- ev.GUID
			close(done)
		}
	}
})

...

// Section 3
if err := chromedp.Run(ctx,
	chromedp.SetUploadFiles(`input[accept=".pdf"]`, []string{file.Name()}, chromedp.NodeVisible),
	chromedp.WaitVisible(`div[aria-label="Select compression level:"]`, chromedp.NodeVisible),
); err != nil {
	log.Fatal().Msg(err.Error())
}

...

// Section 4
if err := chromedp.Run(ctx,
	chromedp.WaitVisible(downloadBtn, chromedp.NodeVisible),
	chromedp.WaitEnabled(downloadBtn, chromedp.NodeEnabled),
	browser.
		SetDownloadBehavior(browser.SetDownloadBehaviorBehaviorAllowAndName).
		WithDownloadPath(wd).
		WithEventsEnabled(true),
	chromedp.Click(downloadBtn, chromedp.NodeEnabled),
); err != nil {
	log.Fatal().Msg(err.Error())
}

guid := <-done

dlFile := filepath.Join(wd, guid)
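// Move the GUID-named download to the requested output path.
if err := os.Rename(dlFile, outFile); err != nil {
	log.Fatal().Msg(err.Error())
}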

Section 1

Here I’m showing the query selectors needed to navigate through the web compressor tool. Notably, the compressionBtn is a template string based on the compressionLevel flag which defaults to medium.
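
Because the level is interpolated straight into the selector, a typo in the flag value would produce a selector that matches nothing on the page. A small sketch of how the level could be validated first is below. compressionSelector is a hypothetical helper, and the allowed values are assumed to be the three levels the web tool offers (low, medium, high).

```go
package main

import "fmt"

// compressionSelector is a hypothetical helper (not part of pdfren). It
// builds the radio button selector for a compression level and rejects
// anything other than the three levels named in the article.
func compressionSelector(level string) (string, error) {
	switch level {
	case "low", "medium", "high":
		return fmt.Sprintf(`input[data-test-id="compress-radio-option-%s"]`, level), nil
	default:
		return "", fmt.Errorf("invalid compression level %q (want low, medium, or high)", level)
	}
}

func main() {
	// "max" is included to show the error path.
	for _, level := range []string{"low", "medium", "high", "max"} {
		sel, err := compressionSelector(level)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(sel)
	}
}
```

Rejecting a bad level up front gives a clear error message instead of a browser wait that never finds its element.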

Section 2

Here I define a buffered channel and an event listener. When the browser reports that the download has completed, the listener sends the download’s GUID on the channel.

Section 3

Here I query the PDF file input on the webpage and point it at the file variable, which corresponds to the PDF file on the user’s disk. Then I wait for the upload to complete by calling WaitVisible on the compression level selection screen.

Section 4

First we wait for the download button to become visible and enabled once the compression has run. Second, we set the browser download behavior so we can specify the directory the file is downloaded to, then click the button. Third, we wait on the channel defined in section 2, which signals that the download is complete. Lastly, we take the file’s random name (its GUID) and rename it to the name and location specified by the --outFile flag.
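
One caveat with a bare receive on the channel is that it blocks forever if the download never completes. The sketch below shows one way the wait could be bounded with a timeout, using only the standard library. waitForDownload and errDownloadTimeout are hypothetical names, and the channel in main merely stands in for the one fed by the chromedp listener.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errDownloadTimeout is returned when no GUID arrives in time.
var errDownloadTimeout = errors.New("timed out waiting for download")

// waitForDownload is a hypothetical helper (not part of pdfren). It returns
// the GUID sent on done, or an error if the timeout elapses first or the
// channel is closed without a value.
func waitForDownload(done <-chan string, timeout time.Duration) (string, error) {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case guid, ok := <-done:
		if !ok {
			return "", errors.New("download channel closed before a GUID arrived")
		}
		return guid, nil
	case <-timer.C:
		return "", errDownloadTimeout
	}
}

func main() {
	// Simulate the listener: send a GUID, then close the channel.
	done := make(chan string, 1)
	done <- "example-guid" // stands in for ev.GUID
	close(done)

	guid, err := waitForDownload(done, 2*time.Second)
	fmt.Println(guid, err) // prints "example-guid <nil>"
}
```

In the real tool, a reasonable timeout would depend on the file size and connection speed, so it might be worth exposing as a flag.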

FIN

Boom, done! The project could be spruced up with tests and progress indicators for upload/download, but for now it works!

Check out the source code and add stars or PRs, or install globally (make sure you have Go 1.16 or above):

go install github.com/GradeyCullins/pdfren@latest
